sstadick / perbase

Per-base per-nucleotide depth analysis
MIT License
115 stars, 13 forks

accessing qpos() ? #60

Open brentp opened 1 year ago

brentp commented 1 year ago

Hi Seth, I'm experimenting with using perbase here

I'd like to be able to filter out bases that are within $x bases of either end of the read. It seems that's not possible without re-implementing PileupPosition.update. Is that true?
It would be nice for this use case if ReadFilter accepted an Alignment rather than a Record: we can always get a Record from an Alignment, and with an Alignment I could check .qpos().

Do you see another way to do this given what's exposed? Thanks in advance.

sstadick commented 1 year ago

First off, that is a fantastic name for a tool!

Passing an alignment into ReadFilter would make more sense, and be more flexible as a trait API anyways, and I think gets you what you want. If the record creation is not zero cost, both the Alignment and Record could be passed into the ReadFilter. At that point the name of ReadFilter should arguably be something a bit different since it's actually filtering out positions... and counting the "fails" feels a bit weird?

Long way around to saying: yes, fixing how ReadFilter works seems like the path of least resistance. I'm not sure I can think of another way to do it.
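To make the idea concrete, here is a minimal sketch of a position-aware filter that receives both the alignment and the record. It is not perbase's current API: `PositionFilter`, `EndDistanceFilter`, and the `Alignment` and `Record` structs are hypothetical stand-ins, modelled loosely on rust_htslib, where a pileup alignment's `qpos()` returns an `Option<usize>`. What to do when `qpos()` is `None` is also an assumption.

```rust
// Hypothetical stand-ins for the pileup alignment and BAM record types.
// Only the fields needed for this sketch are modelled.
pub struct Record {
    pub seq_len: usize, // read length in bases
}

pub struct Alignment {
    qpos: Option<usize>, // position in the read; None at deletions / ref skips
}

impl Alignment {
    pub fn new(qpos: Option<usize>) -> Self {
        Alignment { qpos }
    }

    pub fn qpos(&self) -> Option<usize> {
        self.qpos
    }
}

/// Hypothetical trait: like ReadFilter, but it sees the alignment too,
/// so it can decide per position rather than per read.
pub trait PositionFilter {
    fn keep(&self, aln: &Alignment, rec: &Record) -> bool;
}

/// Drops bases that lie fewer than `min_dist` bases from either read end.
pub struct EndDistanceFilter {
    pub min_dist: usize,
}

impl PositionFilter for EndDistanceFilter {
    fn keep(&self, aln: &Alignment, rec: &Record) -> bool {
        match aln.qpos() {
            Some(q) => {
                let from_start = q;
                // saturating_sub guards against a qpos beyond the read length
                let from_end = rec.seq_len.saturating_sub(q + 1);
                from_start >= self.min_dist && from_end >= self.min_dist
            }
            // Assumption: with no read base at this position there is
            // nothing to judge by end distance, so keep it.
            None => true,
        }
    }
}
```

With a 10-base read and `min_dist: 3`, this keeps read positions 3 through 6 and drops the three bases at each end.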

Totally open to PRs on this. I've been deep in the land of ddPCR recently and haven't had as much time for NGS-related activities :(

Talking about this touches on two other open issues, one of which you've already found:

Some day I'll circle back on those two!

brentp commented 1 year ago

Thanks, I'll have a look again. I might just copy a lot of code into pbr for simplicity. Yeah, I'd prefer to use noodles, but obviously there's more overhead to getting started there, even with your prototype. It would be nice to filter reads before they are in the pileup, especially since I'm calling a Lua function on the reads; as it stands, that function gets called many times per read instead of once. And maybe a ReadFilter plus a more flexible BaseFilter is another way to go.
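A rough sketch of that split, under heavy assumptions: `Read`, `ReadFilter`, `BaseFilter`, `MinMapq`, `NotNearEnds`, and `depth` are all hypothetical names, not perbase, pbr, or noodles APIs, and reads are treated as ungapped for simplicity. The read-level filter runs once per read before any pileup work, and the base-level filter runs once per read position.

```rust
use std::cell::Cell;

// Hypothetical, simplified read: not a rust_htslib or noodles type.
pub struct Read {
    pub start: usize, // 0-based reference start
    pub len: usize,   // read length; assumes an ungapped alignment
    pub mapq: u8,
}

/// Read-level filter: meant to run once per read, before pileup.
pub trait ReadFilter {
    fn keep_read(&self, read: &Read) -> bool;
}

/// Base-level filter: runs once per (read, read position).
pub trait BaseFilter {
    fn keep_base(&self, read: &Read, qpos: usize) -> bool;
}

/// Example read filter; `calls` counts invocations to show it runs once per read.
pub struct MinMapq {
    pub min: u8,
    pub calls: Cell<usize>,
}

impl ReadFilter for MinMapq {
    fn keep_read(&self, read: &Read) -> bool {
        self.calls.set(self.calls.get() + 1);
        read.mapq >= self.min
    }
}

/// Example base filter: drop bases closer than `min_dist` to either read end.
pub struct NotNearEnds {
    pub min_dist: usize,
}

impl BaseFilter for NotNearEnds {
    fn keep_base(&self, read: &Read, qpos: usize) -> bool {
        qpos >= self.min_dist && read.len.saturating_sub(qpos + 1) >= self.min_dist
    }
}

/// Per-position depth over [0, ref_len), applying both filter stages.
pub fn depth<R: ReadFilter, B: BaseFilter>(
    reads: &[Read],
    ref_len: usize,
    rf: &R,
    bf: &B,
) -> Vec<u32> {
    let mut depth = vec![0u32; ref_len];
    // Stage 1: the (possibly expensive) read-level check runs once per read.
    for read in reads.iter().filter(|r| rf.keep_read(r)) {
        // Stage 2: the cheap base-level check runs at each read position.
        for qpos in 0..read.len {
            let pos = read.start + qpos;
            if pos < ref_len && bf.keep_base(read, qpos) {
                depth[pos] += 1;
            }
        }
    }
    depth
}
```

In this arrangement an expensive per-read predicate (such as a call into Lua) would sit behind `ReadFilter`, so its cost scales with the number of reads rather than the number of pileup positions each read covers.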