pezmaster31 / bamtools

C++ API & command-line toolkit for working with BAM data
MIT License
412 stars 153 forks source link

Quickly jump to a given alignment / equivalent of ftell and fseek for BamReader? #205

Closed morispi closed 3 years ago

morispi commented 3 years ago

Hello,

I'm relying on bamtools to build an index structure for 10x genomics barcodes. The basic idea is to be able to quickly retrieve any alignment in the BAM file that contains a given barcode.

What I'm doing now is reading each alignment (say, al) from the BAM file iteratively with a BamReader, and for each barcode, keeping a list which I populate with the [al.RefID, al.Position, al.GetEndPosition()] info of the alignment, to be able to use the SetRegion functions later on.

To query a given barcode, I then retrieve its associated list and perform a SetRegion(RefID, Position, RefID, EndPosition) with the BamReader, to fall into the region of interest. However, I end up having to do a lot of calls to GetNextAlignment() until I reach the actual Position of interest. Indeed, if I understand correctly, the function does not allow me to fall exactly on the first alignment that begins at Position on RefID, but rather leads me to a region of the BAM file that starts with the first alignment overlapping the Position of interest.

While this solution does work, it ends up being very time consuming because of the numerous calls to GetNextAlignment() I need to perform to get to the Position / alignment of interest.

I'm thus questioning whether or not there's a way to perform something like a ftell() for each alignment I'm reading, that would tell me the exact offset of the alignment, record this in my index, and then directly perform something like a fseek() so I can directly end up on the alignment of interest? I tried messing around with the SetRegion() parameters, but ended up having to skip a lot of alignments no matter what.

Please tell me if I haven't been clear enough or if you need anymore information.

Thank you very much.

Best, Pierre

morispi commented 3 years ago

Ended up finding the solution by dirty accessing the BamReaderPrivate Tell and Seek functions. Closing the issue.