samtools / htslib

C library for high-throughput sequencing data formats
Other
799 stars 445 forks source link

How to get reads depth by sam.h #1526

Closed leon945945 closed 1 year ago

leon945945 commented 1 year ago

Hi htslib team, I want to write a simple TE insertion/deletion software with sam.h, cause the runtime of pysam is relative longer, but I didn’t find the function that could get the reads depth of one site or region in sam.h, maybe use ‘sam_itr_querys’ then iterate all the reads mapped to one certain site/region and count reads number? Could you please tell me the function that could get reads depth in sam.h, or some ideas to get it, thank you very much.

jkbonfield commented 1 year ago

There's no single function that simply returns the coverage for a single site or a region.

The iterators just get one BAM record at a time. You can of course use the iterator for a region and accumulate the alignment data one record at a time in order to build up a depth array. This is now samtools depth works: https://github.com/samtools/samtools/blob/develop/bam2depth.c#L592-L678 That looks a bit complex, but it has a lot of options and the basic algorithm doesn't have to be so complex.

However it's probably easier for you to use the mpileup interface instead. Mpileup rotates the data and instead returns information about one column (one reference position) at a time. That inherently also includes its depth, along with pointers to all the BAM records overlapping that region and their appropriate bases / qualities. It's the core of the samtools mpileup command, but coverage and bedcov also uses the same interface.

This isn't so obvious in usage, samtools bedcov is probably the easiest to understand. See https://github.com/samtools/samtools/blob/develop/bedcov.c#L234-L248

leon945945 commented 1 year ago

Thanks for your detailed reply, it is very useful for me. The code comment of mpileup interface is relatively less, could you show me an example of mpileup, I will try to figure it out. Thanks!

jkbonfield commented 1 year ago

See the bedcov example above. There is a bit of initialisation work to do too. The bit I highlighted was the main loop.

leon945945 commented 1 year ago

Thank you, I will read these codes carefully, and try to take use of them.