samtools / htslib

C library for high-throughput sequencing data formats

extract reads by read name #1629

Closed by leon945945 1 year ago

leon945945 commented 1 year ago

Hi, I want to split a BAM file into two files according to read name, but I could not find a function in sam.h that extracts reads by read name. Is there such a function?

If so, can you give me some advice to implement it?

Thanks.

jkbonfield commented 1 year ago

There aren't any specific functions to split data in this manner. If the read name split also corresponds to a read-group or another similar tag, then samtools split could be used.

If you're looking to write your own tool specifically based on read names, then you'll need a while ((r = sam_read1(...)) >= 0) loop (sam_read1 returns -1 at end of file and a value below -1 on error). Inside the loop, call bam_get_qname() on the bam1_t record and pass the name to whatever filtering method you wish, e.g. regexec or strchr.
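
As a rough sketch of the filtering step only, the two predicates below take the string that bam_get_qname() returns and test it with strchr or with a precompiled POSIX regex. The names name_has_char and name_matches are made up for illustration and are not part of htslib.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero if the read name contains character c,
 * for example '/' in names such as "read1/1". */
static int name_has_char(const char *qname, int c)
{
    return strchr(qname, c) != NULL;
}

/* Hypothetical helper: nonzero if the read name matches a regex that the
 * caller has already compiled with regcomp(). */
static int name_matches(const regex_t *re, const char *qname)
{
    return regexec(re, qname, 0, NULL, 0) == 0;
}
```

Compile the pattern once with regcomp() before the read loop and release it with regfree() afterwards, rather than recompiling it for every record.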

An example of a basic read-write loop is here: https://github.com/samtools/htslib/blob/develop/test/test_view.c#L162-L169

You'd be creating multiple output files and selecting which "out" fp to use depending on name. There are region iterator examples high up in the same file.

leon945945 commented 1 year ago

Thanks very much!!!