lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

sam2tsv question: any way to only parse reads that overlap specific coordinates in reference? #187

Closed: hengjwj closed this issue 3 years ago

hengjwj commented 3 years ago

Hi Pierre, thanks for writing sam2tsv. It's been a convenient yet powerful tool. I'm writing to ask: is there any way to restrict sam2tsv to specific coordinates of interest? For example, if I'm only interested in 100 coordinates in a genome, is there a way to speed up sam2tsv so that it doesn't convert every read in the BAM file?

I think one roundabout way would be to pre-filter the BAM files for reads that cover the positions of interest, using samtools or equivalent software. However, assuming a read length of 100 bp and one coordinate of interest per read, is there a way to avoid processing the other 99 positions of every read that covers a coordinate of interest?

E.g.
input: a BAM file and a file containing a list of 100 coordinates
output: a TSV covering only those 100 coordinates
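Pre-filtering with samtools selects whole reads, so sam2tsv will still print a row for every base of each selected read. One way to get down to just the coordinates of interest is to post-filter sam2tsv's output. The Python sketch below is not part of jvarkit, and its file and function names (`load_targets`, `filter_rows`) are hypothetical. It assumes the sam2tsv output begins with a `#`-prefixed header that contains `CHROM` and `REF_POS` columns, with 1-based reference positions. It also assumes a targets file of `chrom<TAB>pos` lines. The column names are parameters in case your sam2tsv version labels them differently. Note that this trims the output rather than saving work: sam2tsv still processes every base of the selected reads.

```python
import sys
from typing import Iterable, Iterator, Set, Tuple


def load_targets(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Parse 'chrom<TAB>pos' lines (1-based positions) into a set of (chrom, pos)."""
    targets = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, pos = line.split("\t")[:2]
        targets.add((chrom, int(pos)))
    return targets


def filter_rows(lines: Iterable[str], targets: Set[Tuple[str, int]],
                chrom_col: str = "CHROM", pos_col: str = "REF_POS") -> Iterator[str]:
    """Yield the header, then only the rows whose (chrom, ref pos) is a target.

    ASSUMPTION: the first line is a '#'-prefixed, tab-separated header that
    names chrom_col and pos_col; adjust these if your sam2tsv output differs.
    """
    it = iter(lines)
    header = next(it).rstrip("\n")
    yield header
    names = header.lstrip("#").split("\t")
    ci, pi = names.index(chrom_col), names.index(pos_col)
    for line in it:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(ci, pi):
            continue  # malformed or truncated row
        pos = fields[pi]
        # Bases without a reference position (e.g. insertions) have a
        # non-numeric REF_POS, so they can never match a target.
        if pos.isdigit() and (fields[ci], int(pos)) in targets:
            yield line.rstrip("\n")


# Only acts as a command-line tool when exactly one targets file is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        wanted = load_targets(fh)
    for row in filter_rows(sys.stdin, wanted):
        print(row)
```

Usage (hypothetical script name): `... | java -jar sam2tsv.jar ... | python filter_sam2tsv.py targets.tsv > subset.tsv`. A set lookup keeps the per-row check constant-time, so the filter adds little cost even with many targets.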

lindenb commented 3 years ago

samtools view -M -L your.bed in.bam | java -jar etc...
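The `-L` option takes a BED file, while the question starts from a list of coordinates. BED intervals are 0-based and half-open, so a 1-based position `p` becomes the interval `p-1`..`p`. `-M` makes samtools use the multi-region iterator over the BAM index, so `in.bam` needs to be indexed first (`samtools index in.bam`). The converter below is a minimal sketch. It assumes the input has one `chrom pos` pair per line with 1-based positions, and the name `coords_to_bed` is hypothetical.

```python
import sys
from typing import Iterable, Iterator


def coords_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Convert 1-based 'chrom pos' lines into 0-based, half-open BED lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, pos = line.split()[:2]  # tab- or space-separated
        p = int(pos)
        # 1-based position p covers the single base [p-1, p) in BED coordinates
        yield f"{chrom}\t{p - 1}\t{p}"


# Only acts as a command-line tool when exactly one coordinates file is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        for bed_line in coords_to_bed(fh):
            print(bed_line)
```

Usage (hypothetical file names): `python coords_to_bed.py coords.tsv > your.bed`, then run the samtools command above with that BED file.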

hengjwj commented 3 years ago

Hi Pierre, thanks for pointing me in the right direction!