lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

sam2tsv: limit output to only the intersecting positions of reads #205

Closed by ramesh8v 1 year ago

ramesh8v commented 1 year ago

Hello,

Thanks for writing this tool; it is a lifesaver. This is not a bug but a question.

I'm trying to use sam2tsv to output, for each reference genome position in the --regions file, only the read bases that intersect that position. My command looks like this:

java -jar ~/jvarkit/dist/sam2tsv.jar -R sequence.fasta --regions snvs.vcf.gz aligned.bam

This outputs every base of each intersecting read. How can I limit the output to only the intersecting positions? I could filter it with a Python script, but sam2tsv generates enormous files and I have hundreds of samples, so I am wondering whether there is a better way.

lindenb commented 1 year ago

Hi, convert the output to BED with awk and use "bedtools intersect" to get the desired output.
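
A minimal sketch of that suggestion, not the maintainer's exact commands. It assumes the sam2tsv header puts CHROM in column 4 and the 1-based reference position (REF-POS1) in column 8; column layouts have differed between versions, so check your own header line and pass other column numbers if needed. The function names are hypothetical, and the second function assumes bedtools is installed and accepts the wide BED-like rows on stdin.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: CHROM is column 4 and REF-POS1 is column 8
# of the sam2tsv output; override with arguments if your header differs.

# Turn sam2tsv rows into BED-like rows:
# chrom, 0-based start, end, followed by the untouched original row.
sam2tsv_to_bed() {
  awk -F '\t' -v OFS='\t' -v chrom_col="${1:-4}" -v pos_col="${2:-8}" '
    /^#/ { next }                     # drop the header line
    $pos_col !~ /^[0-9]+$/ { next }   # drop rows with no reference position (e.g. insertions)
    { print $chrom_col, $pos_col - 1, $pos_col, $0 }
  '
}

# Stream sam2tsv through the converter and keep only rows at the VCF sites.
# $1 = reference FASTA, $2 = VCF of sites, $3 = BAM
sam2tsv_at_sites() {
  java -jar ~/jvarkit/dist/sam2tsv.jar -R "$1" --regions "$2" "$3" |
    sam2tsv_to_bed |
    bedtools intersect -u -a stdin -b "$2" |
    cut -f 4-                         # strip the three BED columns again
}
```

Called as `sam2tsv_at_sites sequence.fasta snvs.vcf.gz aligned.bam > sites.tsv`, this never writes the full sam2tsv output to disk, which may help with hundreds of samples. The header line is dropped along the way, so add it back if downstream tools need it.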

ramesh8v commented 1 year ago

Thanks for the quick response. Using bedtools intersect is a great idea!