Closed hannespetur closed 7 years ago
Hi Hannes,
You're right, VariantBam does not use the index. This was by design, since for linked-regions where we want to grab linked reads from a pair, they could be aligned anywhere in the genome. I figured that if one wanted to restrict VariantBam to just run in just some region, a pipe out from Samtools is the more parsimonious way to go, rather than have to concepts of "region" (e.g. where to even look, and then where a rule should be applied).
The region
flag in VariantBam then is to specify a region that a rule should be applied to. In your example, since no rules are supplied, the region
will just the default rule of accepting all reads that intersect that area.
Hope this helps, Jeremiah
Thank you Jeremiah for your response. It cleared things up for me.
I really like the idea behind linked regions, but I would want to use it without restricting myself to a small region first. And you are right, the read mate could be aligned anywhere on the genome. However, we know exactly where it is (the RNEXT and PNEXT fields in the SAM specs). I think it would be really cool to extend VariantBam so it would fetch distant mates using the index and significantly speeding up the process.
I will close this issue because you have already answered my question, but I'd love to hear your thoughts on this.
Best regards, Hannes
Hi Hannes,
I see what you're saying, and this is a good idea. I've thought of this before, but haven't gotten around to implementing. I'll put this on my to-do list and will comment here when I get a solution.
Best, Jeremiah
Hello. Thank you for creating this nice tool.
I was testing your tool using on a Platinum Genomes CRAM file to see how it performs on CRAM files, but reading seems to be incredibly slow.
where $GENOME points to a copy of the 1000G GRCh38 reference genome.
Do you know why this is? I suspect VariantBam is reading through the entire CRAM file instead of using the index.
Best regards, Hannes