quinlan-lab / STRling

Detect novel (and reference) STR expansions from short-read data
MIT License

zero length reads #106

Closed christopher-schroeder closed 1 year ago

christopher-schroeder commented 2 years ago

Allow reads in bam that have been trimmed to zero length.

hdashnow commented 2 years ago

Thanks, @christopher-schroeder!

I wonder if it would be simpler to exclude these reads completely rather than allowing them through? @brentp what do you think?

christopher-schroeder commented 2 years ago

Do you mean removing them from the BAM? That's not that easy, because to get a valid BAM you would have to remove or modify the mate. I also have about 300 whole genomes already processed as BAM files. strling needs indexed data, so you cannot stream. That would mean rewriting many terabytes just to remove a couple of reads. Also, I think a tool should be able to process input files as long as they are valid according to the format specification.

Or do you mean ignoring them in strling? I am not that deep into the source code and don't know what happens if you see a read whose mate has previously been ignored. But if this is not a problem, then ignoring the read would be totally fine!

hdashnow commented 1 year ago

Sorry, came to check on another PR and realized we left this one hanging! I'm thinking of removing the assert statement and instead skipping over these reads, as they are not informative.

brentp commented 1 year ago

Yes, I think we can skip them, but we must make sure that the mate is added to/removed from the cache, or the memory might grow quickly.
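
To illustrate the concern about the mate cache, here is a minimal Python sketch. It is not STRling's code (STRling is written in Nim), and the `Read` tuple, `pair_reads`, and the `skipped` set are hypothetical names invented for this example. It assumes a cache keyed by read name, and that when one mate is zero-length the other mate is dropped as well; a real implementation might prefer to keep the surviving mate as a single read.

```python
from collections import namedtuple

# Hypothetical stand-in for an alignment record; not a real BAM API.
Read = namedtuple("Read", ["name", "seq"])


def pair_reads(reads):
    """Pair mates by name while skipping zero-length reads.

    Returns (pairs, cache, skipped) so the caller can check that
    nothing is left waiting for a mate that will never be used.
    """
    cache = {}       # name -> first-seen mate awaiting its partner
    skipped = set()  # names whose zero-length mate arrived first
    pairs = []
    for read in reads:
        if len(read.seq) == 0:
            if read.name in skipped:
                # Both mates were empty: forget the name entirely.
                skipped.discard(read.name)
            elif cache.pop(read.name, None) is None:
                # Mate not seen yet: remember to drop it when it arrives.
                skipped.add(read.name)
            # Otherwise the cached mate was just evicted by pop().
            continue
        if read.name in skipped:
            # Partner was zero-length, so do not cache this read.
            skipped.discard(read.name)
            continue
        mate = cache.pop(read.name, None)
        if mate is None:
            cache[read.name] = read
        else:
            pairs.append((mate, read))
    return pairs, cache, skipped
```

The point of the sketch is that every entry added to `cache` or `skipped` is removed again once the second mate is seen, so neither structure grows with the number of zero-length reads.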

hdashnow commented 1 year ago

I'm going to allow 0-len alignments, but report on them in debug mode. I don't have a good data set to test this on, but if this comes up again, at least we can count the occurrence in the debug output.
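
As a rough sketch of the "allow them, but count them in debug output" idea, the Python below lets zero-length records pass through and logs how many were seen. This is only an illustration under assumptions: `Read`, `count_zero_length`, and the logger name are hypothetical, and it says nothing about how STRling itself reports debug information.

```python
import logging
from collections import namedtuple

# Hypothetical stand-in for an alignment record; not a real BAM API.
Read = namedtuple("Read", ["name", "seq"])

log = logging.getLogger("zero_len_sketch")  # hypothetical logger name


def count_zero_length(reads):
    """Count zero-length reads without rejecting them."""
    n_zero = 0
    for read in reads:
        if len(read.seq) == 0:
            n_zero += 1
    # Only visible when the logger is configured at DEBUG level.
    log.debug("zero-length alignments seen: %d", n_zero)
    return n_zero
```

Calling `logging.basicConfig(level=logging.DEBUG)` before running would make the count appear in the output; at the default level it stays silent.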