Closed christopher-schroeder closed 1 year ago
Thanks, @christopher-schroeder!
I wonder if it would be simpler to exclude these reads completely rather than allowing them through? @brentp what do you think?
Do you mean removing them from the bam? That's not that easy, because to get a valid bam you would have to remove or modify the mate. And I have about 300 whole genomes already processed as bam. strling needs indexed data, so you cannot stream. That would mean writing many terabytes just for a couple of removed reads. Also, I think a tool should be able to process input files as long as they are valid according to the format specification.
Or do you mean ignoring them in strling? I am not that deep into the source code and don't know what happens if you see a read whose mate has been ignored previously. But if this is not a problem, then ignoring the read would be totally fine!
Sorry, came to check on another PR and realized we left this one hanging! I'm thinking of removing the assert statement and instead skipping over these reads, as they are not informative.
yes, I think we can skip them, but we must make sure that the mate is added/removed from the cache or the memory might grow quickly.
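To make the cache concern concrete, here is a minimal Python sketch of pairing reads by name while skipping zero-length ones. It is not strling's code (strling is written in Nim); `Read`, `pair_reads`, and the dict-based cache are hypothetical names used only for illustration. The point is that a skipped read must also evict its cached mate, and a mate that arrives later must not be cached.

```python
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical stand-in for an alignment record.
    name: str
    seq: str


def pair_reads(reads):
    """Pair reads by name, dropping any pair with a zero-length read."""
    cache = {}       # read name -> first-seen mate, waiting for its partner
    skipped = set()  # names whose zero-length read arrived before its mate
    pairs = []
    for read in reads:
        if read.name in skipped:
            # Mate of a previously skipped read: drop it, do not cache it.
            skipped.discard(read.name)
            continue
        if len(read.seq) == 0:
            # Evict the cached mate if present; otherwise remember the
            # name so the mate is dropped when it shows up.
            if cache.pop(read.name, None) is None:
                skipped.add(read.name)
            continue
        mate = cache.pop(read.name, None)
        if mate is None:
            cache[read.name] = read
        else:
            pairs.append((mate, read))
    return pairs, cache


reads = [
    Read("a", "ACGT"), Read("b", ""), Read("a", "TT"),
    Read("c", "GG"), Read("c", ""), Read("b", "AAAA"),
]
pairs, cache = pair_reads(reads)
```

In this sketch only pair `a` survives and the cache ends up empty. The `skipped` set can itself grow if a mate never appears, so a real implementation would need its own bound or cleanup for that case.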
I'm going to allow 0-len alignments, but report on them in debug mode. I don't have a good data set to test this on, but if this comes up again, at least we can count the occurrence in the debug output.
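As a rough illustration of that approach (again in Python rather than strling's Nim, with the hypothetical names `count_zero_length` and `zero_len_sketch`), zero-length alignments are passed through instead of triggering an assert, and their count is only reported when debugging is enabled:

```python
import logging

logger = logging.getLogger("zero_len_sketch")  # hypothetical logger name


def count_zero_length(seqs, debug=False):
    """Let every sequence through, counting the zero-length ones."""
    n_zero = 0
    kept = []
    for seq in seqs:
        if len(seq) == 0:
            n_zero += 1
        kept.append(seq)  # allowed through rather than asserting
    if debug:
        logger.debug("saw %d zero-length alignments", n_zero)
    return kept, n_zero


kept, n_zero = count_zero_length(["ACGT", "", "TT", ""], debug=True)
```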
Allow reads in bam that have been trimmed to zero length.