mjafin / splitNreads

splitNreads
MIT License

[Bug] Splitting reads into three or more parts #2

Closed landesfeind closed 9 years ago

landesfeind commented 9 years ago

Dear @mjafin, thank you for the tool, which is very helpful. However, I ran into a problem when splitting reads whose CIGAR string contains two or more N operations:

18M2443N77M2449N5M TAGGGGCTCCCGCTGCAGCTCTTTCACTTCCACAAGACCAGAAAGAGTCAGGGCTGAACACAGCTTAGATGCTGTCTTCACTTTGCTATTGTTATCTACA
--- becomes: ---
18M4974H           TAGGGGCTCCCGCTGCAG
2461H77M2454H                        CTCTTTCACTTCCACAAGACCAGAAAGAGTCAGGGCTGAACACAGCTTAGATGCTGTCTTCACTTTGCTATTGTTATCTACA
4987H5M                                                                                                           CTACA
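The hard-clip lengths in this output follow from simple arithmetic. Each part keeps one M block and hard-clips the summed length of everything before and after it, N gaps included (2461 = 18 + 2443, 2454 = 2449 + 5, 4987 = 18 + 2443 + 77 + 2449). The Python sketch below reproduces the three CIGAR strings, assuming the CIGAR holds only M and N operations. split_cigar_on_n is a hypothetical name, not a splitNreads function.

```python
import re


def split_cigar_on_n(cigar):
    """Split an M/N-only CIGAR at each N into one CIGAR per M block.

    Hypothetical helper (not the splitNreads code): it mirrors the output
    shown above, where the lengths of all other M and N operations, N gaps
    included, are folded into leading and trailing hard clips.
    """
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MN])", cigar)]
    # Reject anything other than M and N (e.g. soft clips or insertions).
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"expected only M and N operations: {cigar}")
    total = sum(n for n, _ in ops)
    pieces = []
    consumed = 0  # summed length of all operations before the current one
    for length, op in ops:
        if op == "M":
            before = consumed
            after = total - consumed - length
            piece = f"{before}H" if before else ""
            piece += f"{length}M"
            piece += f"{after}H" if after else ""
            pieces.append(piece)
        consumed += length
    return pieces


print(split_cigar_on_n("18M2443N77M2449N5M"))
# ['18M4974H', '2461H77M2454H', '4987H5M']
```

Only the CIGAR side is shown here; the sequence and quality slicing is where the bug lies.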

The second sequence is clearly too long, and the same holds for the quality string. This is caused by the code on lines 80 and 82, which extracts the sequence and quality substrings. In both places, change start_pos:start_pos+offset to start_pos:offset, because offset already holds the absolute end position.

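This does not affect reads with a single N: such a read is split into only two parts, the first slice starts at 0 (so start_pos+offset equals offset), and the over-long slice for the last part is simply cut off at the end of the read. Reads with two or more N operations, however, have middle parts, and each of those picks up every remaining base up to the end of the read.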
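A minimal sketch of the two slice expressions applied to the read above. The names start_pos and offset follow the report (offset being the absolute end of a block in the read); the surrounding loop and the function read_blocks are hypothetical reconstructions, not the actual splitNreads code. Python silently truncates slices that run past the end of a string, which is why the buggy version raises no error and just returns too many bases for middle blocks.

```python
import re

# The 100-base read from the report; its CIGAR is 18M2443N77M2449N5M.
SEQ = "TAGGGGCTCCCGCTGCAGCTCTTTCACTTCCACAAGACCAGAAAGAGTCAGGGCTGAACACAGCTTAGATGCTGTCTTCACTTTGCTATTGTTATCTACA"
CIGAR = "18M2443N77M2449N5M"


def read_blocks(cigar, seq, fixed):
    """Return the read substring for each M block.

    Hypothetical reconstruction: start_pos is where a block starts in the
    read and offset is where it ends (an absolute position, not a length).
    """
    blocks = []
    start_pos = 0
    for length, op in re.findall(r"(\d+)([MN])", cigar):
        if op != "M":
            continue  # N skips reference bases only; the read does not advance
        offset = start_pos + int(length)  # absolute end position in the read
        if fixed:
            blocks.append(seq[start_pos:offset])
        else:
            # Buggy: adds the absolute end to the start, over-running the block.
            blocks.append(seq[start_pos:start_pos + offset])
        start_pos = offset
    return blocks


print([len(b) for b in read_blocks(CIGAR, SEQ, fixed=False)])  # [18, 82, 5]
print([len(b) for b in read_blocks(CIGAR, SEQ, fixed=True)])   # [18, 77, 5]
```

With a single N, for example the CIGAR 50M10N50M, both versions return identical blocks, matching the observation that only reads with two or more N operations are affected.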

Thank you for your work, Manuel

mjafin commented 9 years ago

Thanks for the bug report and the proposed fix, @landesfeind! To be honest, I haven't used the tool much myself: it has some underlying problems, such as resulting reads that appear too short, and variant callers should ideally handle splicing intrinsically (as VarDict does: https://github.com/AstraZeneca-NGS/VarDict).

I can implement the fixes you suggest, but I would rely on you to test them. Pull requests are also very welcome.

Thanks, Miika

mjafin commented 9 years ago

@landesfeind can you check if https://github.com/mjafin/splitNreads/commit/a84339a1a72e0c98da6816b56dd24fa7f02ce22e fixes your problem?

Please reopen if there's anything else. Thanks for the fix!