Closed ad3002 closed 1 year ago
This looks the same as #97 and #83, and is caused by a duplicate read in the MBG input. We haven't gotten a test set to reproduce it locally. Could you check your hifi-corrected.fasta.gz for duplicate reads, as well as the original input HiFi data? Also, are you able to share your input HiFi data so we can reproduce the error locally? If so, see this page for info on how to send us the data: https://canu.readthedocs.io/en/latest/faq.html#how-can-i-send-data-to-you
Yes, there were several duplicates in names/sequences, and I found them in a BAM file provided by the sequencing facility too. I removed them and verkko finished without any errors. I think an easy solution would be to add a warning about potential duplicates to the README. A slightly more difficult solution would be to implement a sanity check for duplicates in the input reads.
Note that verkko only keeps the read name up to the first space, so reads whose names differ only after the space will still look like duplicates.
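For anyone who wants to check their own input before running, here is a minimal sketch of such a duplicate check. It is not the check that verkko itself performs; it only assumes FASTA input (optionally gzipped) and mimics the name truncation described above by cutting each header at the first whitespace. The function names `read_names`, `duplicate_names`, and `duplicates_in_fasta` are made up for this example.

```python
import gzip
from collections import Counter


def read_names(lines):
    """Yield FASTA read names, truncated at the first whitespace."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            # An empty header (">" alone) yields an empty name.
            yield fields[0] if fields else ""


def duplicate_names(lines):
    """Return {name: count} for every truncated name seen more than once."""
    counts = Counter(read_names(lines))
    return {name: n for name, n in counts.items() if n > 1}


def duplicates_in_fasta(path):
    """Open a plain or gzipped FASTA file and report duplicated names."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return duplicate_names(handle)
```

Calling `duplicates_in_fasta("hifi-corrected.fasta.gz")` returns an empty dict when every truncated name is unique. Note that this compares names only, so identical sequences stored under different names would not be reported.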
I just pushed a fix to check for this in deaad6e. This should be in the next release. I am also going to close this issue along with #97 and #83, but please re-open if you see cases where the input had no duplicates and still hit this error.
I got an assertion error with verkko installed from conda (default parameters, running with HiFi + Oxford Nanopore data):
Here https://github.com/maickrau/MBG/blob/4c5f27cc6c17369706d5a697687d715deb8b657d/src/ConsensusMaker.h#L65