Hi @jelber2!
We are aware of this issue. However, that's not the only problem. We have seen that HERRO overcorrects not just in centromeres or other hard regions. It occasionally will even "correct" reads and move them across haplotypes. This could have disastrous consequences on phasing and detangling.
@kokyriakidis Yes, I have seen evidence of HERRO causing reads to switch phases/haplotypes. Qualitatively, it does not seem extreme, but yes, it happens. Could you expand on what the disastrous consequences could be for phasing and detangling? For example, might one see false SNVs, indels, or structural variants show up in the detangled bubbles, such that a phased block in one haplotype carries these false variants?
See page 45 from https://github.com/jelber2/hapmers/blob/main/hapmers-presen.pdf regarding evidence of phase switching from Herro-corrected reads. That is an older presentation, and I have learned more things from those data sets and summarized them in a manuscript if you are interested and maybe would not mind being cited as a personal communication.
@jelber2 There are several problems:
In conclusion, and based on our extensive analysis, HERRO definitely overcorrects in hard regions.
It is totally fine to be cited as a personal communication :)
Thank you very much!
Since you guys have run Herro or perhaps dorado correct, I thought this GitHub issue might be of interest: https://github.com/nanoporetech/dorado/issues/851. It suggests that during the minimap2 all-versus-all overlapping of raw Nanopore reads, reads with repetitive minimizers are likely "tossed out" and not included in the downstream correction steps.

Feel free to close or leave as a discussion.
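One rough way to check whether reads are being dropped at the overlap stage is to compare the full set of read IDs against the IDs that appear in the all-versus-all PAF output. The sketch below assumes the overlaps are in standard PAF format (query name in column 1, target name in column 6). The function name and inputs are hypothetical and not part of Herro, dorado, or minimap2.

```python
from typing import Iterable, Set


def reads_without_overlaps(read_ids: Iterable[str], paf_lines: Iterable[str]) -> Set[str]:
    """Return read IDs that never appear as query or target in any PAF record.

    Hypothetical helper for illustration only. Assumes tab-separated PAF
    with at least the 12 mandatory columns.
    """
    seen = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, target = fields[0], fields[5]
        if query == target:
            continue  # a self-overlap does not count as support from another read
        seen.add(query)
        seen.add(target)
    return set(read_ids) - seen
```

In practice one could pass an open PAF file handle as `paf_lines` and the read names from the FASTQ as `read_ids`. Reads returned here have no overlap partner, so they likely cannot contribute to, or benefit from, the downstream correction. This does not by itself show that repetitive minimizers are the cause.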