Open nhansen opened 10 months ago
I just came across this when looking at DV calls in both HiFi and ONT as well, and it looks like it is annotated as a segdup between HSats in the browser. Some HG2 and HG4 HiFi reads align across it, so it should be possible to correct based on HiFi sequence as well as ONT.
One thing I just noticed when looking at NateD's stratifications is that the 11kb region chr10_MATERNAL 43232301 43243056 is a pure C homopolymer! This made chr10 a huge outlier for long C homopolymers :)
Wow--tagging @skoren so he can possibly figure out where all those C's came from! Just to be picky, though, it's not pure C's for the whole 11kb. There are actually non-C bases at about seven spots, giving eight very long mononucleotide runs!
As you point out, Justin, it should be fairly easy to re-call consensus for this stretch to create a patch.
Ah, I forgot that we merged nearby perfect homopolymers to get this region, so it makes sense that there could be small (<10bp) interruptions between the 21+bp perfect homopolymers.
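For anyone who wants to reproduce the merging effect on a toy sequence, here is a minimal sketch. It is not the actual stratification code: the function names, the same-base requirement for merging, and the example sequence are all assumptions made for illustration. Only the 21bp minimum run length and the <10bp gap come from the comment above.

```python
import re

# Hypothetical helpers, not the real stratification pipeline.
# Coordinates are 0-based, half-open, and the sequence is assumed uppercase.

def perfect_runs(seq, min_len=21):
    """Return (start, end, base) for each single-base run of at least min_len."""
    pattern = re.compile(r"A+|C+|G+|T+")
    return [
        (m.start(), m.end(), m.group()[0])
        for m in pattern.finditer(seq)
        if m.end() - m.start() >= min_len
    ]

def merge_runs(runs, max_gap=9):
    """Merge consecutive runs of the same base separated by at most max_gap bases."""
    merged = []
    for start, end, base in runs:
        if merged and merged[-1][2] == base and start - merged[-1][1] <= max_gap:
            # Extend the previous interval across the short interruption.
            merged[-1] = (merged[-1][0], end, base)
        else:
            merged.append((start, end, base))
    return merged

# Toy sequence: four long C runs with interruptions of 2, 5, and 15 bases.
toy = "C" * 30 + "AG" + "C" * 25 + "T" * 5 + "C" * 40 + "ACGTACGTACGTACG" + "C" * 22

runs = perfect_runs(toy)
merged = merge_runs(runs)
print(runs)
print(merged)
```

In this toy case the first three runs collapse into one merged interval because the gaps between them are under 10bp, while the 15bp interruption keeps the last run separate. That is the same way a merged region can look like one long homopolymer while still containing a few non-C bases.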
Have you confirmed that this issue hasn't already been reported?
Issue location in assembly (use format chromosome:start-end, e.g., chr13_MATERNAL:3740148-9625296)
chr10_MATERNAL:43,216,000-43,245,000
Description of the issue
Sniffles calls on v0.9 picked up a large deletion in this region, and long read data bear it out. Here's an IGV screenshot: