Closed nhansen closed 1 year ago
At least part of the region is annotated as alpha satellite sequence, so we can expect difficulties in aligning reads here. There is a small hole in HiFi coverage (highlighted by the vertical black line in the screenshot), but if we look at the HiFi reads aligned to only MATERNAL+Y+ENV, there are HiFi reads that align through the hole. However, those HiFi reads, which might have been used to assemble the PATERNAL haplotype, do not fully support the corresponding PATERNAL region (see the next screenshot for more detail).
For example, I can see the 342 bp deletion on the right in both the HiFi reads covering the hole and the PATERNAL haplotype, but the same HiFi reads do not confirm the other deletions present in the PATERNAL haplotype.
Strangely, the ONT reads tell a completely different story. If we consider the ONT reads aligned against the diploid assembly, there are two small TT insertions on the right (1st screenshot) that are present in the PATERNAL haplotype but not in the MATERNAL one. However, looking at the ONT reads aligned against MATERNAL+Y+ENV, almost all reads carry those two small insertions:
It seems that both MATERNAL and PATERNAL might need cleaning here.
Flagger has a pretty large region flagged here and I think this is related to #266 on the other haplotype. Definitely will need to trace back the ONT resolution here to see how verkko resolved it.
VerityMap shows a solid-k-mer desert of length 5535 bp at chr18_MATERNAL:18007740-18013274, meaning that the interval contains no solid k-mers and that there are at least two flanking solid k-mers around these coordinates.
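The reported desert length can be checked directly from the coordinates. This is a minimal sketch that assumes the region strings are 1-based and inclusive (the thread does not state the convention); `parse_region` and `region_length` are hypothetical helper names, not part of VerityMap.

```python
def parse_region(region):
    """Split a 'contig:start-end' string into (contig, start, end)."""
    contig, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return contig, int(start), int(end)


def region_length(region):
    """Length in bp, assuming 1-based inclusive coordinates (an assumption)."""
    _, start, end = parse_region(region)
    return end - start + 1


print(region_length("chr18_MATERNAL:18007740-18013274"))  # 5535
```

Under that convention the result matches the 5535 bp quoted above.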
With no solid k-mers here, it is theoretically impossible to map HiFi reads to this interval (with the current VerityMap parameters), and indeed we observe a coverage gap that is consistent with the HiFi alignments provided by Winnowmap.
What makes things even trickier is that VerityMap has hardly any primary alignments, only secondary ones, and these are typically less reliable.
Let's assume that there is no issue in the underlying assembly. Then the lack of coverage, even by secondary alignments, from both tools at chr18_MATERNAL:18007740-18013274 can be explained by either a HiFi drop-out or a solid-k-mer desert.
There is a k-mer desert of length 8842 bp just upstream, at chr18_MATERNAL:17986518-17995359. This region, despite being longer, has secondary alignments from both tools. This makes it less likely that the missing alignments are due to a solid-k-mer desert.
HiFi drop-outs are usually associated with some micro-satellite enrichment, and there is no evidence of that here.
ONT coverage in the Winnowmap alignments is also reduced here (although not to zero).
Together, the last two points suggest that a HiFi coverage drop-out is probably not the explanation.
That, in turn, suggests that there might be an issue in the underlying assembly.
Additional evidence in favor of this is that no HiFi read spans the desert chr18_MATERNAL:18007740-18013274 in either the VerityMap or the Winnowmap alignments (including secondary alignments).
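The "no read spans the desert" check can be sketched as an interval test. This is an illustration only: the read names and alignment intervals below are made up, the function names are hypothetical, and coordinates are assumed to be 1-based inclusive. In practice the intervals would come from the aligners' output.

```python
def spans(aln_start, aln_end, region_start, region_end):
    """True if the alignment covers the whole region (inclusive coordinates)."""
    return aln_start <= region_start and aln_end >= region_end


def spanning_reads(alignments, region_start, region_end):
    """Names of reads whose alignment interval spans the region."""
    return [name for name, start, end in alignments
            if spans(start, end, region_start, region_end)]


DESERT = (18007740, 18013274)

# Hypothetical alignment intervals, invented for illustration only.
alignments = [
    ("read_a", 18000000, 18007900),  # ends just inside the desert
    ("read_b", 18013000, 18020000),  # starts just before the desert ends
]

print(spanning_reads(alignments, *DESERT))  # []
```

An empty list for both aligners, with secondary alignments included, corresponds to the observation above.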
I agree with @skoren that investigating the Verkko graph here would be helpful.
The v0.8 assembly did a much better job with this region. We will patch v0.7 here.
Assembly Region
chr18_MATERNAL:18002949-18018878
Assembly Version
v0.7
ont_evidence subregions
18002949-18018878 (Low)
hifi_evidence subregions
18007869-18014634 (Low)
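For reference, the flagged subregions can be compared with the solid-k-mer desert reported by VerityMap (chr18_MATERNAL:18007740-18013274). This is a minimal sketch assuming 1-based inclusive coordinates; `overlap_length` is a hypothetical helper.

```python
def overlap_length(a, b):
    """Number of shared bases between two inclusive (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


desert = (18007740, 18013274)    # solid-k-mer desert from VerityMap
ont_low = (18002949, 18018878)   # ont_evidence subregion
hifi_low = (18007869, 18014634)  # hifi_evidence subregion

print(overlap_length(desert, ont_low))   # 5535: the desert lies fully inside
print(overlap_length(desert, hifi_low))  # 5406
```

Under that assumption, the low-ONT-evidence subregion contains the whole desert, and the low-HiFi-evidence subregion covers 5406 of its 5535 bp.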