marbl / HG002-issues

HG002 human reference genome issue tracking and polishing

Possible insertion of hundreds of extra bases in a long C/T rich stretch at chr11_MATERNAL:31,305,332-31,305,733 #664

Open nhansen opened 8 months ago

nhansen commented 8 months ago

Have you confirmed that this issue hasn't already been reported?

Issue location in assembly (use format chromosome:start-end, e.g., chr13_MATERNAL:3740148-9625296)

chr11_MATERNAL:31305332-31305733

Description of the issue

A stretch of roughly 350 bases of CT microsatellite is being read inconsistently by different long-read platforms. In addition, large numbers of reads containing CT microsatellites may be misaligned to the region (note the very high coverage, which is caused by clipped reads that support only the CT region but not the flanks).

image
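
One way to check whether the excess coverage comes from clipped reads is to measure, for each alignment overlapping the region, how much of the read is clipped. The following is a minimal sketch, not part of the original report: it assumes the CIGAR strings have already been extracted from the alignments (for example from the sixth column of SAM records), and the function names and the 0.5 threshold are hypothetical choices for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_fraction(cigar: str) -> float:
    """Return the fraction of the original read that is soft- or hard-clipped."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # M, I, S, = and X consume read bases; H bases were part of the
    # original read even though they are absent from the stored sequence.
    read_len = sum(length for length, op in ops if op in "MIS=XH")
    clipped = sum(length for length, op in ops if op in "SH")
    return clipped / read_len if read_len else 0.0


def is_heavily_clipped(cigar: str, threshold: float = 0.5) -> bool:
    """Flag reads whose clipped fraction meets a threshold (hypothetical, 0.5 by default)."""
    return clipped_fraction(cigar) >= threshold
```

A read with CIGAR `100S50M100S`, for instance, aligns only 50 of its 250 bases and would be flagged, which is the pattern expected for reads that match the CT stretch but not the flanking sequence.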