Open ybukhman opened 1 year ago
Looks like this may not be the only problematic duplication and Mark C. may be able to provide a list. If so, should we fix all of them before submitting a revised assembly to NCBI? Can this also be done for the alternate haplotype?
POU5F1 has two copies in the primary (paternal) haplotype but only one in the maternal haplotype. It looks like the segment containing the second paternal copy is a spurious duplication, as it has near-zero read coverage. We have looked into it with Mark Chaisson and Michael Hiller.
A screenshot of the NCBI browser shows an apparent tandem segmental duplication in the primary assembly. The two markers highlight Pou5f1 and LOC117724197, annotated as "POU domain, class 5, transcription factor 1-like". Observe a similar set of genes to the right of each:
The human genome browser with Nile rat alignment chains suggests two copies in the primary assembly (mArvNil1.pat.X scaffold CM022270, light blue). There is only one copy in the alternate haplotype (mArvNil1.mat scaffold JAAOME010000022, green): (image by Michael Hiller)
The second copy of Pou5f1, on the far right, has near-zero read coverage: (image by Mark Chaisson)
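For anyone who wants to screen other candidate duplications the same way, here is a minimal sketch of the coverage check. It is not the method used to make the plot above. It assumes per-base depth in the three-column, tab-separated text that `samtools depth` writes (scaffold, 1-based position, depth), and it counts positions missing from that output as zero depth, since `samtools depth` omits zero-coverage positions unless run with `-a`. The function names, the `max_ratio` threshold, and the coordinates in the toy data are all hypothetical.

```python
from typing import Iterable


def mean_depth(depth_lines: Iterable[str], chrom: str, start: int, end: int) -> float:
    """Mean per-base depth over chrom:start-end (1-based, inclusive).

    Positions that do not appear in the input count as zero depth.
    """
    if end < start:
        raise ValueError("end must be >= start")
    total = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        name, pos, depth = line.split("\t")[:3]
        if name == chrom and start <= int(pos) <= end:
            total += int(depth)
    return total / (end - start + 1)


def looks_spurious(region_mean: float, baseline_mean: float, max_ratio: float = 0.1) -> bool:
    """Flag a region whose mean depth is far below a baseline mean.

    max_ratio is an arbitrary illustrative cutoff, not a validated threshold.
    """
    return baseline_mean > 0 and region_mean < max_ratio * baseline_mean


# Toy data with made-up coordinates: positions 1-2 are well covered,
# positions 3-6 are almost uncovered.
toy_depth = [
    "CM022270\t1\t30",
    "CM022270\t2\t30",
    "CM022270\t5\t1",
]
baseline = mean_depth(toy_depth, "CM022270", 1, 2)
candidate = mean_depth(toy_depth, "CM022270", 3, 6)
print(baseline, candidate, looks_spurious(candidate, baseline))
```

With real data, the depth lines could be read from a file and the baseline taken from flanking sequence or a scaffold-wide mean. The same check should work on the alternate haplotype, given reads aligned to that assembly.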