marschall-lab / project-male-assembly

HGSVC SIG: targeted chromosome Y assembly
MIT License

GRCh38 chrY contig #9

Closed: ptrebert closed this issue 2 years ago

ptrebert commented 2 years ago

Since many downstream analyses will be performed relative to GRCh38, can you think about (and decide) how to handle this sequence:

chrY_KI270740v1_random (37240 bp)
pilleh commented 2 years ago

@ptrebert Yep, on it.

pilleh commented 2 years ago

@ptrebert Okay, this is an unlocalised contig composed entirely of DYZ19 125-bp repeats. DYZ19 is the small heterochromatic block in the middle of the euchromatin, which (as far as I've seen) always gets assembled completely without giving any trouble. In the current version of hg38, the DYZ19 region contains 50 kb of Ns, and I also believe it is not correctly assembled, since a block in the center is inverted; in all our assemblies the repeats are always in the same head-to-tail orientation. So chrY_KI270740v1_random is some sort of fix, but since it is unlocalised, I don't know how we can include it in our analysis. I want to include comparisons of this region in the paper, but I think it makes more sense to just ignore this contig. I'll add a note to the Suppl Methods so that I won't forget to mention it. I hope this makes sense.
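To make the orientation argument concrete, here is a minimal Python sketch (not part of the original analysis) that counts exact matches of a repeat probe on the forward strand and as its reverse complement. In an array whose copies are all head-to-tail, hits appear on one strand only; an inverted block in the middle shows up as reverse-complement hits. The probe and the toy array are hypothetical values, not the real DYZ19 125-bp unit, and exact matching ignores the divergence that real satellite copies carry.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N, case preserved)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def count_exact(seq: str, probe: str) -> int:
    """Count (possibly overlapping) exact occurrences of probe in seq."""
    k = len(probe)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == probe)


def strand_counts(seq: str, probe: str) -> tuple[int, int]:
    """Return (forward, reverse-complement) hit counts of probe in seq."""
    seq, probe = seq.upper(), probe.upper()
    return count_exact(seq, probe), count_exact(seq, revcomp(probe))


if __name__ == "__main__":
    # Hypothetical toy probe; NOT the real DYZ19 repeat unit.
    probe = "AAACCCGG"
    # Toy array: 5 head-to-tail copies, an inverted block of 3, then 2 more.
    array = probe * 5 + revcomp(probe) * 3 + probe * 2
    fwd, rev = strand_counts(array, probe)
    print(f"forward={fwd} reverse={rev}")
    if fwd and rev:
        print("mixed orientation: array contains an inverted block")
```

On real data one would use the actual repeat consensus and tolerate mismatches (e.g. via an aligner), but the strand-count idea is the same.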

ptrebert commented 2 years ago

Yes, makes sense. From my perspective, it's just about whether (coverage) statistics should include this contig (IMO, that type of data is only informative relative to T2T-Y). Not sure if that contig would or could play a role for others (Mark/Miriam?); we should just mention it once to make everybody aware.

pilleh commented 2 years ago

Cool. Yes, ignore this contig in coverage stats. And I agree, T2T-Y is more informative here anyway. I don't think this contig will play a role for others, but I added a few sentences to the M&M.
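The agreed handling (leave chrY_KI270740v1_random out of coverage statistics) could look like the following minimal Python sketch. It assumes per-contig summaries of the form (contig name, length in bp, mean depth) have already been produced by some coverage tool; the function name `weighted_mean_depth` and the numbers in the example are hypothetical, and only the contig name and its 37,240 bp length come from this thread.

```python
from typing import Iterable, Tuple

# Contigs to drop from coverage statistics, per the decision in this thread.
EXCLUDED_CONTIGS = frozenset({"chrY_KI270740v1_random"})

Record = Tuple[str, int, float]  # (contig name, length in bp, mean depth)


def weighted_mean_depth(records: Iterable[Record],
                        exclude: frozenset = EXCLUDED_CONTIGS) -> float:
    """Length-weighted mean depth over all contigs not in `exclude`."""
    total_len = 0
    depth_sum = 0.0
    for name, length, mean_depth in records:
        if name in exclude:
            continue  # skip unlocalised contigs we agreed to ignore
        total_len += length
        depth_sum += length * mean_depth
    return depth_sum / total_len if total_len else 0.0


if __name__ == "__main__":
    # Toy values for illustration only; not real coverage data.
    records = [
        ("chrY", 1_000_000, 10.0),
        ("chrY_KI270740v1_random", 37_240, 250.0),
    ]
    print(weighted_mean_depth(records))                       # contig excluded
    print(weighted_mean_depth(records, exclude=frozenset()))  # contig included
```

Keeping the exclusion list explicit, rather than dropping every `_random` contig by name pattern, keeps the decision visible in the code; a pattern-based filter would also be reasonable if all unlocalised contigs should be ignored.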