marbl / CHM13

The complete sequence of a human genome

Question regarding 41 protein coding genes identified #89

singcell closed this issue 5 months ago

singcell commented 10 months ago

In the recently published Nature paper about the Y chromosome, it says "T2T-Y contains an additional 110 genes, among which 41 are predicted to be protein coding. The majority of these protein-coding genes (38 of 41) are additional copies of TSPY, one of the nine ampliconic gene families, filling the corresponding gap in GRCh38-Y (Table [1])".

Are the 41 genes identified in this paper novel, or do they confirm previous findings? Where can I find the names and sequences of these 41 protein-coding genes: in the supplementary data or in the UCSC Genome Browser?

arangrhie commented 9 months ago


I called them 'additional copies' in the sense that they are not present in the annotation of GRCh38-Y, because GRCh38-Y had no sequence there. It is hard to establish a 1:1 relationship.

You can grep the gene names below from the 5.1 curated annotation:

gene_prtn   TSPY4   6
gene_prtn   TSPY2   1
gene_prtn   TSPY3   9
gene_prtn   LOC124903544    2
gene_prtn   TSPY1   1
gene_prtn   TSPY8   2
gene_prtn   TSPY9   4
gene_prtn   RBMY1A1 1
gene_prtn   TSPY10  15

The number on the right is the count of 'additional copies' found for each gene.
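For anyone who prefers a script to grep, here is a minimal Python sketch of the same lookup. It makes assumptions that are not stated in this thread: that the curated annotation is a GFF3-style, tab-separated file, and that each gene record carries the gene symbol in a `gene=` or `Name=` attribute. The helper names (`tspy_total`, `matching_gene_lines`) are made up for illustration. The counts are copied from the list above, so the script can also check that they add up to the 41 genes (38 of them TSPY) quoted from the paper.

```python
import re

# Counts copied from the list above (gene name -> additional copies).
ADDITIONAL_COPIES = {
    "TSPY4": 6, "TSPY2": 1, "TSPY3": 9, "LOC124903544": 2, "TSPY1": 1,
    "TSPY8": 2, "TSPY9": 4, "RBMY1A1": 1, "TSPY10": 15,
}

# Assumption: gene symbols appear as gene=NAME or Name=NAME in column 9.
_ATTR = re.compile(r"(?:^|;)(?:gene|Name)=([^;]+)")


def tspy_total(counts):
    """Sum the copies whose gene name starts with 'TSPY'."""
    return sum(n for name, n in counts.items() if name.startswith("TSPY"))


def matching_gene_lines(lines, names):
    """Yield 'gene' feature lines whose symbol is exactly one of `names`."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level records
        if any(value in names for value in _ATTR.findall(fields[8])):
            yield line


if __name__ == "__main__":
    print(sum(ADDITIONAL_COPIES.values()), tspy_total(ADDITIONAL_COPIES))
```

To run it on a real file, pass an open file handle as `lines`, for example `matching_gene_lines(open(path), set(ADDITIONAL_COPIES))`, where `path` points to your local copy of the annotation. Matching on the exact symbol avoids a pitfall of plain substring grep, where a pattern such as `TSPY1` would also hit `TSPY10`.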

Best, Arang