marbl / CHM13

The complete sequence of a human genome

Is there a downloadable GTF/GFF file? #28

Closed: apredeus closed this issue 3 years ago

apredeus commented 3 years ago

Hello T2T team,

Congratulations on the finished genome - this is an absolutely staggering achievement!

Is there a link to the annotation used in the paper in GTF or GFF format? I've found the UCSC browser page, but the gene tracks there are very numerous and it's not clear how to download them.

Thank you in advance!

Marynotmartha commented 3 years ago

Great!!

skoren commented 3 years ago

The track you want is CAT Genes + Liftoff V4 (http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1121067105_Pel2UYQAuyd8gnHxlMCsp3E4I5Jr&db=hub_2395475_t2t-chm13-v1.0&c=chr10&g=hub_2395475_consensus_CHM13_V4). You can download the bigBed files through the browser.

I also added the GFF files for both v1.0 and v1.1 to the README page for completeness.

apredeus commented 3 years ago

Thank you very much!

wjyzidane commented 3 years ago

Thanks for providing these files, but I still cannot find the downloads.

This page (http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1121067105_Pel2UYQAuyd8gnHxlMCsp3E4I5Jr&db=hub_2395475_t2t-chm13-v1.0&c=chr10&g=hub_2395475_consensus_CHM13_V4) says: "Data is downloadable from Globus in /team-genes/t2t-chm13-v1.0/CAT_V4/consensus_gene_set/"

But when I log in to Globus and search for this path, nothing is found.

Also, is the RepeatMasker annotation for v1.1 available for download? Thanks!

skoren commented 3 years ago

Ignore the Globus link; that's internal to the T2T consortium. There are direct links to the GFF on the README page: https://github.com/marbl/CHM13#downloads.

It is also available from the browser. If you click on the table schema link (http://genome.ucsc.edu/cgi-bin/hgTables?db=hub_2395475_t2t-chm13-v1.0&hgta_group=genes&hgta_track=hub_2395475_consensus_CHM13_V4&hgta_table=hub_2395475_consensus_CHM13_V4&hgta_doSchema=describe+table+schema), it will give you the location of the backing data: http://t2t.gi.ucsc.edu/chm13/hub/t2t-chm13-v1.0/CAT_V4/. The GFF file is in the consensus_gene_set folder there.
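
For anyone who wants a quick sanity check on the GFF after downloading it, here is a minimal Python sketch that reads a GFF3 file and counts feature types. It assumes the standard nine-column, tab-separated GFF3 layout. The file name `CHM13.gff3.gz` in the usage comment is a hypothetical placeholder, not the actual name in the folder above, and the helper names are only illustrative. Percent-encoded characters in attribute values are left undecoded.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable, Iterator, NamedTuple


class GffRecord(NamedTuple):
    seqid: str
    source: str
    feature_type: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, str]


def parse_attributes(column: str) -> Dict[str, str]:
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs: Dict[str, str] = {}
    for field in column.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def parse_gff3(lines: Iterable[str]) -> Iterator[GffRecord]:
    """Yield one record per feature line, skipping comments and blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            # Not a feature line (for example, embedded FASTA); skip it.
            continue
        yield GffRecord(
            seqid=cols[0],
            source=cols[1],
            feature_type=cols[2],
            start=int(cols[3]),
            end=int(cols[4]),
            strand=cols[6],
            attributes=parse_attributes(cols[8]),
        )


def count_feature_types(path: str) -> Counter:
    """Count feature types in a plain or gzip-compressed GFF3 file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return Counter(rec.feature_type for rec in parse_gff3(handle))


# Usage (hypothetical file name):
# print(count_feature_types("CHM13.gff3.gz").most_common(5))
```

If you need GTF rather than GFF3, a dedicated conversion tool is likely a safer route than hand-rolled parsing, since the two formats encode attributes differently.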

wjyzidane commented 3 years ago

Thank you! This is very helpful.