human-pangenomics / HPP_Year1_Assemblies

Assemblies from HPP Year 1 production

Missing annotations #4

Open mozack opened 1 year ago

mozack commented 1 year ago

Hi,

Thank you for this fantastic resource!

The CAT genes index does not appear to have annotation entries for 3 samples: HG002, HG005, and NA19240.

https://github.com/human-pangenomics/HPP_Year1_Assemblies/blob/main/annotation_index/Year1_assemblies_v2_genbank_CAT_genes.index

Are the gene annotations for these 3 samples available elsewhere?

Thanks!

wwliao commented 1 year ago

The CAT pipeline depends on the Minigraph-Cactus graph, so it could only be applied to 44 samples (HG002, HG005, and NA19240 were set aside so they could be used for benchmarking). The Ensembl pipeline, by contrast, should include gene annotations for all 47 samples. The Ensembl gene annotations are available here: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=submissions/8E6C4ACC-FEA9-4DD8-94A3-B92234206F95--Y1_ENSEMBL_V1/
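
For anyone who wants to confirm which samples an annotation index covers, here is a minimal sketch. It assumes the index is tab-separated with a header row and a column holding the sample name; the column name "sample", the helper missing_samples, and the toy rows with placeholder paths are all hypothetical and not taken from the real index file, so adjust them to the actual header.

```python
import csv
import io


def missing_samples(index_text, expected, sample_column="sample"):
    """Return the expected sample names that have no row in the index.

    Assumes a tab-separated index with a header row; `sample_column`
    is a hypothetical column name and may differ in the real file.
    """
    reader = csv.DictReader(io.StringIO(index_text), delimiter="\t")
    present = {
        row[sample_column].strip()
        for row in reader
        if row.get(sample_column)
    }
    return sorted(set(expected) - present)


# Toy index contents for illustration only (paths are placeholders).
toy_index = (
    "sample\tannotation\n"
    "HG00438\tplaceholder/HG00438.gff3.gz\n"
    "HG00621\tplaceholder/HG00621.gff3.gz\n"
)

expected = ["HG002", "HG005", "NA19240", "HG00438", "HG00621"]
print(missing_samples(toy_index, expected))
# → ['HG002', 'HG005', 'NA19240']
```

Running the same check against the downloaded CAT index with the full list of 47 sample names should, if the assumptions about the file layout hold, report the three benchmarking samples.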

@mhaukness-ucsc, could you please check if the above link is the version used in the HPRC marker paper?

@juklucas, in your opinion, should we consider providing an index file for the Ensembl gene annotations as well?

mozack commented 1 year ago

Thanks so much! I see the Ensembl annotations and will try them out.

diekhans commented 1 year ago

The above link should be correct for CAT for comparisons to marker paper results; however, Ensembl should be used for new analyses.