Hi @zincuum,
Sorry, I am confused - which FTP are you referring to? GCA_009914755.4 is the GenBank accession for the submitted assembly.
In the attached screenshot, I am referring to the highlighted FTP directory. The link is: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/variation/2022_10/vcf/
I believe the data in that Ensembl FTP directory were created from the GRCh38-based HAL file, downloaded from the Minigraph-Cactus alignment of Liao et al. 2022. The rest of the 'lifted' data listed in this GitHub repository were generated using a curated chain file between GRCh38 and CHM13.
Thanks for your reply. Could you please confirm whether I understood correctly? You mean that both files were created by lifting over from GRCh38, but the Ensembl FTP file uses a GRCh38 file created by the Minigraph-Cactus alignment as input, while the one on GitHub uses a GRCh38 file curated by NCBI?
Yes, I confirmed with the Ensembl team. The ClinVar data were lifted over from GRCh38 using the Minigraph-Cactus alignment, as was done for the gnomAD data. The 1KGP and SGDP datasets on the FTP are identical to the datasets from the Y paper. See the Supplementary Notes for more details. Sorry for the confusion.
Dear T2T-CHM13 Team,
Thank you for making this amazing set of resources available. I have a question about the naming convention for the files you provide. In the download section of the T2T GitHub repository, there are files related to VCF calls. ClinVar20220313, which was lifted over from the GRCh38 file, is listed on the main page, and the downloaded file is chm13v2.0_ClinVar20220313.vcf. On the FTP site, there is another T2T-CHM13 ClinVar file, Homo_sapiens-GCA_009914755.4-2022_10-clinvar.vcf. I would like to know the difference between these two files.
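For anyone who wants to see how the two ClinVar liftovers differ in practice, here is a minimal sketch in plain Python that compares two VCF files record by record. It rests on assumptions not confirmed in this thread: that both files carry the ClinVar variation ID in the VCF ID column, and that chromosome names may differ only by a `chr` prefix. The helper names (`index_by_id`, `compare`, `open_vcf`) are hypothetical and not part of any T2T or Ensembl tooling.

```python
import gzip
from typing import Dict, Iterable, List, Tuple

# (chrom, pos, ref, alt) for one VCF record
Record = Tuple[str, int, str, str]


def normalize_chrom(chrom: str) -> str:
    """Strip a leading 'chr' so '1' and 'chr1' compare equal (assumed naming difference)."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def index_by_id(lines: Iterable[str]) -> Dict[str, Record]:
    """Map the VCF ID column to (chrom, pos, ref, alt), skipping headers and records without an ID."""
    records: Dict[str, Record] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        if vid == ".":
            continue
        records[vid] = (normalize_chrom(chrom), int(pos), ref, alt)
    return records


def compare(a: Dict[str, Record], b: Dict[str, Record]) -> Dict[str, List[str]]:
    """Classify IDs as unique to one file, shared and identical, or shared but placed differently."""
    shared = a.keys() & b.keys()
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "different": sorted(v for v in shared if a[v] != b[v]),
        "identical": sorted(v for v in shared if a[v] == b[v]),
    }


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


# Example usage with the two file names mentioned above (local paths are assumed):
# with open_vcf("chm13v2.0_ClinVar20220313.vcf") as fa, \
#      open_vcf("Homo_sapiens-GCA_009914755.4-2022_10-clinvar.vcf") as fb:
#     summary = compare(index_by_id(fa), index_by_id(fb))
#     print({k: len(v) for k, v in summary.items()})
```

IDs that appear in only one file would point to variants that one liftover method dropped, while IDs in the "different" group would point to variants the two methods placed or represented differently.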