marbl / CHM13

The complete sequence of a human genome

CCS reads using CHM13 assembly #23

Closed sjin09 closed 3 years ago

sjin09 commented 3 years ago

To whom it may concern,

I am hoping to use the CHM13 draft genome and the CCS reads to do some benchmarking of germline mutation calls. I was wondering which CCS reads were used for the draft v1.0 genome described in the post.

I found the following CCS reads in the SRA:

CHM13-CCS-20kb-m64062_190806_063919
CHM13-CCS-20kb-m64062_190803_042216
CHM13-CCS-15kb-m64062_190807_194840
CHM13-CCS-15kb-m64062_190804_172951
CHM13-CCS-11kb-m64015_190225_155953
CHM13-CCS-11kb-m64011_190228_190319
CHM13-CCS-11kb-m64015_190221_025712
CHM13-CCS-11kb-m64015_190224_013150

  1. The X chromosome described in the paper "Telomere-to-telomere assembly of a complete human X chromosome" seems to have been assembled from ONT ultra-long reads and 70X CLR reads.
  2. The draft genome described in the paper "Improved assembly and variant detection of a haploid human genome using single‐molecule, high‐fidelity long reads" was assembled from 11kb CCS reads at 24X sequence coverage.

I was not sure which HiFi reads were used to construct the draft v1.0 genome, and I was wondering if someone could tell me which HiFi reads were not used in the assembly or the polishing process.

Best, Sangjin

skoren commented 3 years ago

The v1.0 assembly is not purely based on CCS reads. The initial string graph was built from the larger libraries (the first 4 in your list; these were also the ones used for the 20kb results in the HiCanu paper), but the graph was further resolved using ONT data. Polishing included calls from the HiFi 20kb library, ONT, and Illumina data.

sjin09 commented 3 years ago

Thank you @skoren, I will use the 11kb CCS reads for benchmarking purposes!
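
For anyone repeating this selection, here is a minimal sketch of splitting the run names listed above by their library-size label. It assumes, based only on the reply in this thread, that the 20kb and 15kb libraries went into the initial string graph and that the 11kb runs can be treated as held out. The helper names `library_size_kb` and `split_runs` are hypothetical and not part of any CHM13 tooling.

```python
import re

# Run names copied from the list earlier in this thread.
RUNS = [
    "CHM13-CCS-20kb-m64062_190806_063919",
    "CHM13-CCS-20kb-m64062_190803_042216",
    "CHM13-CCS-15kb-m64062_190807_194840",
    "CHM13-CCS-15kb-m64062_190804_172951",
    "CHM13-CCS-11kb-m64015_190225_155953",
    "CHM13-CCS-11kb-m64011_190228_190319",
    "CHM13-CCS-11kb-m64015_190221_025712",
    "CHM13-CCS-11kb-m64015_190224_013150",
]


def library_size_kb(run_name):
    """Return the library size label, e.g. 20 for 'CHM13-CCS-20kb-...'."""
    match = re.search(r"-CCS-(\d+)kb-", run_name)
    if match is None:
        raise ValueError(f"no library size label in {run_name!r}")
    return int(match.group(1))


def split_runs(runs, assembly_sizes_kb=(15, 20)):
    """Split runs into (used in assembly, held out) by library size.

    The default sizes are an assumption taken from the reply above
    ("the first 4 in your list"), not from an official manifest.
    """
    used, held_out = [], []
    for run in runs:
        if library_size_kb(run) in assembly_sizes_kb:
            used.append(run)
        else:
            held_out.append(run)
    return used, held_out


used, held_out = split_runs(RUNS)
print("Held-out runs for benchmarking:")
for run in held_out:
    print("  " + run)
```

Matching on the size label rather than on list position keeps the split correct if the run list is reordered or extended.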