marbl / CHM13

The complete sequence of a human genome

Is your CHM13TERT the same as the CHM13 used in PacBio paper? #7

mw55309 closed this issue 4 years ago

mw55309 commented 4 years ago

Names of cell lines are confusing.

You say this data was generated from the CHM13TERT cell line. Do you know if this is the same CHM13 as in this paper (https://www.biorxiv.org/content/10.1101/635037v3.full) and the referenced assembly https://www.ncbi.nlm.nih.gov/assembly/GCA_002884485.1/?

skoren commented 4 years ago

Yes, this is the same cell line. The PacBio data links in the README go to the PacBio CCS data from the paper you listed, as well as the CLR data used to generate the linked assembly.

However, as listed on the README, only the NHGRI line was karyotyped to ensure stability and proper copy number. As with all cell lines, it is possible there are some differences between the cells grown/sequenced in one location vs another.

mw55309 commented 4 years ago

Thanks Serge!

I note v0.7 of the assembly includes both nanopore and PacBio data.

When was the last nanopore-only assembly of the data (pre-correction)?

skoren commented 4 years ago

All the rel assemblies use nanopore data only, without any polishing after the assembler, so each one is just the output of Canu or Flye.

mw55309 commented 4 years ago

Sorry Serge, I am not trying to be deliberately dumb.

README says "The current assembly draft (v0.7) is generated with Canu v1.7.1 including rel1 data up to 2018/11/15 and incorporating the previously released PacBio data"

What was the last assembly that didn't include PacBio data? And at what stage was the PacBio data "incorporated"?

skoren commented 4 years ago

All the v0.x versions were based on the same initial Canu assembly, which used PacBio data from the start (both nanopore and PacBio reads were given to Canu for the run). The numeric revisions since then were manual changes to resolve the remaining gaps and the centromere. So all the posted v0.x assemblies include PacBio data.

The only assemblies that did not use any PacBio data are the posted rel2 and rel3 assemblies; those relied purely on nanopore data and were not polished in any way.