marbl / CHM13

The complete sequence of a human genome

canu multi platform assembly parameters #8

Closed by changhan1110 4 years ago

changhan1110 commented 4 years ago

Hi,

The bioRxiv paper says Canu was used for both data types (Nanopore and PacBio).

It seems that the authors ran Canu on the combined PacBio and ONT data with the following parameters: genomeSize=3.1g corMhapSensitivity=normal ovlMerThreshold=500 correctedErrorRate=0.085 trimReadsCoverage=2 trimReadsOverlap=500 -pacbio-raw

Why didn't they run Canu with the multi-platform options (-pacbio-raw together with -nanopore-raw)?

If I have misunderstood something, please let me know.

Thanks a lot, Changhan

skoren commented 4 years ago

Yes, we did not use the -nanopore-raw option because the majority of the data was PacBio and the corrected reads were higher quality than those from a nanopore-only assembly. Thus, we didn't need to increase the error tolerances during correction as we typically do for nanopore assemblies. The correctedErrorRate you can see above (0.085, i.e. 8.5%) was increased over the PacBio default of 4.5%.

The rel2 and rel3 Canu assemblies used the default parameters and the -nanopore-raw option, as these did not include any PacBio data.
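
For readers who want to see the invocation in one place, here is a minimal Python sketch that assembles the Canu argument list discussed in this thread. The parameter values and the -pacbio-raw flag are taken from the question above. The prefix, output directory, and read file names are hypothetical placeholders, and listing both read sets after a single -pacbio-raw flag is an assumption based on the description here, not a copy of the actual command. Flag spellings follow this thread; newer Canu releases may name the read-type options differently.

```python
import shlex

# Parameter values as quoted in the question above.
CANU_PARAMS = {
    "genomeSize": "3.1g",
    "corMhapSensitivity": "normal",
    "ovlMerThreshold": "500",
    "correctedErrorRate": "0.085",  # raised from the PacBio default of 0.045
    "trimReadsCoverage": "2",
    "trimReadsOverlap": "500",
}


def build_canu_command(prefix, out_dir, read_files, read_flag="-pacbio-raw"):
    """Return the Canu argument list; nothing is executed here."""
    cmd = ["canu", "-p", prefix, "-d", out_dir]
    cmd += [f"{key}={value}" for key, value in CANU_PARAMS.items()]
    cmd.append(read_flag)
    cmd += list(read_files)
    return cmd


if __name__ == "__main__":
    # Hypothetical names: "asm", "asm-out", and both FASTQ files are placeholders.
    command = build_canu_command(
        "asm", "asm-out", ["pacbio.fastq.gz", "nanopore.fastq.gz"]
    )
    print(shlex.join(command))
```

The helper only builds and prints the command, so it can be inspected before being handed to a scheduler or to `subprocess.run`.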

changhan1110 commented 4 years ago

Thank you for explaining, Koren.

changhan1110 commented 4 years ago

Can I ask you another question? How long did the Canu run take?

skoren commented 4 years ago

We didn't time it precisely, but for this type of coverage and data Canu typically takes about 150k CPU hours, or about a week on our cluster.
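
As a rough sanity check on those figures, the short Python sketch below converts CPU hours into the average number of cores that would have to be busy for the run to finish in a week. It assumes "a week" means seven full days of wall-clock time and that the cores were fully used throughout; neither is stated in this thread, so treat the result as an order-of-magnitude estimate only.

```python
CPU_HOURS = 150_000   # approximate figure quoted above
WALL_HOURS = 7 * 24   # assumption: one week of continuous wall-clock time


def implied_cores(cpu_hours, wall_hours):
    """Average number of concurrently busy cores implied by the two figures."""
    return cpu_hours / wall_hours


print(round(implied_cores(CPU_HOURS, WALL_HOURS)))  # prints 893
```

To plan for a different cluster, the same division can be turned around: dividing 150,000 by the number of cores available gives a rough wall-clock estimate in hours.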