marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Running the trio-binning module on corrected ONT #2098

Closed GuillaumeHolley closed 2 years ago

GuillaumeHolley commented 2 years ago

Hi,

I currently have 70x coverage of Illumina-corrected ONT reads (N50 = 22 kb) with an estimated error rate of 1.6% across the genome. I also have Illumina reads (151 bp PE) for the parents. As reported in this recent paper about the haplotype-resolved assembly of HG002, I would like to use the trio-binning module of Trio-Canu to separate my reads into the two haplotypes. After browsing the documentation, I only found how to run the full trio-binned assembly (which I assume performs the binning as one of its steps). I currently use the following command:

canu -p asm -d Sample genomeSize=3g useGrid=false -haplotype1 parent1.illumina.fastq -haplotype2 parent2.illumina.fastq -nanopore-corrected offspring.ont_corr.fastq

I am running Canu 2.2.

Thank you for your help and time, Guillaume

skoren commented 2 years ago

Yes, you can run just trio-binning with the -haplotype option: https://canu.readthedocs.io/en/latest/tutorial.html?highlight=haplotype#canu-the-command

I wouldn't expect you to need to change any parameters. We typically use the defaults on HiFi or raw ONT reads, so higher-accuracy ONT reads should work similarly. We typically do not use corrected reads because most correction algorithms are not phase-preserving, so the corrected reads end up mixing multiple haplotypes. Assuming this isn't the case for your reads, they should bin OK. You may want to confirm that each read is primarily one haplotype or the other in terms of its k-mers.
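
One way to sanity-check this is to count, per read, how many k-mers are specific to each parent. The Python below is only a rough sketch of that idea and is not how Canu does it internally: the function names (`haplotype_specific`, `read_purity`) are made up for illustration, the k-mer sets are built in memory from plain sequence strings, and the choice of k is left to the caller. For real parental Illumina data you would want a dedicated k-mer counter and some filtering of low-count (likely erroneous) k-mers, which this sketch does not do.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq, k):
    """Yield canonical k-mers of seq, skipping any k-mer with non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield canonical(kmer)


def haplotype_specific(parent1_seqs, parent2_seqs, k):
    """Return (k-mers seen only in parent 1, k-mers seen only in parent 2).

    Hypothetical helper: no error filtering, everything held in memory.
    """
    p1 = {km for s in parent1_seqs for km in kmers(s, k)}
    p2 = {km for s in parent2_seqs for km in kmers(s, k)}
    return p1 - p2, p2 - p1


def read_purity(read, hap1_only, hap2_only, k):
    """Count parent-specific k-mers in one read.

    Returns (hap1_hits, hap2_hits, purity), where purity is the fraction of
    informative k-mers that come from the majority haplotype, or None if the
    read contains no parent-specific k-mers at all.
    """
    counts = Counter()
    for km in kmers(read, k):
        if km in hap1_only:
            counts["hap1"] += 1
        elif km in hap2_only:
            counts["hap2"] += 1
    total = counts["hap1"] + counts["hap2"]
    if total == 0:
        return counts["hap1"], counts["hap2"], None
    purity = max(counts["hap1"], counts["hap2"]) / total
    return counts["hap1"], counts["hap2"], purity
```

A read that was corrected without mixing haplotypes should give a purity close to 1, while a read with substantial hits from both parents suggests the correction step merged the two haplotypes. Looking at the distribution of purity over a random subset of reads should be enough to tell whether binning is likely to work.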