Closed AnthonyPiot91 closed 1 year ago
The distribution is fine. Canu tries to correct the longest 40x of reads for assembly. Since you have a lot of coverage, it can reach that 40x using only the long reads over 12 kb. You can see this in the correction report, where the expected median corrected read length is 17 kb versus 3 kb for all input data. The short reads are "rescued" reads: they are corrected when the longest 40x doesn't represent them well, so they likely come from shorter plasmids in your sample.
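To illustrate why the cutoff lands so high with deep coverage, here is a minimal sketch of the "longest 40x" idea. It is not Canu's actual implementation (Canu works from estimated corrected lengths, not raw lengths); the function name `longest_reads_cutoff` and its parameters are made up for the example.

```python
def longest_reads_cutoff(read_lengths, genome_size, target_coverage=40):
    """Pick the longest reads until their total length reaches
    genome_size * target_coverage.

    Returns (shortest_selected_length, number_of_reads_selected).
    Simplified illustration only; hypothetical helper, not a Canu API.
    """
    budget = genome_size * target_coverage
    total = 0
    cutoff = 0
    selected = 0
    # Walk the reads from longest to shortest, stopping once the budget is met.
    for length in sorted(read_lengths, reverse=True):
        if total >= budget:
            break
        total += length
        cutoff = length
        selected += 1
    return cutoff, selected


if __name__ == "__main__":
    # Toy data: a few long reads and many short ones.
    toy_lengths = [20000] * 10 + [3000] * 100
    print(longest_reads_cutoff(toy_lengths, genome_size=5000))
```

With enough long reads to cover the budget, none of the short reads are selected, which is the gap you see in the corrected read length distribution.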
Thanks for the information.
This answers my second concern. If I understand correctly, I should not worry that canu, by selecting only the longest reads, will fail to represent small plasmids (smaller than 10 kbp), because the "rescued" reads will likely represent these sequences?
Yes, it should, though it's not always perfect. Once you have your assembly, you can re-map the raw reads to it and check whether any reads lack a good mapping, to see if anything was missed.
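One way to do that check, as a rough sketch: map the raw reads to the contigs with something like `minimap2 -x map-ont assembly.fasta reads.fastq > reads.paf` (file names here are placeholders), then list the reads whose best alignment covers only a small part of the read, or that have no alignment at all. The function name `poorly_mapped_reads` and the 0.8 threshold are assumptions for the example, not anything canu provides.

```python
def poorly_mapped_reads(paf_lines, read_lengths, min_fraction=0.8):
    """Return sorted IDs of reads with no alignment, or whose best single
    alignment covers less than min_fraction of the read.

    paf_lines:    iterable of PAF lines (tab-separated, 12+ columns)
    read_lengths: dict mapping every raw read ID to its length
    Hypothetical helper for illustration; the threshold is arbitrary.
    """
    best = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        name = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        # Fraction of the read covered by this alignment.
        fraction = (qend - qstart) / qlen if qlen else 0.0
        best[name] = max(best.get(name, 0.0), fraction)
    # Reads absent from the PAF get a fraction of 0.0 and are reported too.
    return sorted(
        name for name in read_lengths if best.get(name, 0.0) < min_fraction
    )


if __name__ == "__main__":
    example_paf = [
        "r1\t1000\t0\t950\t+\tctg1\t50000\t100\t1050\t900\t950\t60",
        "r2\t2000\t0\t500\t+\tctg1\t50000\t0\t500\t450\t500\t10",
    ]
    lengths = {"r1": 1000, "r2": 2000, "r3": 4000}
    print(poorly_mapped_reads(example_paf, lengths))
```

Note that reads spanning the arbitrary start point of a circular contig can align in two pieces and be flagged by this check even though they are fine, so the list is a starting point for inspection rather than proof that something is missing.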
Very good, thanks a lot!
Hello,
I am trying to assemble a small but repetitive bacterial genome with numerous linear and circular plasmids using Oxford nanopore long reads.
While using canu, I'm concerned about the read length distribution of the corrected reads. The distribution is bimodal, with very few reads between 5,000 and 12,000 bp. This does not reflect the distribution of the original raw reads.
I don't really know what could cause this drop in the read length distribution. Is it something I should worry about? If so, what could I do to improve the assembly?
Command used:
canu -p $STRAIN \
  -d $OUTPUT_DIR/"$STRAIN" \
  genomeSize=1.4m \
  maxThreads=16 \
  useGrid=false \
  -nanopore $INPUT_FILE_PATH
Version: canu 2.2 on computing server
Here is canu's report, thanks for your help.