wingwingWY closed this issue 5 years ago.
Depending on how repetitive the plant is, an NG50 in the hundreds of kb is not unreasonable. Most of your reads are also relatively short (under 10 kb), and the logs show that a large fraction of reads are contained in large 20 kb repeats.
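The reply talks about NG50, while the question reports an N50. NG50 is measured against the estimated genome size, whereas N50 is measured against the total assembly length. The minimal Python sketch below computes both. The contig lengths are a hypothetical toy example and are not data from this issue.

```python
def ng50(contig_lengths, genome_size):
    """Length L such that contigs of length >= L together cover at least
    half of the estimated genome size. Returns None if the assembly
    covers less than half of the genome."""
    target = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


def n50(contig_lengths):
    """Same computation, but against the total assembly length."""
    return ng50(contig_lengths, sum(contig_lengths))


if __name__ == "__main__":
    # Hypothetical toy assembly (lengths in bp), larger than the 1 Mb "genome".
    contigs = [500_000, 300_000, 120_000, 80_000, 50_000, 50_000]
    print("N50: ", n50(contigs))
    print("NG50:", ng50(contigs, genome_size=1_000_000))
```

When the assembly is larger than the genome (1.04 Gb vs. about 1 Gb in the question), the two values can differ, so it helps to say which one is being compared.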
That said, your reads also look to be lower quality than normal. The median error rate is 8%. Assuming you ran with the default error rate of 12%, you will likely be missing true overlaps, because an overlap between two reads accumulates the errors of both. I would suggest running two (or more) rounds of correction on the data, inputting the corrected reads as raw data again (with the parameters -correct corMhapSensitivity=normal corOutCoverage=80). You could also try the latest base caller from Nanopore to see if it improves the quality of the data. Then, for assembly, don't use mhap for overlapping, and use correctedErrorRate=0.15 ovlMerDistinct=0.975. You may also want to take a look at the FAQ for heterozygous genomes in case your genome is not inbred.
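The reply argues that an 8% per-read error rate is too high for a 12% overlap cutoff. A back-of-the-envelope Python sketch of that reasoning follows. It assumes each read's errors are independent and that an error in either read makes an aligned column disagree. This is a simplification: it ignores alignment details and Nanopore's indel-heavy error profile, so treat it as a rough estimate, not Canu's actual overlap model. The 5% value is a hypothetical post-correction error rate.

```python
def expected_overlap_divergence(err_a, err_b):
    """Rough fraction of disagreeing columns when aligning two reads from the
    same locus, assuming independent errors in each read (a simplification)."""
    return 1 - (1 - err_a) * (1 - err_b)


def overlap_passes(err_a, err_b, max_overlap_error):
    """True if the expected divergence is within the overlap error cutoff."""
    return expected_overlap_divergence(err_a, err_b) <= max_overlap_error


if __name__ == "__main__":
    cutoff = 0.12  # default error rate mentioned in the reply
    # 0.08 = median read error from the report; 0.05 is a hypothetical value
    # for reads after additional rounds of correction.
    for read_error in (0.08, 0.05):
        d = expected_overlap_divergence(read_error, read_error)
        ok = overlap_passes(read_error, read_error, cutoff)
        print(f"per-read {read_error:.0%}: overlap ~{d:.1%}, within cutoff: {ok}")
```

Under these assumptions, two median-quality reads (8% each) disagree at about 15% of positions, which is above a 12% cutoff. That is consistent with the suggestion to lower per-read error through extra correction rounds and to raise correctedErrorRate.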
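The iterative workflow from the reply can be sketched as a short Python script that only builds and prints the Canu command lines, without running them. Each correction round feeds the previous round's corrected reads back in as raw reads. The final assembly step uses the corrected reads without overlapper=mhap.

Several details are assumptions or hypothetical:

- The prefix, directory names, and input file name are hypothetical.
- The output path `<dir>/<prefix>.correctedReads.fasta.gz` is assumed from Canu's usual naming; check it against your Canu version.
- shlex.join requires Python 3.8 or later.

```python
import shlex

GENOME_SIZE = "1g"            # ~1 Gb, as stated in the question
RAW_READS = "reads.fastq.gz"  # hypothetical input file name
PREFIX = "plant"              # hypothetical assembly prefix


def correction_round(round_no, reads):
    """One correction-only pass; the input is always given as raw reads."""
    out_dir = f"correct{round_no}"  # hypothetical directory naming
    cmd = [
        "canu", "-correct",
        "-p", PREFIX, "-d", out_dir,
        f"genomeSize={GENOME_SIZE}",
        "corMhapSensitivity=normal", "corOutCoverage=80",
        "-nanopore-raw", reads,
    ]
    # Assumed output naming: <dir>/<prefix>.correctedReads.fasta.gz
    return cmd, f"{out_dir}/{PREFIX}.correctedReads.fasta.gz"


def assembly_step(corrected_reads):
    """Trimming and assembly from corrected reads, without overlapper=mhap."""
    return [
        "canu",
        "-p", PREFIX, "-d", "assembly",
        f"genomeSize={GENOME_SIZE}",
        "correctedErrorRate=0.15", "ovlMerDistinct=0.975",
        "-nanopore-corrected", corrected_reads,
    ]


def build_pipeline(rounds=2):
    """Chain the correction rounds, then the assembly step."""
    commands, reads = [], RAW_READS
    for i in range(1, rounds + 1):
        cmd, reads = correction_round(i, reads)
        commands.append(cmd)
    commands.append(assembly_step(reads))
    return commands


if __name__ == "__main__":
    # Print rather than execute: each step needs the previous step's output.
    for cmd in build_pipeline(rounds=2):
        print(shlex.join(cmd))
```

The script keeps the steps as argument lists so each command can be inspected before it is run. It also makes the number of correction rounds easy to change.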
Hi, I assembled a repetitive plant genome from PromethION data with Canu 1.8. The genome size is about 1 Gb and the input read coverage is 90x. I got a 1.04 Gb assembly with an N50 of 118 kb. How can I improve the assembly? My command is:
Should I remove "overlapper=mhap utgReAlign=true" and add "correctedErrorRate=0.15 corMhapSensitivity=normal corOutCoverage=80 corMaxEvidenceErate=0.15" to my command?
This is the report file:
The unitigging/4-unitigger/001thr000.num000.log file is: