danshu closed this issue 8 years ago
2b. One to two times the genome size is typical for the unassembled output. These are high-error reads, reads that were trimmed poorly, 'contigs' that are almost entirely spanned by a single read, and probably other stuff. Some repeats might end up in here.
@brianwalenz Thanks for your detailed answers! So if I have 30x corrected reads, would you suggest trying a canu assembly using the longest 25x? For the bubbles, I'm thinking about merging them into our assembly because they are just heterozygous sequences from a diploid genome. Has Canu recorded the sequences or reads of their neighbouring nodes?
Hi,
I have some more questions.
Thanks!
For your earlier question, give it all the reads you have.
Hi @brianwalenz, as for your response to "3": what about issue #63?
Do the kmer histograms look sane? This is using a very simple method to find the expected hump in the kmer histogram. The peak of the hump is the reported coverage.
This coverage isn't used anywhere, but the histograms (and possibly the hump-finding method) are used to pick a kmer occurrence threshold for the overlapper used for trimming and unitigging. A higher threshold will use more repetitive seeds (and thus more CPU).
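To make the "find the hump" idea concrete, here is a minimal Python sketch. It is not Canu's actual code, only one simple way such a method could work: walk down the low-count error spike to the first valley, then take the tallest bin after it as the coverage peak. The function names and the toy histogram are made up for illustration, and the genome-size estimate (k-mers at or above the valley divided by the peak coverage) is the usual back-of-the-envelope formula, assumed here rather than taken from Canu.

```python
def find_coverage_peak(hist):
    """Return (valley, peak) for a k-mer histogram.

    hist maps k-mer multiplicity -> number of distinct k-mers seen
    that many times. Hypothetical helper, not part of Canu.
    """
    counts = sorted(hist)
    if not counts:
        raise ValueError("empty histogram")
    # Walk down the error spike until the histogram stops decreasing.
    i = 0
    while i + 1 < len(counts) and hist[counts[i + 1]] < hist[counts[i]]:
        i += 1
    valley = counts[i]
    # The hump peak is the tallest bin at or after the valley.
    peak = max(counts[i:], key=lambda c: hist[c])
    return valley, peak


def estimate_genome_size(hist):
    """Rough genome size in k-mers: non-error k-mers / peak coverage."""
    valley, peak = find_coverage_peak(hist)
    total = sum(c * n for c, n in hist.items() if c >= valley)
    return total / peak


# Toy histogram: error spike at low counts, hump centred near 20x.
toy = {1: 5000, 2: 1200, 3: 300, 4: 90, 5: 60, 6: 80,
       10: 200, 15: 420, 20: 600, 25: 410, 30: 150, 40: 20}
valley, peak = find_coverage_peak(toy)
print(valley, peak, estimate_genome_size(toy))
```

Because the size estimate divides by the peak, an over-estimated coverage gives an under-estimated genome size, and the reverse. A diploid genome with high heterozygosity can show a second hump at roughly half the coverage, which a simple peak-picker like this may latch onto.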
Thanks for the explanations. I don't have that data anymore to check the sanity of the kmer histograms. That issue has largely been addressed already (thanks again) - I was just thinking that there was at least 1 time on record where the Guessed Coverage was over-estimated (and genome size under-estimated). Nonetheless, as you point out, under-estimating coverage seems more likely for diploid genomes. I would imagine guessed coverage under-estimation gets worse as the heterozygosity rate of a diploid genome increases - is that a fair assumption?
What are the minimal hardware resources to run canu? What is the minimal amount of RAM for a genome of an estimated 4.9 Mbp?
eugenio, please don't hijack issues, your question is totally unrelated.
The smallest machine we still have is 16 GB. It will probably run OK on 8 GB. 4 GB might be tight. CPU time depends on coverage. I test against E. coli and Drosophila with 4-12 CPUs.
Dear All,
I'm new to PacBio assembly and I have several questions about canu that I could not find answers to. I am therefore posting my questions here, and any ideas are welcome.
Best, Quan