Closed by Lordhooze 4 years ago
Try aligning the core CDS against the assembly to see whether they were missed in the assembly, or not well-polished.
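One way to act on this suggestion is to align the CDS to the contigs with any aligner that writes PAF, then summarize the hits per CDS. The sketch below is only an illustration: the function names and the 0.9 coverage / 0.98 identity cutoffs are assumptions, not part of wtdbg2, and the identity it computes (matches divided by alignment block length) is a rough approximation. A CDS with no hit suggests the region is missing from the assembly; a full-length hit with low identity suggests the region is present but poorly polished.

```python
def cds_coverage_from_paf(paf_lines):
    """Return {cds_name: (covered_fraction, identity)} for the best hit per CDS.

    Uses the standard PAF columns: 1 query name, 2 query length,
    3 query start, 4 query end, 10 matching bases, 11 alignment block length.
    """
    best = {}
    for line in paf_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qname, qlen = fields[0], int(fields[1])
        qstart, qend = int(fields[2]), int(fields[3])
        nmatch, alnlen = int(fields[9]), int(fields[10])
        covered = (qend - qstart) / qlen
        identity = nmatch / alnlen if alnlen else 0.0
        # Keep the hit that covers the largest part of the CDS.
        if qname not in best or covered > best[qname][0]:
            best[qname] = (covered, identity)
    return best


def classify_cds(all_cds_names, best, min_cov=0.9, min_ident=0.98):
    """Sort CDS into missing / partial / low_identity / ok (cutoffs are assumptions)."""
    report = {"missing": [], "partial": [], "low_identity": [], "ok": []}
    for name in all_cds_names:
        if name not in best:
            report["missing"].append(name)      # likely absent from the assembly
            continue
        cov, ident = best[name]
        if cov < min_cov:
            report["partial"].append(name)      # only part of the CDS aligned
        elif ident < min_ident:
            report["low_identity"].append(name)  # present, but possibly unpolished
        else:
            report["ok"].append(name)
    return report
```

The list of all CDS names has to come from the CDS FASTA itself, because a CDS without any alignment never appears in the PAF file.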
Ok, many thanks, I will do that.
Dear Dr. Ruan,
Is there an explanation or speculation as to why the assembly size would be smaller with ONT data? For our genome, Illumina data always produce an assembly much smaller than the genome size estimate (about 2/3 of it), and we suspected collapsed repeats to be the problem. Thank you! Rongfeng Cui
I haven't found the exact reason. It might be:
1. sequences missed in the alignments (false negatives);
2. collapsed tandem repeats;
3. long repeats that were collapsed and could not be untangled.
PS: This is not specific to ONT, but it is worse with ONT.
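The collapsed-repeat hypotheses can be checked roughly by mapping the reads back to the contigs and looking for contigs with unusually high depth. The following is a minimal sketch under stated assumptions: the per-contig length and mean depth are taken as already computed by some other tool, the function names are hypothetical, and the 1.5x cutoff is an arbitrary choice. It estimates how many bases would be "hidden" if a high-depth contig really represents several collapsed copies.

```python
def weighted_median(value_weight_pairs):
    """Median of values where each value counts with its weight (here: contig length)."""
    items = sorted(value_weight_pairs)
    total = sum(weight for _, weight in items)
    running = 0
    for value, weight in items:
        running += weight
        if running >= total / 2:
            return value
    raise ValueError("no contigs given")


def estimate_collapsed_bases(contigs, ratio=1.5):
    """contigs: {name: (length, mean_depth)}.

    Returns (baseline_depth, {name: estimated_missing_bases}) for contigs whose
    depth is at least `ratio` times the length-weighted median depth.
    """
    baseline = weighted_median([(depth, length) for length, depth in contigs.values()])
    flagged = {}
    for name, (length, depth) in contigs.items():
        if depth >= ratio * baseline:
            copies = depth / baseline             # rough copy-number estimate
            flagged[name] = round(length * (copies - 1))
    return baseline, flagged
```

If the sum of the estimated missing bases is close to the gap between the assembly size and the genome size estimate, collapsed repeats are a plausible explanation; if it is much smaller, sequence missed in the alignments is more likely.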
Dear Dr. Ruan, Thank you for the quick reply. Beyond what sequencing coverage does wtdbg2 stop improving much? Would the performance be better if I first corrected the input reads with 2nd generation data (say, with HALC)?
50X is OK; beyond 80X the improvement should be small. wtdbg2 was designed to handle raw, noisy long reads. If you have already corrected the raw reads with CANU or another long-read self-correction tool, the results may be better, but I am not sure. I don't think long reads can be corrected properly with NGS data. In fact, I also don't think long-read correction (except for HiFi reads) is truly correct, even though the corrected reads look very good.
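To see where a dataset sits relative to the 50X and 80X figures above, coverage can be computed as total read bases divided by the estimated genome size. This is a minimal sketch that reads FASTA-formatted lines; the function names are hypothetical, and the genome size has to be supplied from an independent estimate.

```python
def read_lengths(fasta_lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def coverage(total_bases, genome_size):
    """Sequencing depth as a multiple of the genome size."""
    return total_bases / genome_size


def subsample_fraction(current_cov, target_cov=50.0):
    """Fraction of reads to keep to reach target_cov (1.0 if already below it)."""
    return min(1.0, target_cov / current_cov)
```

For example, with `open("reads.fa")` as the line source (a placeholder file name) and a genome size of 1e9, `coverage(sum(read_lengths(f)), 1e9)` gives the depth.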
Thanks, I will experiment with different approaches.
Another question: what about contigs assembled from NGS data, which are generally a few kb long? Do you think they would be useful for correction? I imagine that repeats cannot be corrected with this approach, but the unique fraction of the genome could be?
Why not use the NGS contigs to correct the TGS contigs, instead of correcting the long reads?
Good idea, I will try that. Thanks!
Dear Dr. Ruan, I noticed that there's this option in wtdbg2:
--err-free-nodes
Select nodes from error-free-sequences only. E.g. you have contigs assembled from NGS-WGS reads, and long noisy reads. You can type '--err-free-seq your_ctg.fa --input your_long_reads.fa --err-free-nodes' to perform assembly somehow act as long-reads scaffolding
Is this designed for performing scaffolding using long reads if I input NGS contigs? Will it also fill in gaps if they are fillable by TGS reads? This seems to be a very nice option.
--err-free-nodes is used in combination with --err-free-seq. However, I haven't supported them for a long time, and I forgot to remove --err-free-nodes. If I have more time available, I will restore this function.
Ah I see. I tried just now to input these parameters but the latest version of wtdbg2 doesn't seem to recognize them any more (just printing the help information).
Rongfeng
Hi ruanjue, I note that, for nanopore data, wtdbg2 may produce an assembly smaller than the true genome. How can this problem be solved?
Three months ago, I assembled a genome using wtdbg2 (80X coverage). The contig N50 is very good.
However, after annotation, the BUSCO score is only 90.
I guess wtdbg2 may have missed some regions of the genome, which leads to the low BUSCO score.