Closed by PeterEmmrich 5 years ago
Maybe you can try '-p 19 -AS 2 -L 5000 -R --aln-dovetail -1 --drop-low-cov-edges'.
Thank you for the quick reply, we will try that! Best wishes, Peter
Have you noticed the additional 600 Mb of assembled sequence? This has impact on the N50/L50 values...
Yes, that is certainly an improvement and could account for some (or all?) of the drop in assembly N50, but I was hoping for a better result, given that we were putting in an extra 50% of sequencing data, with a higher average read N50.
Anyway, the assembly is still running with the parameters Jue suggested (after a hiccup at the HPC) and I'll give an update once it's done.
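For reference, here is a minimal sketch of how N50 and L50 respond when extra sequence is assembled mostly as short contigs. The contig lengths are made up for illustration and are not taken from either assembly; `n50_l50` is a hypothetical helper, not part of wtdbg2 or any assembly-stats tool.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # N50 is the length of the contig that crosses the halfway
            # point; L50 is how many contigs it took to get there.
            return length, count
    raise ValueError("no contigs given")


# Hypothetical toy assembly, lengths in kb (not real data).
base = [500, 400, 300, 200, 100]
# The same contigs plus extra sequence that only assembled as short pieces.
extra = base + [20] * 40

base_stats = n50_l50(base)
extra_stats = n50_l50(extra)
print("base :", base_stats)
print("extra:", extra_stats)
```

In this toy case the long contigs are unchanged, yet N50 drops and L50 rises purely because the total length grew. That matches the point that extra assembled sequence affects the N50/L50 values.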
The inflation of genome size was mostly caused by fragmental alignments, which can be discarded with -l 4096 or an even larger value, -l 8192.
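To illustrate the idea only: assuming -l acts as a minimum alignment length cutoff, as the comment above implies, raising it drops the short, fragmentary overlaps before they can contribute to the graph. The alignment records below are hypothetical, and this is a sketch of the filtering concept, not wtdbg2's actual implementation.

```python
# Hypothetical read-to-read alignments: (read_a, read_b, aligned_length_bp).
alignments = [
    ("r1", "r2", 12500),
    ("r1", "r3", 3100),
    ("r2", "r4", 5200),
    ("r3", "r5", 2300),
    ("r4", "r5", 9000),
]


def keep_long(alns, min_len):
    """Keep only alignments whose aligned length is at least min_len."""
    return [a for a in alns if a[2] >= min_len]


# Count the surviving alignments at each cutoff (0 means no filtering).
kept_counts = {cutoff: len(keep_long(alignments, cutoff))
               for cutoff in (0, 4096, 8192)}
print(kept_counts)
```

The trade-off is that a stricter cutoff also discards some genuine short overlaps, so the value is worth tuning against the read length distribution.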
Hey, we've got a new assembly now, which is looking much better! The total assembly size has gone down again, but the largest contig and the N50 have doubled compared to the first assembly and the L50 is down accordingly. It looks like we still have quite a few fragment contigs under 10 kb, but we'll see what we can do from here by merging with other assemblies. Many thanks for your advice!
Assembly lsa3.ctg
contigs (>= 0 bp) 162994
contigs (>= 1000 bp) 162947
contigs (>= 5000 bp) 138511
contigs (>= 10000 bp) 85715
contigs (>= 25000 bp) 43848
contigs (>= 50000 bp) 23711
Total length (>= 0 bp) 6223992364
Total length (>= 1000 bp) 6223958958
Total length (>= 5000 bp) 6120814110
Total length (>= 10000 bp) 5751743953
Total length (>= 25000 bp) 5090391365
Total length (>= 50000 bp) 4383480177
contigs 162985
Largest contig 2768903
Total length 6223989194
GC (%) 38.66
N50 155574
N75 38461
L50 8679
L75 30223
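As a quick check on how much of lsa3.ctg sits in small contigs, the cumulative rows of the table above can be turned into fractions. The numbers are copied from the table; the variable names are just for this sketch.

```python
# Values copied from the lsa3.ctg table above.
total_contigs = 162994
total_bp = 6223992364
contigs_ge_10k = 85715

# Total length held in contigs at or above each size cutoff (bp).
bp_at_least = {
    1000: 6223958958,
    5000: 6120814110,
    10000: 5751743953,
    25000: 5090391365,
    50000: 4383480177,
}

for cutoff, bp in bp_at_least.items():
    print(f">= {cutoff:>5} bp: {bp / total_bp:.1%} of the assembly")

# Contigs shorter than 10 kb and the sequence they hold.
short_contigs = total_contigs - contigs_ge_10k
short_bp = total_bp - bp_at_least[10000]
print(f"< 10000 bp: {short_contigs} contigs, {short_bp} bp "
      f"({short_bp / total_bp:.1%})")
```

So the contigs under 10 kb make up nearly half of the contig count (77279) but only about 7.6% of the assembled bases.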
@PeterEmmrich Hi Peter, can you share your "final" CLI argument for the wtdbg2 assembler that you were most happy with? Thx
Hello! We are trying to put together a draft of a large plant genome from Nanopore data, but weirdly, putting in more coverage doesn't seem to improve the assembly!?
The first run used a PromethION dataset amounting to only about 20X, with this read length distribution: 3FC_distribution_stacked_100b.pdf
using these parameters: -x ont -g 8.2g -t 16 -S 2 --edge-min 2 --rescue-low-cov-edges
and got these initial results, which I was quite happy with for a first try
contigs (>= 0 bp) 163460
contigs (>= 10000 bp) 115478
contigs (>= 50000 bp) 31834
Largest contig 1195731
Total length 6108463310
N50 78873
L50 19080
We then got some more sequencing data, adding another 10X, most of it in 20kb+ reads: whole set read lengths_stacked.pdf
and ran the assembly again using similar parameters (but more threads) -x preset3 -g 8.2g -t 112 --tidy-reads 2000 --edge-min 2 --rescue-low-cov-edges
but we got these results, which don't seem significantly better than the first round - a larger assembly, but in many more contigs and with a smaller largest contig and lower N50.
contigs (>= 0 bp) 208502
contigs (>= 10000 bp) 141093
contigs (>= 50000 bp) 34069
Largest contig 1170373
Total length 6710115277
N50 62988
L50 25731
Could you please advise on how we could improve this? Are there any parameters you would set differently given our dataset?
Many thanks, Peter
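A rough way to compare the two sets of results in the post above is to look at the net difference between the runs. This is only a crude sketch: contigs are not matched between assemblies, so the "mean extra contig" is a net figure, not a measurement of specific contigs. The numbers are copied from the two result blocks.

```python
# Summary statistics copied from the two runs reported above.
first = {"contigs": 163460, "total_bp": 6108463310,
         "n50": 78873, "l50": 19080}
second = {"contigs": 208502, "total_bp": 6710115277,
          "n50": 62988, "l50": 25731}

# Net change between the runs.
extra_bp = second["total_bp"] - first["total_bp"]
extra_contigs = second["contigs"] - first["contigs"]
mean_extra = extra_bp / extra_contigs

print(f"extra sequence : {extra_bp} bp")
print(f"extra contigs  : {extra_contigs}")
print(f"mean net length per extra contig: {mean_extra:.0f} bp")
print(f"first-run N50  : {first['n50']} bp")
```

The second run adds roughly 600 Mb spread over about 45 thousand more contigs, a net average of around 13 kb each. That is well below the first-run N50, which is consistent with part of the N50 drop coming from additional short sequence.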