Closed LipengKang closed 5 years ago
Hi,
Try a more sensitive setting, such as -p 20 or -p 19. It will take longer, but it might give better contiguity for a low-coverage assembly.
Jue
Hi, Jue
I have tested the following settings:
-p19 -S2 -A -e2 -L5000
high frequency kmer depth is set to 4325
Total kmers = 387403242
average kmer depth = 58
8504 low frequency kmers (<2)
111848 high frequency kmers (>4325)
indexing 387282890 kmers, 22553153504 instances (at most)
searched 156933 contigs
TOT 3733065984, CNT 104569, AVG 35700, MAX 432128, N50 57600, L50 19804, N90 16384, L90 66398, Min 5120
The average k-mer depth is now higher (58), but the N50 is no longer than in the earlier runs. Is this result more trustworthy than the previous ones?
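For readers comparing the N50/L50 and N90/L90 figures in these logs, here is a minimal sketch of how such statistics are usually computed from a list of contig lengths. The function name `nx_lx` and the toy lengths are made up for illustration, and wtdbg2's own summary code may use a slightly different convention.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for contig lengths.

    Nx is the length of the contig at which the running total of the
    length-sorted contigs first reaches `fraction` of the assembly size;
    Lx is the number of contigs needed to get there.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Toy lengths (hypothetical, not taken from the logs above).
toy = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy)        # threshold 150 is reached at the 2nd contig
n90, l90 = nx_lx(toy, 0.9)   # threshold 270 is reached at the 4th contig
print(n50, l50, n90, l90)    # prints: 80 2 40 4
```

Under this definition, a higher k-mer depth does not by itself imply a longer N50; N50 depends only on the distribution of the output contig lengths.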
I suggest, with caution, trying --realign, which is more sensitive and finds more similar sequences. It was introduced in wtdbg2 v2.3 but has not been fully tested. In your case the sequencing coverage is very low, so --realign might help to find enough alignments to build the fuzzy Bruijn graph (FBG). Please be aware that it will take at least two to three times the CPU time.
Hi, Jue,
Following your suggestion, I added -R:
-p20 -S2 -A -e2 -L1000 -R -g 4.9g
Done, 51 nodes
median node depth = 2
masked 0 high coverage nodes (>200 or <2)
masked 0 repeat-like nodes by local subgraph analysis
Done, 631 edges
graph clean
rescued 0 low cov edges
deleted 0 binary edges
deleted 51 isolated nodes
cut 0 transitive edges
.......
output 0 contigs
This combination produces 0 contigs, although the k-mer distribution is the same as in the earlier test. Did I make a mistake?
Please provide more of the log output, especially the alignment messages.
Here is the whole log: wtdbg2.docx
Thanks, I will reply to this issue when I have finished debugging.
Fixed it, https://github.com/ruanjue/wtdbg2/commit/141508f2bd91be6e0089b79eedeba12196708b77 . Please try again.
-R works well now. Thank you, jue!
Thanks for the information!
Hi, Jue! I am assembling a ~4.9 Gb plant genome from 21X PacBio RSII data (read N50 11.2 kb, average read length 8.1 kb). I am not aiming for a perfect assembly, just many long, clean contigs that cover as many genes as possible. I chose four sets of parameters to test.
1. Modified from the new human genome paper (https://www.biorxiv.org/content/10.1101/519025v1)
-p21 -S2 -s0.1 -A -e2 -k0 -K0.05 -L1000
high frequency kmer depth is set to 2421
Total kmers = 3264267439
average kmer depth = 8
434573080 low frequency kmers (<2)
197159 high frequency kmers (>2421)
searched 114768 contigs
TOT 3890086144, CNT 88154, AVG 44129, MAX 517632, N50 70144, L50 17095, N90 20992, L90 55815, Min 5120
2. Axolotl example
-p 21 -S 2 --aln-noskip --rescue-low-cov-edges --tidy-reads 5000
high frequency kmer depth is set to 2303
Total kmers = 3227626081
average kmer depth = 7
480874106 low frequency kmers (<2)
191471 high frequency kmers (>2303)
searched 124150 contigs
TOT 3469020416, CNT 87905, AVG 39464, MAX 488704, N50 60416, L50 17721, N90 18944, L90 56896, Min 5120
3. -p21 -S2 -s0.05 -A -e2 -k0 -L1000
Total kmers = 3264267439
average kmer depth = 8
434573080 low frequency kmers (<2)
197159 high frequency kmers (>2421)
searched 145100 contigs
TOT 3882870784, CNT 93846, AVG 41375, MAX 554752, N50 67072, L50 17824, N90 19200, L90 58958, Min 5120
4. -x rs -k 0 -p 21 -S 2 --aln-noskip --tidy-reads 1000 --edge-min 2 --rescue-low-cov-edges --no-read-clip --aln-dovetail -1
high frequency kmer depth is set to 2421
Total kmers = 3264267439
average kmer depth = 8
434573080 low frequency kmers (<2)
197159 high frequency kmers (>2421)
searched 240685 contigs
TOT 3069390848, CNT 137659, AVG 22298, MAX 286976, N50 30976, L50 30626, N90 10496, L90 96588, Min 5120
The second set of parameters is the same as the Axolotl example, but the resulting contigs are ten times shorter than in that example. Any advice?
Thanks, lipeng
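To compare the four runs side by side, the summary lines can be parsed and ranked by N50. This is only a rough sketch: the labels `set 1` to `set 4`, the helper `parse_summary`, and the use of 4.9e9 as the genome size (the approximate figure from the post) are choices made here for illustration, not part of wtdbg2.

```python
import re

# Summary lines copied from the four runs reported in this post.
SUMMARIES = {
    "set 1": "TOT 3890086144, CNT 88154, AVG 44129, MAX 517632, N50 70144, "
             "L50 17095, N90 20992, L90 55815, Min 5120",
    "set 2": "TOT 3469020416, CNT 87905, AVG 39464, MAX 488704, N50 60416, "
             "L50 17721, N90 18944, L90 56896, Min 5120",
    "set 3": "TOT 3882870784, CNT 93846, AVG 41375, MAX 554752, N50 67072, "
             "L50 17824, N90 19200, L90 58958, Min 5120",
    "set 4": "TOT 3069390848, CNT 137659, AVG 22298, MAX 286976, N50 30976, "
             "L50 30626, N90 10496, L90 96588, Min 5120",
}

GENOME_SIZE = 4.9e9  # assumption: the approximate genome size given in the post


def parse_summary(line):
    """Turn 'TOT 123, CNT 45, ...' into {'TOT': 123, 'CNT': 45, ...}."""
    return {key: int(value)
            for key, value in re.findall(r"([A-Za-z]\w*) (\d+)", line)}


# Rank the runs from longest to shortest N50.
ranked = sorted(
    ((label, parse_summary(line)) for label, line in SUMMARIES.items()),
    key=lambda item: item[1]["N50"],
    reverse=True,
)

for label, stats in ranked:
    covered = stats["TOT"] / GENOME_SIZE  # assembled bases vs. genome size
    print(f"{label}: N50={stats['N50']:>6}  CNT={stats['CNT']:>6}  "
          f"TOT/genome={covered:.0%}")
```

Ranking by N50 alone is a simplification; since the goal here is gene coverage, total assembled length relative to the genome size is printed alongside it.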