low coverage assembly suggestion

LipengKang commented 5 years ago

Hi Jue! I am assembling a ~4.9 Gb plant genome from 21X PacBio RS II data (N50 11.2 kb, average length 8.1 kb). I am not aiming for a polished assembly, just many long, clean contigs that cover as many genes as possible. I tested four sets of parameters.
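For orientation, the figures quoted above imply roughly the following data volume. This is only back-of-the-envelope arithmetic from the numbers in the post (4.9 Gb genome, 21X, 8.1 kb average read length); the variable names are illustrative.

```python
# Rough data volume implied by the figures quoted in the post.
genome_size_bp = 4.9e9      # ~4.9 Gb genome
coverage = 21               # 21X sequencing depth
mean_read_len_bp = 8.1e3    # average read length, 8.1 kb

total_bases = genome_size_bp * coverage
approx_reads = total_bases / mean_read_len_bp

print(f"total bases ~ {total_bases / 1e9:.1f} Gb")
print(f"read count  ~ {approx_reads / 1e6:.1f} million")
```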

1. Modified from the new human genome paper (https://www.biorxiv.org/content/10.1101/519025v1):
    -p21 -S2 -s0.1 -A -e2 -k0 -K0.05 -L1000
    high frequency kmer depth is set to 2421
    Total kmers = 3264267439
    average kmer depth = 8
    434573080 low frequency kmers (<2)
    197159 high frequency kmers (>2421)
    searched 114768 contigs
    TOT 3890086144, CNT 88154, AVG 44129, MAX 517632, N50 70144, L50 17095, N90 20992, L90 55815, Min 5120

2. Axolotl example:
    -p 21 -S 2 --aln-noskip --rescue-low-cov-edges --tidy-reads 5000
    high frequency kmer depth is set to 2303
    Total kmers = 3227626081
    average kmer depth = 7
    480874106 low frequency kmers (<2)
    191471 high frequency kmers (>2303)
    searched 124150 contigs
    TOT 3469020416, CNT 87905, AVG 39464, MAX 488704, N50 60416, L50 17721, N90 18944, L90 56896, Min 5120

3. -p21 -S2 -s0.05 -A -e2 -k0 -L1000
    Total kmers = 3264267439
    average kmer depth = 8
    434573080 low frequency kmers (<2)
    197159 high frequency kmers (>2421)
    searched 145100 contigs
    TOT 3882870784, CNT 93846, AVG 41375, MAX 554752, N50 67072, L50 17824, N90 19200, L90 58958, Min 5120

4. '-x rs -k 0 -p 21 -S 2 --aln-noskip --tidy-reads 1000 --edge-min 2 --rescue-low-cov-edges --no-read-clip --aln-dovetail -1'
    high frequency kmer depth is set to 2421
    Total kmers = 3264267439
    average kmer depth = 8
    434573080 low frequency kmers (<2)
    197159 high frequency kmers (>2421)
    searched 240685 contigs
    TOT 3069390848, CNT 137659, AVG 22298, MAX 286976, N50 30976, L50 30626, N90 10496, L90 96588, Min 5120
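To compare the four runs side by side, the summary lines can be parsed into a small table. The sketch below copies the summary lines from the runs above; the helper name `parse_summary` and the use of 4.9 Gb as the denominator for "fraction of genome" are assumptions made for illustration, not part of wtdbg2.

```python
# Summary lines copied verbatim from the four runs above.
SUMMARIES = {
    "1": "TOT 3890086144, CNT 88154, AVG 44129, MAX 517632, N50 70144, L50 17095, N90 20992, L90 55815, Min 5120",
    "2": "TOT 3469020416, CNT 87905, AVG 39464, MAX 488704, N50 60416, L50 17721, N90 18944, L90 56896, Min 5120",
    "3": "TOT 3882870784, CNT 93846, AVG 41375, MAX 554752, N50 67072, L50 17824, N90 19200, L90 58958, Min 5120",
    "4": "TOT 3069390848, CNT 137659, AVG 22298, MAX 286976, N50 30976, L50 30626, N90 10496, L90 96588, Min 5120",
}

GENOME_SIZE_BP = 4.9e9  # assumed genome size, taken from the post


def parse_summary(line):
    """Turn 'TOT 123, CNT 45, ...' into {'TOT': 123, 'CNT': 45, ...}."""
    fields = {}
    for item in line.split(", "):
        key, value = item.split()
        fields[key] = int(value)
    return fields


runs = {name: parse_summary(line) for name, line in SUMMARIES.items()}

# List the runs from highest to lowest N50.
ranked = sorted(runs, key=lambda name: runs[name]["N50"], reverse=True)
for name in ranked:
    stats = runs[name]
    fraction = stats["TOT"] / GENOME_SIZE_BP
    print(f"set {name}: N50={stats['N50']:>6}  CNT={stats['CNT']:>6}  "
          f"TOT/genome={fraction:.2f}")
```

Ranking by N50 alone is a design choice for this sketch; for the stated goal of covering as many genes as possible, the total assembled length (TOT) may matter just as much.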

The second parameter set is the same as in the Axolotl example, but the resulting contigs are about ten times shorter than in that example. Any advice?

Thanks, lipeng

ruanjue commented 5 years ago

Hi,

Try a more sensitive setting, such as -p 20 or -p 19. It will take longer, but it might give better contiguity for a low-coverage assembly.
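One way to organise such a sweep is to generate one command line per -p value and keep everything else fixed. The sketch below only builds and prints the commands; it does not run them. The remaining options are taken from parameter set 3 earlier in this thread, `-i`, `-fo` and `-t` are the input, output-prefix and thread options from the wtdbg2 README, and the file name `reads.fa.gz`, the output prefixes and the thread count are hypothetical.

```python
import shlex

# Options held constant across the sweep (from parameter set 3 in this thread).
FIXED_OPTIONS = "-S2 -s0.05 -A -e2 -k0 -L1000"


def build_commands(reads, p_values, threads):
    """Return one wtdbg2 argument list per -p value."""
    commands = []
    for p in p_values:
        prefix = f"asm_p{p}"  # hypothetical output prefix, one per run
        argv = ["wtdbg2", "-i", reads, "-fo", prefix, "-t", str(threads)]
        argv.append(f"-p{p}")
        argv.extend(shlex.split(FIXED_OPTIONS))
        commands.append(argv)
    return commands


for argv in build_commands("reads.fa.gz", [21, 20, 19], threads=16):
    print(shlex.join(argv))
```

Keeping the other options fixed makes it easier to attribute any change in contiguity or run time to -p alone.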

Jue

LipengKang commented 5 years ago

Hi Jue, I have tested -p19 -S2 -A -e2 -L5000:
    high frequency kmer depth is set to 4325
    Total kmers = 387403242
    average kmer depth = 58
    8504 low frequency kmers (<2)
    111848 high frequency kmers (>4325)
    indexing 387282890 kmers, 22553153504 instances (at most)
    searched 156933 contigs
    TOT 3733065984, CNT 104569, AVG 35700, MAX 432128, N50 57600, L50 19804, N90 16384, L90 66398, Min 5120

The average kmer depth is now much higher (58), but the N50 is no longer than in the earlier runs. Is this result more trustworthy than the previous ones?
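Since the comparison in this thread rests on N50 and L50, here is a minimal sketch of how these statistics are conventionally defined: sort the contigs from longest to shortest and find the point where the running total reaches half of the assembly length. The function name `nx_lx` and the toy lengths are illustrative; this is the textbook definition, and it is an assumption that wtdbg2's logged estimate follows it exactly.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx): the contig length and contig count at which the
    running total of sorted lengths first reaches `fraction` of the sum."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no contig lengths given")


# Toy contig lengths, not data from this assembly.
toy = [10, 8, 6, 4, 2]
n50, l50 = nx_lx(toy)            # half of 30 is 15; reached at the 2nd contig
n90, l90 = nx_lx(toy, 0.9)       # 90% of 30 is 27; reached at the 4th contig
print(f"N50={n50} L50={l50} N90={n90} L90={l90}")
```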

ruanjue commented 5 years ago

I suggest, with caution, trying --realign, which is more sensitive and finds more similar sequences. It was introduced in wtdbg2 v2.3 but has not been fully tested. In your case the sequencing coverage is very low, so --realign might help to find enough alignments to build the fuzzy Bruijn graph (FBG). Please be aware that it will take at least two to three times the CPU time.

LipengKang commented 5 years ago

Hi Jue, following your suggestion, I added -R: -p20 -S2 -A -e2 -L1000 -R -g 4.9g

    Done, 51 nodes
    median node depth = 2
    masked 0 high coverage nodes (>200 or <2)
    masked 0 repeat-like nodes by local subgraph analysis
    Done, 631 edges
    graph clean
    rescued 0 low cov edges
    deleted 0 binary edges
    deleted 51 isolated nodes
    cut 0 transitive edges
    .......
    output 0 contigs

This combination produces 0 contigs, although the kmer distribution is the same as in the earlier tests. Did I make a mistake somewhere?

ruanjue commented 5 years ago

Please post more of the log messages, especially those for the alignments.

LipengKang commented 5 years ago

Here is the whole log: wtdbg2.docx

ruanjue commented 5 years ago

Thanks, I will reply to this issue once I have finished debugging.

ruanjue commented 5 years ago

Fixed it, https://github.com/ruanjue/wtdbg2/commit/141508f2bd91be6e0089b79eedeba12196708b77 . Please try again.

LipengKang commented 5 years ago

-R works well now. Thank you, jue!

ruanjue commented 5 years ago

Thanks for the information!

ruanjue / wtdbg2

low coverage assembly suggestion #93