Closed tay45 closed 4 years ago
Your sample may be contaminated with host genome, so when 50X of reads are selected, only a few of them are viral. -g is used together with -X (default 50) to select reads and to estimate the edge coverage cutoff (-e). You can omit -g when the genome is very small. If sequence coverage is very high, raise -e from 3 to a larger value.
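To make the -g and -X interaction concrete, here is a rough Python sketch of the arithmetic. The total base count is copied from the log in the original post; `parse_size` and `selection_budget` are hypothetical helper names, and the assumption that the budget is simply genome size times depth is inferred from the log line "Try selecting 9500000 bp", not taken from the wtdbg2 source.

```python
# Figures copied from the log in this thread; helper names are illustrative only.
TOTAL_BP = 876_146_697  # "filtering from 77246 reads (>=5000 bp), 876146697 bp"
DEPTH_X = 50.0          # default of -X according to the usage text


def parse_size(text: str) -> int:
    """Convert a size such as '190k' or '4.6m' to bases (k/m/g suffixes)."""
    units = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    text = text.strip().lower()
    if text and text[-1] in units:
        return round(float(text[:-1]) * units[text[-1]])
    return int(text)


def selection_budget(genome_size: str, depth: float = DEPTH_X) -> int:
    """Assumed read-selection budget in bases: genome size times depth."""
    return round(parse_size(genome_size) * depth)


# The three -g values tried in this thread.
for g in ("190k", "500k", "4.6m"):
    budget = selection_budget(g)
    share = min(1.0, budget / TOTAL_BP)
    print(f"-g {g:>5}: try selecting {budget:>11,} bp ({share:.1%} of input)")
```

With -g 190k the budget is 9,500,000 bp, which matches the log and is only about 1% of the input. If most of those bases come from host reads, very little viral sequence is left to assemble, which would fit the explanation above.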
Hello Jue,
Thank you for your comments!
Taehee
On Mon, Mar 30, 2020 at 8:11 PM, Jue Ruan notifications@github.com wrote:
May be contaminated with host genome, so that if selecting 50X reads, you get few virus reads. -g is used to select -X 50 reads and estimate edge-cov-cutoff. Ignore it when the genome size is very small. Tuning -e 3 to a large value when you have too high sequence coverage.
@ruanjue What wtdbg2 command do you recommend running for viral metagenomic samples with small genomes and low coverage?
Hello,
I am trying to assemble a viral genome (~190 kb), but wtdbg2 produced 0 contigs when I set '-g 190k', along with a puzzling warning message ("WARNNING: input file is not in gzip format"), even though my input was a *.fasta.gz file.
The result stayed the same until I increased -g to 500k, but with that setting the contig was too short.
When I arbitrarily set a much larger genome size (-g 4.6m), it produced a reasonable contig.
Do you have any advice on how to use the -g option? My command and the log are attached below.
Thank you!
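A note on the gzip warning: in the log below it is printed after the line "Starting program: wtpoa-cns ... -i hov3_p1.ctg.lay.gz", so it plausibly refers to the layout file read by the consensus step rather than to the input reads. Whether wtdbg2 leaves that file empty when it outputs 0 contigs is an assumption here, not something the log confirms. The Python sketch below only shows the general check: a gzip stream starts with the two bytes 0x1f 0x8b, so an empty file cannot pass it. The file names are hypothetical.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream (RFC 1952)


def looks_like_gzip(path: str) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a properly compressed reads file.
    reads = os.path.join(tmp, "reads.fasta.gz")
    with gzip.open(reads, "wt") as out:
        out.write(">read1\nACGT\n")

    # Hypothetical stand-in for a layout file that was created but never filled.
    empty = os.path.join(tmp, "empty.ctg.lay.gz")
    open(empty, "wb").close()

    reads_ok = looks_like_gzip(reads)
    empty_ok = looks_like_gzip(empty)

print("reads file is gzip:", reads_ok)
print("empty file is gzip:", empty_ok)
```

Running the same check on the real reads file and on hov3_p1.ctg.lay.gz would show which of the two the warning is about.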
WTDBG: De novo assembler for long noisy sequences
Author: Jue Ruan ruanjue@gmail.com
Version: 2.5 (20190621)
Usage: wtdbg2 [options] -i -o [reads.fa ...]
Options:
-i Long reads sequences file (REQUIRED; can be multiple), []
-o Prefix of output files (REQUIRED), []
-t Number of threads, 0 for all cores, [4]
-f Force to overwrite output files
-x Presets, comma delimited, []
preset1/rsII/rs: -p 21 -S 4 -s 0.05 -L 5000
preset2: -p 0 -k 15 -AS 2 -s 0.05 -L 5000
preset3: -p 19 -AS 2 -s 0.05 -L 5000
sequel/sq
nanopore/ont:
(genome size < 1G: preset2) -p 0 -k 15 -AS 2 -s 0.05 -L 5000
(genome size >= 1G: preset3) -p 19 -AS 2 -s 0.05 -L 5000
preset4/corrected/ccs: -p 21 -k 0 -AS 4 -K 0.05 -s 0.5
-g Approximate genome size (k/m/g suffix allowed) [0]
-X Choose the best depth from input reads(effective with -g) [50.0]
-L Choose the longest subread and drop reads shorter than (5000 recommended for PacBio) [0]
Negative integer indicate tidying read names too, e.g. -5000.
-k Kmer fsize, 0 <= k <= 23, [0]
-p Kmer psize, 0 <= p <= 23, [21]
k + p <= 25, seed is +
-K Filter high frequency kmers, maybe repetitive, [1000.05]
0 10000 20000 30000 40000 50000 60000 70000 77246 reads
[Mon Mar 30 13:56:06 2020] filtering from 77246 reads (>=5000 bp), 876146697 bp. Try selecting 9500000 bp
[Mon Mar 30 13:56:06 2020] Done, 343 reads (>=5000 bp), 9523712 bp, 37101 bins
PROC_STAT(0) : real 5.211 sec, user 9.580 sec, sys 0.920 sec, maxrss 434764.0 kB, maxvsize 690776.0 kB
[Mon Mar 30 13:56:06 2020] Set --edge-cov to 3
KEY PARAMETERS: -k 0 -p 21 -K 1000.049988 -S 4.000000 -s 0.050000 -g 190000 -X 50.000000 -e 3 -L 5000
[Mon Mar 30 13:56:06 2020] generating nodes, 16 threads
[Mon Mar 30 13:56:06 2020] indexing bins[(0,37101)/37101] (9497856/866301696 bp), 16 threads
[Mon Mar 30 13:56:06 2020] - scanning kmers (K0P21S4.00) from 37101 bins
** 1 - 201 **
Quatiles:
10% 20% 30% 40% 50% 60% 70% 80% 90% 95%
1 1 1 1 1 1 1 1 1 1
PROC_STAT(0) : real 5.512 sec, user 10.560 sec, sys 1.150 sec, maxrss 482640.0 kB, maxvsize 1816224.0 kB
[Mon Mar 30 13:56:06 2020] - high frequency kmer depth is set to 1000
[Mon Mar 30 13:56:06 2020] - Total kmers = 1563366
[Mon Mar 30 13:56:06 2020] - average kmer depth = 2
[Mon Mar 30 13:56:06 2020] - 1560266 low frequency kmers (<2)
[Mon Mar 30 13:56:06 2020] - 0 high frequency kmers (>1000)
[Mon Mar 30 13:56:06 2020] - indexing 3100 kmers, 6614 instances (at most)
0 37101 bins
[Mon Mar 30 13:56:06 2020] - indexed 3100 kmers, 6568 instances
[Mon Mar 30 13:56:06 2020] - masked 35981 bins as closed
[Mon Mar 30 13:56:06 2020] - sorting
PROC_STAT(0) : real 5.512 sec, user 10.560 sec, sys 1.150 sec, maxrss 482640.0 kB, maxvsize 1816224.0 kB
[Mon Mar 30 13:56:06 2020] Done
0|0 342 reads|total hits 0
PROC_STAT(0) : real 5.712 sec, user 11.980 sec, sys 1.230 sec, maxrss 484752.0 kB, maxvsize 1816224.0 kB
[Mon Mar 30 13:56:06 2020] sorting rdhits ... Done
[Mon Mar 30 13:56:06 2020] clipping ... 100.00% bases
[Mon Mar 30 13:56:06 2020] generating regs ... 0
[Mon Mar 30 13:56:06 2020] sorting regs ... Done
[Mon Mar 30 13:56:06 2020] generating intervals ... 0 intervals
[Mon Mar 30 13:56:06 2020] selecting important intervals from 0 intervals
[Mon Mar 30 13:56:06 2020] Intervals: kept 0, discarded 0
PROC_STAT(0) : real 5.712 sec, user 11.980 sec, sys 1.230 sec, maxrss 484752.0 kB, maxvsize 1816224.0 kB
[Mon Mar 30 13:56:06 2020] Done, 0 nodes
[Mon Mar 30 13:56:06 2020] output "hov3_p1.1.nodes". Done.
[Mon Mar 30 13:56:06 2020] median node depth = 0
[Mon Mar 30 13:56:06 2020] masked 0 high coverage nodes (>200 or <3)
[Mon Mar 30 13:56:06 2020] masked 0 repeat-like nodes by local subgraph analysis
[Mon Mar 30 13:56:06 2020] generating edges
[Mon Mar 30 13:56:06 2020] Done, 1 edges
[Mon Mar 30 13:56:06 2020] output "hov3_p1.1.reads". Done.
[Mon Mar 30 13:56:06 2020] output "hov3_p1.1.dot.gz". Done.
[Mon Mar 30 13:56:06 2020] graph clean
[Mon Mar 30 13:56:06 2020] rescued 0 low cov edges
[Mon Mar 30 13:56:06 2020] deleted 0 binary edges
[Mon Mar 30 13:56:06 2020] deleted 0 isolated nodes
[Mon Mar 30 13:56:06 2020] cut 0 transitive edges
[Mon Mar 30 13:56:06 2020] output "hov3_p1.2.dot.gz". Done.
[Mon Mar 30 13:56:06 2020] deleted 0 isolated nodes
[Mon Mar 30 13:56:06 2020] output "hov3_p1.3.dot.gz". Done.
[Mon Mar 30 13:56:06 2020] cut 0 branching nodes
[Mon Mar 30 13:56:06 2020] deleted 0 isolated nodes
[Mon Mar 30 13:56:06 2020] building unitigs
[Mon Mar 30 13:56:06 2020]
[Mon Mar 30 13:56:06 2020] output "hov3_p1.frg.nodes". Done.
[Mon Mar 30 13:56:06 2020] generating links
[Mon Mar 30 13:56:06 2020] generated 1 links
[Mon Mar 30 13:56:06 2020] output "hov3_p1.frg.dot.gz". Done.
[Mon Mar 30 13:56:07 2020] rescue 0 weak links
[Mon Mar 30 13:56:07 2020] deleted 0 binary links
[Mon Mar 30 13:56:07 2020] cut 0 transitive links
[Mon Mar 30 13:56:07 2020] remove 0 boomerangs
[Mon Mar 30 13:56:07 2020] remove 0 weak branches
[Mon Mar 30 13:56:07 2020] cut 0 tips
[Mon Mar 30 13:56:07 2020] pop 0 bubbles
[Mon Mar 30 13:56:07 2020] detached 0 repeat-associated paths
[Mon Mar 30 13:56:07 2020] cut 0 tips
[Mon Mar 30 13:56:07 2020] output "hov3_p1.ctg.dot.gz". Done.
[Mon Mar 30 13:56:07 2020] building contigs
[Mon Mar 30 13:56:07 2020] searched 0 contigs
[Mon Mar 30 13:56:07 2020] Estimated:
[Mon Mar 30 13:56:07 2020] output 0 contigs
[Mon Mar 30 13:56:07 2020] Program Done
PROC_STAT(TOTAL) : real 6.013 sec, user 12.030 sec, sys 1.340 sec, maxrss 502244.0 kB, maxvsize 1816224.0 kB
--
-- total memory 263847036.0 kB
-- available 261383312.0 kB
-- 28 cores
-- Starting program: wtpoa-cns -t 16 -i hov3_p1.ctg.lay.gz -fo hov3_p1.ctg.fa
-- pid 11525
-- date Mon Mar 30 13:56:07 2020
WARNNING: input file is not in gzip format
0 contigs 0 edges 0 bases
PROC_STAT(TOTAL) : real 0.103 sec, user 0.000 sec, sys 0.020 sec, maxrss 9864.0 kB, maxvsize 1178476.0 kB
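The numbers in this log can be summarised with a little arithmetic. The Python sketch below uses only values copied from the log above. The interpretation is a hedged reading rather than a diagnosis: if almost every k-mer occurs once, the selected reads barely overlap each other, which would explain "total hits 0" and "0 nodes".

```python
# All constants are copied from the wtdbg2 log in this post.
ALL_READS = 77_246        # reads >= 5000 bp before selection
ALL_BP = 876_146_697      # bases before selection
SELECTED_READS = 343      # reads kept after -g/-X selection
SELECTED_BP = 9_523_712   # bases kept after selection
TOTAL_KMERS = 1_563_366   # "Total kmers = 1563366"
LOW_FREQ_KMERS = 1_560_266  # "1560266 low frequency kmers (<2)"
INDEXED_KMERS = 3_100     # "indexed 3100 kmers"

# Share of k-mers seen only once among the selected reads.
singleton_share = LOW_FREQ_KMERS / TOTAL_KMERS

# Mean read length before and after selection.
mean_all = ALL_BP / ALL_READS
mean_selected = SELECTED_BP / SELECTED_READS

print(f"k-mers seen once:        {singleton_share:.2%}")
print(f"k-mers left for indexing: {TOTAL_KMERS - LOW_FREQ_KMERS}")
print(f"mean read length, all:      {mean_all:,.0f} bp")
print(f"mean read length, selected: {mean_selected:,.0f} bp")
```

The selected reads are on average more than twice as long as the input as a whole, which suggests the selection favours the longest reads. If those are mostly host-derived, that would be consistent with getting few viral reads at -g 190k and a usable contig only once a larger -g lets more reads through.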