Closed fredzhou91 closed 4 years ago
--ctg-min-length <int>
Min length of contigs to be output, 5000
--ctg-min-nodes <int>
Min num of nodes in a contig to be ouput, 3
YYY is post-filtered number. Too short contigs are mostly duplicates caused by high error rate.
Oh I see, there are series of --load parameters
.
Thank you!!! I''ll give a try.
Best, Fred
Hi,
I'm trying to re-do the analsyis by using the --load
function
However, after I tried both loading nodes+clips, or loading the alignments, all reporting the errors for cannot find reads
and ended up with empty output
-- Cannot find read "m54084_170906_112213/56164781/ccs" in LINE:2 in load_nodes_graph -- wtdbg.h:2169 --
-- Cannot find read "m54084_170627_173617/7602327/ccs" in LINE:3 in load_nodes_graph -- wtdbg.h:2169 --
-- Cannot find read "m54084_170904_090439/33423798/ccs" in LINE:4 in load_nodes_graph -- wtdbg.h:2169 --
-- Cannot find read "m54084_170728_181802/74908118/ccs" in LINE:5 in load_nodes_graph -- wtdbg.h:2169 --
-- Cannot find read "m54084_170727_075635/41091360/ccs" in LINE:6 in load_nodes_graph -- wtdbg.h:2169 --
-- Cannot find read "m54084_170630_060345/55509454/ccs" in LINE:7 in load_nodes_graph -- wtdbg.h:2169 --
-- Cannot find read "m54084_170905_152245/25625500/ccs" in LINE:8 in load_nodes_graph -- wtdbg.h:2169 --
-- Cannot find read "m54084_170519_032703/42730466/ccs" in LINE:9 in load_nodes_graph -- wtdbg.h:2169 --
-- Cannot find read "m54084_170904_090439/38404181/ccs" in LINE:10 in load_nodes_graph -- wtdbg.h:2169 --
Please suggest whether there's any issue in the read names. Thank you very much!
Best, Fred
To load alignments, you MUST make sure the input reads are identidical to previous run. That is keeping the same -i
--limit-input
-L
--tidy-name
--rdname-filter
--rdname-includeonly
-X
-g
--rdcov-filter
.
To load alignments, you MUST make sure the input reads are identidical to previous run. That is keeping the same
-i
--limit-input
-L
--tidy-name
--rdname-filter
--rdname-includeonly
-X
-g
--rdcov-filter
.
Dear Jue,
I followed your suggestion and still get empty results and the warnings:
-- Cannot find read "m54084_170519_032703/31130428/ccs" in LINE:1 in load_alignments_core -- wtdbg.h:1250 --
-- Cannot find read "m54084_170519_032703/31130428/ccs" in LINE:2 in load_alignments_core -- wtdbg.h:1250 --
-- Cannot find read "m54084_170519_032703/31130428/ccs" in LINE:3 in load_alignments_core -- wtdbg.h:1250 --
-- Cannot find read "m54084_170519_032703/31130428/ccs" in LINE:4 in load_alignments_core -- wtdbg.h:1250 --
Please find the code I used for previous run:
wtdbg2 \
-x sq \
-g 3g \
-i MY_ZIPPED_FASTA.gz \
-t 79 \
-p 17 \
-S 2 \
--edge-min 2 \
--rescue-low-cov-edges \
-R \
-o THE_OUTPUT \
> THE_OUTPUT.log 2>&1
I backup all output into another folder, and execute the following code to the same location where all the previous output files are still available.
1. --load-alignemnts
2. --ctg-min-length
3. --ctg-min-nodes
And here comes my code:
wtdbg2 \
-x sq \
-g 3g \
-i MY_ZIPPED_FASTA.gz \
-t 79 \
-p 17 \
-S 2 \
--edge-min 2 \
--rescue-low-cov-edges \
--load-alignments THE_.alignments.gz_produced_in_previous_run \
-R \
-o THE_SAME_OUTPUT \
--ctg-min-length 500 \
--ctg-min-nodes 1 \
> THE_NEW_OUTPUT.log 2>&1
And I compared the key parameters in the two runs, they are exactly the same:
Previous run:
-k 0 -p 17 -K 1000.049988 -A -S 2.000000 -s 0.050000 -g 3000000000 -X 50.000000 -e 2 -L 5000
Re run:
-k 0 -p 17 -K 1000.049988 -A -S 2.000000 -s 0.050000 -g 3000000000 -X 50.000000 -e 2 -L 5000
My only guess is, whether the "/" in the reads will affect the searching?
Thank you very much!
Best, Fred
wtdbg2 parses read name with regex '^(.+?)/[0-9]+_[0-9]+$' , if matched, it select the longest subread and discard the subreads's coordiates. So, please show a few reads' name from input sequences file and name in alignments.
Dear Jue,
Sorry!!!! I re-examine the files while found that the original read file has been modified.
I fixed the files and seems that everything is ok now!
It is really a fascinating tool! I will continue my work! Thank you very much!!!
Best, Fred
Dear Jue Ruan,
I've finished the wtdbg2 and the results turned out to be good (judged by total size also N50).
However, I noticed that there are unitigs and contigs, as in the log there are lines starting with
TOT
after the unitigs, andEstimated: TOT
after the contigsAlso, I noticed the number of contigs are dropping, and seemed that the
Estimated: TOT
is not corresponding to the exact contigs that survived in the ctg.lay.gz.YYY is smaller than XXX.
So I'm wondering whether it is possible to generate the lay.gz by taking the frg.dot.gz or 0.X.dot.gz, as I noticed there is a script file termed wtdbg-dot2gfa.pl, which seems to convert the dot files to GFA format.
(My guessing would be, there would be a script that can generates ctg.lay.gz by taking the ctg.dot.gz and information from other files, by which I may rescue some data)
Thank you very much!
Best, Fred