UGYong closed this issue 5 years ago
Thanks for the bug report. I have received similar reports, but haven't found suitable test data to debug it.
There might be two causes:
1) pgzf fails to decompress the input data. Please try to decompress the
If not the above case, I will try to guess it in code.
Jue
Thank you for your response. I ran wtpoa-cns after decompressing, but it didn't work. As for your second suggestion, I don't really know how to do it. Would you please show me how? The following is the log from wtpoa-cns.
1 contigs 100 edges 0 bases
1 contigs 200 edges 0 bases
1 contigs 300 edges 0 bases
1 contigs 400 edges 0 bases
1 contigs 500 edges 0 bases
1 contigs 600 edges 0 bases
1 contigs 700 edges 0 bases
1 contigs 800 edges 0 bases
1 contigs 900 edges 0 bases
1 contigs 1000 edges 0 bases
1 contigs 1100 edges 0 bases
1 contigs 1200 edges 0 bases
2 contigs 1300 edges 2274677 bases
2 contigs 1400 edges 2274677 bases
2 contigs 1500 edges 2274677 bases
2 contigs 1600 edges 2274677 bases
2 contigs 1700 edges 2274677 bases
1, find the last contig name in SH.ctg.fa, like tail -10000 SH.ctg.fa | grep '^>' | tail -n 1
2, suppose the contig name is ctgX; find it in SH.ctg.lay, like grep -n '^>ctgX' SH.ctg.lay; let its line number be L1, then find the line number of the next contig header, L2
3, cut lines L1 to L2 - 1 from the lay file into test.lay, then run wtpoa-cns on it
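The three steps above can be sketched as one small shell function. This is only a sketch under assumptions: the name extract_contig_layout and the awk-based extraction are illustrative and not part of wtdbg2, and it assumes each contig in the .lay file starts with a header line of the form ">name ...", as the grep commands above imply. It compares the whole contig name, so ctg1 does not also match ctg10.

```shell
# Hypothetical helper (not part of wtdbg2): copy the layout block of the
# last contig written to a consensus FASTA into its own .lay file.
# Usage: extract_contig_layout SH.ctg.fa SH.ctg.lay test.lay
extract_contig_layout() {
    ctg_fa=$1
    lay=$2
    out=$3

    # Step 1: name of the last contig that made it into the FASTA output.
    last=$(grep '^>' "$ctg_fa" | tail -n 1 | awk '{ print substr($1, 2) }')
    if [ -z "$last" ]; then
        echo "no contig headers found in $ctg_fa" >&2
        return 1
    fi

    # Steps 2 and 3: print from that contig's header line up to, but not
    # including, the next header line.
    awk -v name="$last" '
        /^>/ { keep = (substr($1, 2) == name) }
        keep
    ' "$lay" > "$out"

    # Report which contig was extracted.
    echo "$last"
}
```

After something like extract_contig_layout SH.ctg.fa SH.ctg.lay test.lay, wtpoa-cns can be run on test.lay alone (for example with -i test.lay -fo test.fa, following the README usage).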
OK. I ran a consensus run for each contig based on your comment. Here's what I did.
I don't know why it happened and how it was fixed, but I hope this issue will be resolved for your next release.
Thank you, UG.
Thank you for the test. Did 'one by one' mean running wtpoa-cns on each contig in separate files? If so, please run them again multiple times (>= 5), so we can see whether those contigs are prone to causing the hang. Thanks in advance!
Jue
Yes, we ran 'one by one'. As you asked, I ran one contig (e.g. ctg1) 5 times and the results were 1-0-1-0-0, where 1 means successful and 0 means unsuccessful.
UG
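A repeated test like this can be scripted so that a hung run is cut off instead of blocking the next one. The sketch below is an assumption-laden illustration, not something from wtdbg2: repeat_run is a hypothetical name, it relies on the timeout command from GNU coreutils, and the time limit is whatever the caller considers "hung".

```shell
# Hypothetical helper: run a command several times with a time limit and
# print one digit per run, 1 = finished successfully, 0 = failed or timed out.
# Usage: repeat_run RUNS LIMIT_SECONDS COMMAND [ARG...]
repeat_run() {
    runs=$1
    limit=$2
    shift 2

    result=""
    i=1
    while [ "$i" -le "$runs" ]; do
        # timeout (GNU coreutils) stops the command once the limit passes
        # and then returns a non-zero status. The command's own output is
        # sent to stderr so that stdout carries only the summary.
        if timeout "$limit" "$@" >&2; then
            result="${result}1"
        else
            result="${result}0"
        fi
        i=$((i + 1))
    done

    # Join the digits with dashes, in the same 1-0-1-0-0 style as above.
    echo "$result" | sed 's/./&-/g; s/-$//'
}
```

For example, repeat_run 5 3600 wtpoa-cns -i ctg1.lay -fo ctg1.fa would try ctg1 five times with a one-hour limit per run; the one-hour value is an arbitrary choice for illustration.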
Great, it is suitable for debugging. Could you send the lay file of ctg1 to me at ruanjue.big(AT)qq.com, via your FTP, or some other way?
OK. Which email address should I share the Google Drive link with?
UG
ruanjue@gmail.com
Hi there @ruanjue, I was wondering if this issue was ever resolved. I just installed wtdbg2 v2.5 from GitHub and ran it on two datasets. It worked fine for one, but the other is hung exactly as described above. Is this expected in the latest version? And is there another way to fix it other than running the command for each of the contigs? Thanks!
The latest commit of wtdbg2? Please see https://github.com/bioconda/bioconda-recipes/issues/24420
Yes, I had the issue after installing directly from Github, not from bioconda. I installed the bioconda version now and will give it a try.
Dear @ruanjue, just to follow up, I have been re-running the wtpoa-cns command now with the conda updated version, and again it seems hung up. On another assembly it ran very quickly, however on this one it has been running without any new output to the ctg.fa file for over 3 days. I should say this is a much more heterozygous genome than my other one, but I do not know if this behavior is expected in this case. Do you think it might be running properly? Thanks for your help!
I am afraid we need to debug it as described in https://github.com/ruanjue/wtdbg2/issues/71#issuecomment-465862172 and the following comments.
Dear @ruanjue, I finally got around to debugging this. I selected the last contig as you described above, as well as another one that had already been successfully processed. I ran wtpoa-cns on the 'good' contig 5 times and it ran fine every time, taking 3-4 min each time. The problematic contig has fewer lines in the lay file; I expected it would get stuck, but it actually finished all five times, after 35 min each time.
Do you have any idea of what can be done here? Thanks in advance!
Let's divide the ctg lay file into 100 parts (https://github.com/ruanjue/wtdbg2/blob/master/scripts/split_seqs_3.pl) and run wtpoa-cns on those parts one by one. Then repeat this procedure on the part that hangs to locate the truly problematic contig.
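Running the parts one by one can be sketched as a loop that flags every part that fails or runs past a time limit. This is a hedged illustration only: screen_parts is a hypothetical name, it assumes the timeout command from GNU coreutils, and the part files are assumed to exist already, however they were produced.

```shell
# Hypothetical helper: run a command on each part file in turn and print
# the names of the parts that failed or exceeded the time limit.
# RUNNER is a shell snippet in which "$1" stands for the part file.
# Usage: screen_parts LIMIT_SECONDS RUNNER PART_FILE...
screen_parts() {
    limit=$1
    runner=$2
    shift 2

    for part in "$@"; do
        # Run the snippet in a child shell so that timeout can stop it.
        # The snippet's own output goes to stderr; stdout lists suspects.
        if ! timeout "$limit" sh -c "$runner" sh "$part" >&2; then
            echo "$part"
        fi
    done
}
```

A call such as screen_parts 3600 'wtpoa-cns -i "$1" -fo "$1.fa"' part*.lay would list the suspect parts; each of those can then be split again and screened the same way until a single contig remains. The one-hour limit is an arbitrary assumption.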
Thanks @ruanjue. Could you please clarify what the two arguments of the Perl script are? I figure that <parts> in this case should be 100, but I'm not sure what to put as <index> to have the script split into multiple files.
split_seqs_3.pl <parts> <index> <fasta_file>
split_seqs_3.pl 100 0 input.fa
means: output the 1st sequence of every 100 sequences.
To get all parts, see below:
seq 0 99 | xargs -i echo split_seqs_3.pl 100 {} input.fa \> part{}.fa | sh
Thanks @ruanjue! Index 0 didn't output any contigs, so I used 1 to 100 instead and now have all of them split up. I'm running wtpoa-cns on each file and will get back to you later.
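For readers without the Perl script at hand, the same kind of split can be sketched with awk. To be clear, this is a stand-in and not split_seqs_3.pl itself: split_round_robin is a hypothetical name, the ".lay" output suffix is an assumption, and the only format assumption is that every record starts with a header line beginning with ">". Parts are numbered from 1.

```shell
# Hypothetical stand-in for split_seqs_3.pl (not the script itself): deal
# the records of a '>'-headed file into N part files. Record 1 goes to
# PREFIX1.lay, record 2 to PREFIX2.lay, ..., record N+1 to PREFIX1.lay again.
# Usage: split_round_robin INPUT PARTS PREFIX
split_round_robin() {
    input=$1
    parts=$2
    prefix=$3

    # Start from empty part files because awk appends below.
    i=1
    while [ "$i" -le "$parts" ]; do
        : > "${prefix}${i}.lay"
        i=$((i + 1))
    done

    awk -v n="$parts" -v p="$prefix" '
        /^>/ {
            # Close the previous part so only one file is open at a time.
            if (out != "") close(out)
            idx = (count % n) + 1
            count++
            out = p idx ".lay"
        }
        # Lines before the first header (if any) are skipped.
        out != "" { print >> out }
    ' "$input"
}
```

For example, split_round_robin SH.ctg.lay 100 part would write part1.lay through part100.lay, each holding every 100th contig layout.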
Hi, I am performing genome assembly with PacBio Sequel reads. wtdbg2 ran OK, but whenever I ran wtpoa-cns, the program hung in the middle of the run without any progress for 2-3 weeks. I tried different numbers of threads and also tried starting from scratch with a reduced number of reads, but the same thing happened with wtpoa-cns. Plus, I did test runs for E. coli and A. thaliana, and they worked fine.
I am using wtdbg2-2.3 on Red Hat 6.
Thanks, UG