wtpoa-cns was hung

UGYong commented 5 years ago

Hi, I am performing genome assembly with PacBio Sequel reads. wtdbg2 ran OK, but whenever I run wtpoa-cns, the program hangs in the middle of the run and makes no progress for 2-3 weeks. I tried different numbers of threads, and also tried starting from scratch with a reduced number of reads, but the same thing happened with wtpoa-cns. In addition, I did test runs for E. coli and A. thaliana, and they worked fine.

I am using wtdbg2-2.3 on redhat 6.

Thanks, UG

ruanjue commented 5 years ago

Thanks for the bug report. I have received similar reports, but haven't found suitable test data to debug the problem.

There might be two causes: 1) pgzf fails to decompress the input data. Please try decompressing the .ctg.lay.gz into .ctg.lay and running wtpoa-cns on the new file. 2) A bug inside wtpoa-cns's algorithm. Please find which contig was stuck (the one just after the contigs already output), locate it in the .ctg.lay file, and run wtpoa-cns on just that contig. If it gets stuck again at this contig, please send me the layout of this contig; it will be extremely important for fixing the bug.

If neither is the case, I will try to track it down in the code.
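For the first suggestion, a minimal Python sketch of the decompression step. It assumes the .ctg.lay.gz file can be read by a standard gzip reader, which this thread does not confirm; the function name and the file names are hypothetical.

```python
import gzip
import shutil


def decompress_layout(gz_path, out_path):
    """Stream-decompress gz_path into out_path without loading it all into memory."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Hypothetical usage; substitute your own output prefix:
# decompress_layout("prefix.ctg.lay.gz", "prefix.ctg.lay")
```

wtpoa-cns can then be pointed at the uncompressed file with -i, as in the command line logged later in this thread.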

Jue

UGYong commented 5 years ago

Thank you for your response. I ran wtpoa-cns after decompressing, but it still hung. For your second suggestion, I don't really know how to do it. Could you please show me how? The following is the log from wtpoa-cns.

--
-- total memory 9531305140.0 kB
-- available 9293307916.0 kB
-- 384 cores
-- Starting program: wtpoa-cns -t 64 -i SH.ctg.lay -fo SH.ctg.fa
-- pid 292569
-- date Thu Feb 21 13:21:22 2019

1 contigs 100 edges 0 bases
1 contigs 200 edges 0 bases
1 contigs 300 edges 0 bases
1 contigs 400 edges 0 bases
1 contigs 500 edges 0 bases
1 contigs 600 edges 0 bases
1 contigs 700 edges 0 bases
1 contigs 800 edges 0 bases
1 contigs 900 edges 0 bases
1 contigs 1000 edges 0 bases
1 contigs 1100 edges 0 bases
1 contigs 1200 edges 0 bases
2 contigs 1300 edges 2274677 bases
2 contigs 1400 edges 2274677 bases
2 contigs 1500 edges 2274677 bases
2 contigs 1600 edges 2274677 bases
2 contigs 1700 edges 2274677 bases

ruanjue commented 5 years ago

1. Find the last contig name in SH.ctg.fa, e.g. tail -10000 SH.ctg.fa | grep '^>' | tail -n 1
2. Suppose the contig name is ctgX. Find it in SH.ctg.lay, e.g. grep -n '^>ctgX' SH.ctg.lay, and let its line number be L1. Then find the next contig header after it and let its line number be L2.
3. Cut lines L1 to L2 - 1 from the lay file into test.lay, then run wtpoa-cns on it.
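The same steps can be sketched in Python for anyone less comfortable with grep and line numbers. This is only an illustration: the function names are hypothetical, and the sketch assumes nothing about the lay format beyond what the grep above implies, namely that each contig starts with a line beginning with '>' followed by its name. Unlike grep '^>ctgX', it matches the whole name, so ctg1 does not also match ctg10.

```python
def last_contig_name(fasta_path):
    """Return the name on the last '>' header line of a FASTA file, or None."""
    name = None
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else None
    return name


def extract_contig_layout(lay_path, contig, out_path):
    """Copy one contig's block (its header up to the next '>' line) to out_path.

    Returns the number of lines written, which is 0 if the contig is absent.
    """
    written = 0
    copying = False
    with open(lay_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if copying:
                    break  # reached the next contig header: the block is done
                fields = line[1:].split()
                copying = bool(fields) and fields[0] == contig
            if copying:
                dst.write(line)
                written += 1
    return written


# Hypothetical usage with the file names from this thread:
# name = last_contig_name("SH.ctg.fa")
# extract_contig_layout("SH.ctg.lay", name, "test.lay")
```

Reading the whole FASTA file is slower than the tail approach on a large assembly, but it avoids guessing how many trailing lines to inspect.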

UGYong commented 5 years ago

OK. I ran a consensus run for each contig based on your comment. Here's what I did.

I split the lay file into one file per contig.
I generated a consensus command for each contig.
Then, I ran 4 jobs concurrently, with each job using 32 threads.
Here's how it went. Out of 3834 contigs in total, 14 hung. I killed those 14 runs and reran them one by one. This time, one job hung again, so I retried it. In the end, the consensus calls for all contigs were successful.
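For readers who want to reproduce this workaround, here is a rough sketch of the split and command-generation steps. It is not the script used above: the function names and output layout are hypothetical, and it assumes only that each contig in the lay file starts with a '>' header line. The -t, -i and -fo options are the ones shown in the wtpoa-cns log earlier in this thread.

```python
import os


def split_layout(lay_path, out_dir):
    """Write one <name>.lay file per contig; return the paths in input order."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    dst = None
    try:
        with open(lay_path) as src:
            for line in src:
                if line.startswith(">"):
                    if dst is not None:
                        dst.close()
                    fields = line[1:].split()
                    # Fall back to a numbered name if the header has no name.
                    name = fields[0] if fields else "contig%d" % (len(paths) + 1)
                    path = os.path.join(out_dir, name + ".lay")
                    dst = open(path, "w")
                    paths.append(path)
                if dst is not None:
                    dst.write(line)
    finally:
        if dst is not None:
            dst.close()
    return paths


def consensus_command(lay_file, threads=32):
    """Build the wtpoa-cns argument list for one per-contig lay file."""
    fasta = os.path.splitext(lay_file)[0] + ".fa"
    return ["wtpoa-cns", "-t", str(threads), "-i", lay_file, "-fo", fasta]


# Hypothetical usage:
# commands = [consensus_command(p) for p in split_layout("SH.ctg.lay", "per_contig")]
```

Each command list can be handed to a job scheduler or to subprocess, and the resulting per-contig FASTA files concatenated afterwards.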

I don't know why it happened and how it was fixed, but I hope this issue will be resolved for your next release.

Thank you, UG.

ruanjue commented 5 years ago

Thank you for the test. By 'one by one', did you mean running wtpoa-cns on each contig in separate files? If so, please run them again multiple times (>= 5), so we can see whether those contigs are prone to hanging. Thanks in advance!

Jue

UGYong commented 5 years ago

Yes, we ran them 'one by one'. As you asked, I ran one contig (e.g. ctg1) 5 times and the results were 1-0-1-0-0, where 1 means successful and 0 means unsuccessful.
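A repeat test like this can be automated so that a hung run is killed instead of waiting on it. The sketch below is an assumption about how one might do it, not what was actually run: the function names and the one-hour timeout are hypothetical, and a run counts as unsuccessful if it exits non-zero or exceeds the timeout.

```python
import subprocess


def run_once(cmd, timeout_s):
    """Return 1 if cmd exits with status 0 within timeout_s seconds, else 0."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        return 1 if subprocess.run(cmd, timeout=timeout_s).returncode == 0 else 0
    except subprocess.TimeoutExpired:
        return 0


def repeat_run(cmd, times=5, timeout_s=3600):
    """Run cmd several times and report the outcomes as a string like '1-0-1-0-0'."""
    return "-".join(str(run_once(cmd, timeout_s)) for _ in range(times))


# Hypothetical usage for a single contig (file names are placeholders):
# print(repeat_run(["wtpoa-cns", "-t", "32", "-i", "ctg1.lay", "-fo", "ctg1.fa"]))
```

The timeout should be chosen well above the time a normal run of that contig takes, otherwise slow but healthy runs are miscounted as hangs.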

UG

ruanjue commented 5 years ago

Great, that is suitable for debugging. Could you send me the lay file of ctg1, via ruanjue.big(AT)qq.com, your FTP, or some other way?

UGYong commented 5 years ago

OK. Which email address should I share the Google Drive link with?

UG

ruanjue commented 5 years ago

ruanjue@gmail.com