yechengxi / DBG2OLC

A genome assembler that reduces the computational time of human genome assembly from 400,000 CPU hours to 2,000 CPU hours, utilizing long erroneous 3GS sequencing reads and short accurate NGS sequencing reads.
GNU General Public License v3.0

Consensus step - empty output #24

Closed by s-yazar 7 years ago

s-yazar commented 7 years ago

Hi Chengxi,

Thank you for developing an easy-to-use hybrid assembler for large genomes.

When I run the 'split_and_run_sparc.sh' script at the consensus step, both .m5 and .consensus.fasta files are created for each chunk, and I get the following output messages:

[INFO] 2017-05-02T16:27:28 [blasr] started.
[INFO] 2017-05-02T16:27:29 [blasr] ended.
For help: Sparc -h
Backbone size: 181906
Empty ouput. Backbone copied.

[INFO] 2017-05-02T16:27:28 [blasr] started.
[INFO] 2017-05-02T16:27:29 [blasr] ended.
For help: Sparc -h
Backbone size: 145661
Empty ouput. Backbone copied.

[INFO] 2017-05-02T16:27:28 [blasr] started.
[INFO] 2017-05-02T16:27:28 [blasr] ended.
For help: Sparc -h
Backbone size: 59113
Empty ouput. Backbone copied.

Is this expected? I was wondering whether it could be because of my PacBio coverage which is only 5x.

Thank you for your help, Seyhan

yechengxi commented 7 years ago

Hi Seyhan, yes, this is expected at that coverage. By default the consensus step needs a certain coverage to avoid erroneous paths. If you really need to proceed with such low coverage, you will need to lower the 'c' parameter to 1.

So maybe try a command like this to see if the error disappears:

Sparc b Backbone.fa m backbone.mapped.m5 k 1 g 1 c 1 t 0.1 o ConsensusOutput
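For anyone scripting this per chunk, here is a minimal Python sketch that only assembles the command quoted above. The helper name `sparc_command` is hypothetical, and the values for k, g and t are passed through as they appear in the thread without any claim about what is optimal; only `c` is lowered because of the low coverage.

```python
import shlex

def sparc_command(backbone, mapped_m5, out_prefix, k=1, g=1, c=1, t=0.1):
    """Build the Sparc argument list in the dash-less key/value form used in this thread."""
    return [
        "Sparc",
        "b", backbone,      # backbone FASTA
        "m", mapped_m5,     # blasr alignments in m5 format
        "k", str(k),
        "g", str(g),
        "c", str(c),        # coverage threshold, lowered to 1 for low-coverage data
        "t", str(t),
        "o", out_prefix,    # output prefix
    ]

cmd = sparc_command("Backbone.fa", "backbone.mapped.m5", "ConsensusOutput")
print(shlex.join(cmd))
# To actually run it (requires Sparc on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than one string avoids shell quoting problems when chunk paths contain spaces.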

s-yazar commented 7 years ago

Thank you, Chengxi. I set the parameters as you suggested and I think it worked. I got the following output message for each of the 12 chunks I tested.

[INFO] 2017-05-04T11:59:56 [blasr] started.
[INFO] 2017-05-04T11:59:57 [blasr] ended.
For help: Sparc -h
Backbone size: 116179
Finished.
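With many chunks it can help to tally which ones finished and which ones fell back to copying the backbone. The following is a rough sketch that scans captured log text for the two status messages seen in this thread; the function name `tally` is hypothetical, and the pattern assumes the status text directly follows the "Backbone size" line, as it does in the messages above.

```python
import re

# "Empty ouput" is spelled exactly as Sparc prints it in the logs above.
CHUNK = re.compile(
    r"Backbone size: (\d+)\s+(Empty ouput\. Backbone copied\.|Finished\.)"
)

def tally(log_text):
    """Group backbone sizes by whether Sparc produced a consensus."""
    results = {"empty": [], "finished": []}
    for size, status in CHUNK.findall(log_text):
        key = "finished" if status == "Finished." else "empty"
        results[key].append(int(size))
    return results

log = """For help: Sparc -h
Backbone size: 181906
Empty ouput. Backbone copied.
For help: Sparc -h
Backbone size: 145661
Empty ouput. Backbone copied.
For help: Sparc -h
Backbone size: 116179
Finished.
"""
print(tally(log))
# {'empty': [181906, 145661], 'finished': [116179]}
```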

Unfortunately, I have to work with what we have. It is going to be interesting to see how much improvement there is going to be.

I have another question. The blasr and Sparc steps are set to iterate twice in the original 'split_and_run_sparc.sh' script. I had to modify this script to run on our cluster, so I removed the iteration. Do you think I should bring it back?

Thank you, Seyhan

yechengxi commented 7 years ago

I have never tested with such low coverage, but I think bringing the iteration back could give slight improvements.
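If the iteration is restored in a custom cluster script, the idea is that each round maps the reads against the current backbone, runs Sparc, and uses the resulting consensus as the backbone for the next round. Below is a minimal sketch of that plan, not the logic of the original script. The function name `plan_rounds` and the file naming are hypothetical, the blasr arguments are an assumption modelled on the older single-dash interface (copy the real ones from your script), and it assumes Sparc writes `<prefix>.consensus.fasta`, consistent with the .consensus.fasta files mentioned earlier in this thread.

```python
def plan_rounds(chunk, rounds=2, c=1):
    """Return the commands for `rounds` passes of mapping plus consensus on one chunk."""
    reads = f"{chunk}.reads.fasta"   # hypothetical naming for the chunk's reads
    backbone = f"{chunk}.fasta"      # hypothetical naming for the initial backbone
    commands = []
    for i in range(1, rounds + 1):
        m5 = f"{chunk}.round{i}.mapped.m5"
        prefix = f"{chunk}.round{i}"
        # Assumed blasr flags; replace with the invocation from your own script.
        commands.append(["blasr", reads, backbone,
                         "-bestn", "1", "-m", "5", "-out", m5])
        commands.append(["Sparc", "b", backbone, "m", m5,
                         "k", "1", "g", "1", "c", str(c), "t", "0.1",
                         "o", prefix])
        # Assumption: the consensus of this round seeds the next round.
        backbone = f"{prefix}.consensus.fasta"
    return commands, backbone

commands, final_consensus = plan_rounds("chunk1", rounds=2)
for cmd in commands:
    print(" ".join(cmd))
print("final:", final_consensus)
```

Writing each round to its own prefix keeps the intermediate consensus around, so the first and second rounds can be compared to see whether the extra pass is worth the compute at 5x coverage.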

s-yazar commented 7 years ago

Thank you. I will let you know how it goes.