yechengxi / DBG2OLC

A genome assembler that reduces the computational time of human genome assembly from 400,000 CPU hours to 2,000 CPU hours by combining long, error-prone 3GS sequencing reads with short, accurate NGS sequencing reads.
GNU General Public License v3.0
66 stars, 27 forks

combining ReadsInfoFrom_* fails #32

Closed: janvanoeveren closed this issue 7 years ago.

janvanoeveren commented 7 years ago

After running the individual steps on subsets of the fasta files, I combined the ReadsInfoFrom_*.fasta files with:

/opt/kgapps/DBG2OLC-20170411/DBG2OLC k 17 AdaptiveTh 0.004 KmerCovTh 2 MinOverlap 20 LD0 1 Contigs ../../../../Illumina_contigs/genome.contig.fasta RemoveChimera 1 LD 1 f cell10.fasta f cell11.fasta f cell12.fasta f cell13.fasta f cell14.fasta f cell15.fasta f cell16.fasta f cell17.fasta ... f cell8.fasta f cell9.fasta

It seemed to run well until it ended with:

...
total alignments: 8472039
Avg alignment size: 2
Avg sparse alignment size: 2
8644693 alignments calculated.
177 secs.
Loading non-contained sequences.
0 loaded.
frag sum: 508751022
offset sum: 192311552
Empty sequence loaded. It looks like you have messed up the data.
Assembly finished.

Can you help?

yechengxi commented 7 years ago

With this command, both the fasta files and the ReadsInfoFrom_*.fasta files should be located in the working directory. It looks like the fasta files are not, though?

janvanoeveren commented 7 years ago

You mean the (raw) PacBio fasta files? Does it need those?

yechengxi commented 7 years ago

Yes, they are needed because they are used to build the backbones.
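The maintainer's diagnosis can be turned into a quick pre-flight check before launching the combined run. This is a minimal Python sketch, not part of DBG2OLC: the helper name `missing_inputs` is hypothetical, and pairing each read file with a companion named `ReadsInfoFrom_` plus the read file's base name is an assumption based on the `ReadsInfoFrom_*.fasta` pattern in this thread, not a documented DBG2OLC convention.

```python
import os
from typing import Iterable, List


def missing_inputs(read_files: Iterable[str], workdir: str = ".") -> List[str]:
    """List the inputs a combined DBG2OLC run would not find in workdir.

    Hypothetical helper. For every read file passed with `f` it checks:
      1. the raw read file itself (needed to build the backbones), and
      2. a companion file ASSUMED to be named "ReadsInfoFrom_" + the read
         file's base name (an assumption, not a documented convention).
    Both are looked up by base name inside workdir, since the thread says
    they must sit in the working directory.
    """
    missing = []
    for path in read_files:
        name = os.path.basename(path)
        for expected in (name, "ReadsInfoFrom_" + name):
            if not os.path.isfile(os.path.join(workdir, expected)):
                missing.append(expected)
    return missing
```

Run from the assembly directory, something like `missing_inputs(["cell10.fasta", "cell11.fasta"])` returns an empty list only when every raw read file and its assumed companion are present; anything it returns is a file to copy or link into the working directory before rerunning.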

(Sent by email in reply to janvanoeveren's comment of Sep 22, 2017, 11:03 AM.)

yechengxi commented 7 years ago

I have clarified this in the updated manual. Thanks for letting me know.

pgaiero commented 5 years ago

Hi! I got the same error message when trying to use two long-read files, one from PacBio and one from MinION. Does DBG2OLC support both kinds of files? Here's what I did:

nohup /DATOS/pgaiero/DBG2OLC_git/compiled/DBG2OLC k 17 AdaptiveTh 0.001 KmerCovTh 2 MinOverlap 20 RemoveChimera 1 Contigs /DATOS/pgaiero/Paspalum_umbrosum_denovo_Illumina/Illumina_assembly_PE_dedupe2/Contigs.txt f /DATOS/pgaiero/Paspalum_umbrosum_denovo_Illumina/Illumina_assembly_PE_dedupe2/Pumbrosum_subreads.fastq f /DATOS/pgaiero/Paspalum_umbrosum_denovo_Illumina/Illumina_assembly_PE_dedupe2/concatenated_MinION_pass.fq &

And here's the message I got:

Loading contigs.
Collecting information for consensus.
1317355 reads.
Calculating reads overlaps.
1000000 reads aligned.
Avg alignment size: 54
Avg sparse alignment size: 3
total alignments: 13815382
Avg alignment size: 54
Avg sparse alignment size: 3
13869759 alignments calculated.
1288 secs.
Loading non-contained sequences.
191590 loaded.
frag sum: 1072099954
offset sum: 347043943
Empty sequence loaded. It looks like you have messed up the data.
Assembly finished.

I made sure that all files were in the working directory, so the problem is not the same as the earlier one in this issue. Would you mind pointing out what's wrong? Thanks a million for your help!
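Since all inputs in this second report were reportedly in the working directory, one thing worth ruling out is whether either read file contains zero-length records. That this would trigger the "Empty sequence loaded" message is an assumption, not something the thread confirms. The sketch below is a standalone Python scanner, not a DBG2OLC tool; the function names are hypothetical, and FASTQ input is assumed to use the common four-line layout without wrapped sequence lines.

```python
import gzip
from typing import Iterable, List


def _empty_fasta(lines: List[str]) -> List[str]:
    # A FASTA record is empty when its header is followed directly by
    # another header (or end of file), ignoring blank lines.
    empties: List[str] = []
    header, length = None, 0
    for line in lines:
        if line.startswith(">"):
            if header is not None and length == 0:
                empties.append(header)
            header, length = line[1:], 0
        else:
            length += len(line.strip())
    if header is not None and length == 0:
        empties.append(header)
    return empties


def _empty_fastq(lines: List[str]) -> List[str]:
    # ASSUMES the four-line layout (@header, sequence, +, quality).
    # Blank lines are NOT skipped: an empty read has a blank sequence line.
    empties: List[str] = []
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError("unexpected FASTQ layout at record %d" % (i // 4 + 1))
        if not seq.strip():
            empties.append(header[1:])
    return empties


def empty_records(lines: Iterable[str]) -> List[str]:
    """Return the headers of zero-length records in FASTA or FASTQ text."""
    stripped = [line.rstrip("\r\n") for line in lines]
    # Guess the format from the first non-blank line.
    start = next((i for i, text in enumerate(stripped) if text.strip()), None)
    if start is None:
        return []
    body = stripped[start:]
    if body[0].startswith(">"):
        return _empty_fasta(body)
    if body[0].startswith("@"):
        return _empty_fastq(body)
    raise ValueError("neither FASTA nor FASTQ: %r" % body[0][:40])


def scan_file(path: str) -> List[str]:
    """Scan an optionally gzip-compressed read file for empty records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return empty_records(handle)
```

Calling `scan_file` on both the PacBio `.fastq` and the MinION `.fq` input would list any empty reads by header; an empty result for both would point away from this explanation.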