ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

Polishing consensus failure, empty BAM file #144

Closed: sjfleck closed this issue 5 years ago

sjfleck commented 5 years ago

After running wtdbg2 and wtpoa-cns, I wanted to polish the assembly to increase the BUSCO score. I stuck very closely to the sample code, but every time I try to run the script, I get the same error message:

[M::mm_idx_gen::22.0391.74] collected minimizers
[M::mm_idx_gen::24.3402.92] sorted minimizers
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "MY_SPECIES.dbg.bam".

For some reason, the resulting .bam files are always empty (0 B). I'm hoping that someone can help me fix this issue. I'm relatively new to bioinformatics, but I usually don't have so much trouble getting a script to run properly. Thank you in advance.

# polish consensus, not necessary if you want to polish the assemblies using other tools
minimap2 -t16 -ax map-pb -r2k dbg.raw.fa reads.fa.gz | samtools sort -@4 >dbg.bam
samtools view -F0x900 dbg.bam | ./wtpoa-cns -t 16 -d dbg.raw.fa -i - -fo dbg.cns.fa
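Since the symptom here is a 0 B .bam file, one way to catch it before the second command runs is a quick sanity check on the file. The sketch below is only an illustration: `looks_like_bam` is a hypothetical helper, not part of wtdbg2 or samtools. It relies on two facts: a usable BAM file is non-empty, and BAM is BGZF-compressed, so it starts with the gzip magic bytes 1f 8b. It does not validate the BAM header or the EOF marker.

```shell
# Hypothetical helper (not part of wtdbg2 or samtools): returns 0 if the file
# is non-empty and starts with the gzip/BGZF magic bytes 1f 8b, otherwise 1.
looks_like_bam() {
    f=$1
    # -s is true only if the file exists and its size is greater than zero
    [ -s "$f" ] || { echo "empty or missing: $f" >&2; return 1; }
    # Read the first two bytes as hex and strip whitespace, e.g. "1f8b"
    magic=$(od -An -tx1 -N2 "$f" | tr -d ' \n')
    [ "$magic" = "1f8b" ] || { echo "not BGZF-compressed: $f" >&2; return 1; }
    return 0
}
```

With the file names from the sample code, this could be used as `looks_like_bam dbg.bam && samtools view -F0x900 dbg.bam | ./wtpoa-cns ...`, so that the polishing step only starts when the sorted BAM looks plausible.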

ruanjue commented 5 years ago

Please run wtdbg2.pl -T to generate the command lines. If anything goes wrong, please give more information, e.g.

sjfleck commented 5 years ago

Jue Ruan, here are my different command lines:

job #1
wtdbg2 -x ont -g 580m -i P030_Myspecies_cat.fastq.gz -t 32 -X 111 -fo wtdbg2_my_species

job #2
wtpoa-cns -t 32 -i wtdbg2_my_species.ctg.lay.gz -fo wtdbg2_my_species.dbg.raw.fa

job #3
minimap2 -t 32 -ax map-ont -a wtdbg2_my_species.dbg.raw.fa P030_Myspecies_cat.fastq.gz | samtools sort -@4 > wtdbg2_my_species.dbg.bam

samtools view wtdbg2_my_species.dbg.bam | wtpoa-cns -t 32 -d wtdbg2_my_species.dbg.raw.fa -i - -fo wtdbg2_my_species.dbg.cns.fasta

(I tried many other things, including:
minimap2 -t 32 -ax map-ont -a wtdbg2_my_species.dbg.raw.fa P030_Myspecies_cat.fastq.gz > wtdbg2_my_species.dbg.bam | samtools sort -@4 > wtdbg2_my_species.dbg.bam
but I keep getting similar errors.)

Every time I run job #3, I get a similar error:

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "wtdbg2_my_species.dbg.bam".

Any help with what I'm doing wrong would be greatly appreciated.

ruanjue commented 5 years ago

First, check the log messages from samtools sort. I guess the error is "too many open files"; if so, try samtools sort -m 4g.
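The check suggested above can be done by saving the stderr of samtools sort to a file and searching it. This is a minimal sketch, not a definitive recipe: `log_has_open_files_error` and the log name sort.log are hypothetical, and the exact wording samtools prints is an assumption. "Too many open files" is the standard operating-system text for that error, so the match is case-insensitive.

```shell
# Hypothetical helper: returns 0 if the given log file mentions the OS error
# "Too many open files" (matched case-insensitively), otherwise non-zero.
log_has_open_files_error() {
    grep -qi 'too many open files' "$1"
}

# Intended use (assumes minimap2 and samtools are on PATH; file names follow
# the sample code, sort.log is a made-up name):
#   minimap2 -t16 -ax map-ont dbg.raw.fa reads.fa.gz | samtools sort -@4 -m 4g >dbg.bam 2>sort.log
#   log_has_open_files_error sort.log && echo "samtools sort ran out of file handles" >&2
```

The idea behind -m is that more memory per sorting thread means fewer temporary files on disk, and therefore fewer files open at once during the final merge.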

sjfleck commented 5 years ago

Jue Ruan thanks for your help with this. I'm not sure why, but when I removed some of the pipes (|), it worked.

I divided your "Polish consensus, not necessary if you want to polish the assemblies using other tools" step into two parts. The weird thing is that I kept the pipes in the second part and it still worked fine.

Step 03a.

minimap2 -t 32 -ax map-ont wtdbg2_ping_moct.dbg.raw.fa P030_Myspecies_cat.fastq.gz -o wtdbg2_Myspecies.dbg.sam
samtools view -b wtdbg2_Myspecies.dbg.sam -o wtdbg2_Myspecies.dbg.bam
samtools sort -@32 -o wtdbg2_Myspecies.dbg.sort.bam wtdbg2_Myspecies.dbg.bam

Step 03b.

samtools view wtdbg2_Myspecies.dbg.sort.bam | wtpoa-cns -t 32 -d wtdbg2_Myspecies.dbg.raw.fa -i - -fo wtdbg2_Myspecies.dbg.cns.fasta

As I said, I'm not sure why it works, but I'm fine with it for now since I have a working version. Thanks again.