lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

layout step produces empty gfa #18

Closed aechchiki closed 7 years ago

aechchiki commented 7 years ago

Hi guys.

I am using minimap/miniasm to get raw contigs by rapid assembly, as a preliminary step before error correction with @isovic's racon.

My input file is: r7_2d.fastq, a file containing 2D MinION reads (RNA, not DNA sequencing).

Following your instructions, I installed the latest minimap (Version: 0.2-r124-dirty) and miniasm (Version: 0.2-r137-dirty). The overlap step completed successfully (and produced a non-empty paf.gz file), but the output of the layout step (the gfa file) is empty.

Here is my log:

# overlap: successful.
./minimap -Sw5 -L100 -m0 -t8 r7_2d.fastq r7_2d.fastq | gzip -1 > r7_2d.paf.gz
[M::mm_idx_gen::8.484*1.20] collected minimizers
[M::mm_idx_gen::9.471*1.72] sorted minimizers
[M::main::9.471*1.72] loaded/built the index for 113249 target sequence(s)
[M::main] max occurrences of a minimizer to consider: 37
[M::main] Version: 0.2-r124-dirty
[M::main] CMD: /home/aechchik/wgets/minimap/minimap -Sw5 -L100 -m0 -t8 r7_2d.fastq r7_2d.fastq
[M::main] Real time: 17.636 sec; CPU: 60.304 sec

# layout: empty gfa file
./miniasm -f r7_2d.fastq r7_2d.paf.gz > r7_2d.gfa
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::1.021*0.99] read 664850 hits; stored 219221 hits and 16136 sequences (50720456 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::1.049*0.99] 11922 query sequences remain after sub
[M::ma_hit_cut::1.055*0.99] 206693 hits remain after cut
[M::ma_hit_flt::1.059*0.99] 206477 hits remain after filtering; crude coverage after filtering: 10.55
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::1.069*0.99] 11026 query sequences remain after sub
[M::ma_hit_cut::1.074*0.99] 202966 hits remain after cut
[M::ma_hit_chimeric::1.078*0.99] identified 1 chimeric reads
[M::ma_hit_contained::1.083*0.99] 756 sequences and 556 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 541 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 97 arcs
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 2 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 547 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.2-r137-dirty
[M::main] CMD: /home/aechchik/wgets/miniasm/miniasm -f r7_2d.fastq r7_2d.paf.gz
[M::main] Real time: 1.920 sec; CPU: 1.745 sec

Just to let you know, I tried the very same commands with the toy example on PB data from your readme (wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz), and in that case the gfa file is not empty.

Do you happen to know why the gfa file is empty for my data?

Thanks in advance for your help, Amina
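
To tell apart a zero-byte GFA from one that has a header but no segments, the record types in the file can be tallied. The following is a minimal sketch, assuming only that each GFA line starts with a one-letter record type followed by a tab; the function name and the in-memory sample are illustrative and not part of miniasm.

```python
import io
from collections import Counter


def gfa_record_counts(handle):
    """Count GFA lines by record type (the first tab-separated field)."""
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        counts[line.split("\t", 1)[0]] += 1
    return counts


# Illustrative in-memory sample (made-up segment names and sequences).
sample = io.StringIO(
    "S\tseg1\tACGT\n"
    "S\tseg2\tGGCA\n"
    "L\tseg1\t+\tseg2\t+\t2M\n"
)
counts = gfa_record_counts(sample)
print(dict(counts))

# An empty file yields an empty counter, i.e. no segments at all.
empty_counts = gfa_record_counts(io.StringIO(""))
print(empty_counts.get("S", 0))
```

For a file on disk, the same function can be called as gfa_record_counts(open("r7_2d.gfa")); a count of zero for "S" means no unitig sequences were written.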

lh3 commented 7 years ago

Miniasm assumes you are assembling genomic data. It is probably discarding RNA contigs because they are usually too short and/or composed of too few reads. You are entering an unexplored area.

Is it possible to share the reads or just the PAF file with me, so that I can tune the parameters a bit at least to get something (right or wrong) out?
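
Whether the reads are short relative to what a genomic assembler expects can be estimated from the PAF file alone, since each PAF record carries the query length (column 2) and the target length (column 7). The sketch below is an illustration under assumptions: the 1000 bp cutoff is an arbitrary example value and should be compared against the thresholds printed by miniasm's own usage text, and the function names and sample records are hypothetical.

```python
import gzip


def open_paf(path):
    """Open a PAF file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_lengths_from_paf(lines):
    """Map read name -> read length from PAF columns 1-2 and 6-7."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        lengths[fields[0]] = int(fields[1])  # query name, query length
        lengths[fields[5]] = int(fields[6])  # target name, target length
    return lengths


def fraction_shorter_than(lengths, cutoff):
    """Fraction of reads whose length is below the cutoff."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths.values() if n < cutoff) / len(lengths)


# Made-up records for illustration only.
sample_paf = [
    "r1\t800\t10\t700\t+\tr2\t2500\t100\t790\t600\t690\t255\n",
    "r3\t650\t0\t600\t-\tr2\t2500\t900\t1500\t500\t600\t255\n",
]
lengths = read_lengths_from_paf(sample_paf)
EXAMPLE_CUTOFF = 1000  # ASSUMPTION: illustrative value, not taken from miniasm
print(fraction_shorter_than(lengths, EXAMPLE_CUTOFF))
```

On real data, the lines would come from open_paf("r7_2d.paf.gz"). A high fraction of short reads would be consistent with the explanation that most reads are dropped before layout.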

aechchiki commented 7 years ago

Hi @lh3, thanks for your superquick reply! Sure, I can share the reads/paf with you. (edit: I found your e-mail) I am sending them to lh3@me.com

Many thanks, Amina

lh3 commented 7 years ago

@aechchiki Thanks a lot for the test data. You can run minimap as usual, and invoke miniasm like this:

miniasm -2S6 -f r7_2d.fastq.gz mapping.paf > output.gfa

The output is not empty, but I have little idea whether it makes sense. I would be curious to know if such an assembly is usable at all.

wjyzidane commented 5 years ago

I have the same issue and used the "-2S6" option, but it does not work. Are there any other options I can try? Thanks!

abdullah314 commented 3 years ago

I can't find the option -2S6 in the usage section of the software. Please explain '-2S6'.

HAHasani commented 2 years ago

Yes please, could you explain what this flag represents? In my case it did work and produced output.

EmilioKolo commented 1 year ago

I had the same problem and it was apparently solved by the "-2S6" flag (I used it on a known sequence and it worked just fine for that). I still couldn't find any explanation of what it does in the documentation or forums.
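
One reason "-2S6" may not show up as a single entry in a usage listing is that getopt-style parsers allow short options to be bundled, so the string would be read as two options: "-2" with no argument and "-S" with the argument 6. The sketch below only demonstrates that splitting rule with Python's getopt module. It assumes miniasm follows getopt-style conventions, and the option string is hypothetical rather than copied from miniasm's source; what each option actually does has to be checked against the miniasm usage text or source code.

```python
import getopt

# ASSUMPTION: hypothetical option string for illustration only.
# "2" is a flag without an argument; "S:" and "f:" each take an argument.
HYPOTHETICAL_OPTSTRING = "2S:f:"


def split_bundled(argv, optstring=HYPOTHETICAL_OPTSTRING):
    """Split a getopt-style argument list into (option, value) pairs and operands."""
    opts, operands = getopt.getopt(argv, optstring)
    return opts, operands


opts, operands = split_bundled(
    ["-2S6", "-f", "r7_2d.fastq.gz", "mapping.paf"]
)
for name, value in opts:
    print(name, repr(value))
print(operands)
```

Under that assumption, "miniasm -2S6 ..." would be equivalent to "miniasm -2 -S 6 ...", so the two letters are worth looking up separately.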