rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

no output after racon initial polishing round #30

Closed: fetyj closed this issue 2 years ago

fetyj commented 2 years ago

Hello, I'm using Trycycler to assemble a bacterial genome from PacBio data. To generate assemblies, I modified the miniasm_and_minipolish.sh script for this read type (minimap2 -x ava-pb -t "$2" "$1" "$1" > "$overlaps" and minipolish --threads "$2" "$1" "$unpolished_assembly" --pacbio). Unfortunately, I get no output after the Racon initial polishing round. The Flye and Raven assemblies look fine. I hope you can help. Best, Fety
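For reference, here is a minimal sketch of a wrapper with the two modifications described above. It assumes the script follows the three steps visible in the log (minimap2 all-vs-all overlaps, miniasm, then Minipolish, with temporary PAF and GFA files). The optional third argument (`ont` or `pb`) and the helper functions `overlap_preset` and `minipolish_flags` are hypothetical additions, not part of Trycycler's script.

```shell
#!/usr/bin/env bash
# Sketch: miniasm + Minipolish wrapper with a read-type switch.
# Usage: bash miniasm_minipolish_sketch.sh <reads.fastq> <threads> [ont|pb] > assembly.gfa
# The [ont|pb] argument and both helper functions are hypothetical additions.
set -euo pipefail

# minimap2 all-vs-all overlap preset for each read type.
overlap_preset() {
    case "$1" in
        ont) echo "ava-ont" ;;
        pb)  echo "ava-pb" ;;
        *)   echo "unknown read type: $1" >&2; return 1 ;;
    esac
}

# Extra Minipolish flag for each read type (none for ONT).
minipolish_flags() {
    case "$1" in
        ont) echo "" ;;
        pb)  echo "--pacbio" ;;
        *)   echo "unknown read type: $1" >&2; return 1 ;;
    esac
}

main() {
    if [ $# -lt 2 ]; then
        echo "usage: $0 <reads.fastq> <threads> [ont|pb]" >&2
        return 1
    fi
    local reads="$1" threads="$2" read_type="${3:-ont}"
    local preset flags tmp_dir
    preset=$(overlap_preset "$read_type")
    flags=$(minipolish_flags "$read_type")

    # Intermediate files live in a temp directory that is removed on exit.
    # Double quotes expand $tmp_dir now, while the local variable still exists.
    tmp_dir=$(mktemp -d)
    trap "rm -rf '$tmp_dir'" EXIT

    # 1. All-vs-all read overlaps.
    minimap2 -x "$preset" -t "$threads" "$reads" "$reads" > "$tmp_dir/overlaps.paf"
    # 2. Unpolished assembly graph.
    miniasm -f "$reads" "$tmp_dir/overlaps.paf" > "$tmp_dir/unpolished.gfa"
    # 3. Polish with Minipolish (which runs Racon); the polished GFA goes to stdout.
    #    $flags is left unquoted so an empty value adds no argument.
    minipolish --threads "$threads" $flags "$reads" "$tmp_dir/unpolished.gfa"
}

# Run main only when executed directly with arguments, so the helpers can be
# sourced and checked on their own.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    main "$@"
fi
```

With this sketch, the command from the log would become `bash miniasm_minipolish_sketch.sh read_subsets/sample_02.fastq "$threads" pb > assembly_02.gfa` (the file name is hypothetical). Keeping the read type in one argument avoids the two presets drifting apart when the script is edited by hand.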

bash miniasm_and_minipolish.sh read_subsets/sample_02.fastq "$threads" > assembly_02.gfa && any2fasta assembly_02.gfa > assemblies/assembly_02.fasta
[M::mm_idx_gen::8.696*1.56] collected minimizers
[M::mm_idx_gen::9.465*2.42] sorted minimizers
[M::main::9.466*2.42] loaded/built the index for 22400 target sequence(s)
[M::mm_mapopt_update::10.049*2.34] mid_occ = 76
[M::mm_idx_stat] kmer size: 19; skip: 5; is_hpc: 1; #seq: 22400
[M::mm_idx_stat::10.372*2.30] distinct minimizers: 24073649 (78.86% are singletons); average occurrences: 1.968; average spacing: 4.304; total length: 203856207
[M::worker_pipeline::31.269*9.23] mapped 22400 sequences
[M::main] Version: 2.23-r1111
[M::main] CMD: minimap2 -x ava-pb -t 16 read_subsets/sample_02.fastq read_subsets/sample_02.fastq
[M::main] Real time: 31.380 sec; CPU: 288.693 sec; Peak RSS: 2.097 GB
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::2.686*1.00] read 2182721 hits; stored 3220936 hits and 22143 sequences (201659622 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::3.564*1.00] 21921 query sequences remain after sub
[M::ma_hit_cut::3.638*1.00] 3149337 hits remain after cut
[M::ma_hit_flt::3.706*1.00] 3050186 hits remain after filtering; crude coverage after filtering: 83.75
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::3.997*1.00] 21841 query sequences remain after sub
[M::ma_hit_cut::4.068*1.00] 3045274 hits remain after cut
[M::ma_hit_contained::4.147*1.00] 1038 sequences and 13666 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 12995 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 10508 arcs
[M::asg_arc_del_multi] removed 36 multi-arcs
[M::asg_arc_del_asymm] removed 177 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 56 tips
[M::asg_pop_bubble] popped 38 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 10 asymmetric arcs
[M::asg_arc_del_short] removed 32 short overlaps
[M::asg_cut_tip] cut 5 tips
[M::asg_pop_bubble] popped 7 bubbles and trimmed 0 tips
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 1 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: miniasm -f read_subsets/sample_02.fastq /tmp/tmp.x7U2Kr1O49.paf
[M::main] Real time: 4.875 sec; CPU: 4.876 sec

Checking requirements
    Minipolish requires Minimap2 and Racon to run, so it checks for these tools now.

Minimap2 found: /home/fiestaj/anaconda3/envs/trycycler/bin/minimap2 (v2.23-r1111)
Racon found:    /usr/local/bin/racon (v1.4.12)

Loading graph
    Loading the miniasm GFA graph into memory.

/tmp/tmp.wWrFV6CYBO.gfa
  4 segments (1,946,342 bp)
  8 links

Initial polishing round
    The first round of polishing is done on a per-segment basis and only uses reads
which are definitely associated with the segment (because the GFA indicated that they
were used to make the segment).

Running Racon on utg000001c:
  reads:      /tmp/tmp5vs645_s/utg000001c_reads.fastq (853 reads)
  input:      /tmp/tmp5vs645_s/utg000001c.fasta (1,923,351 bp)
  alignments: /tmp/tmp5vs645_s/utg000001c.paf (885 alignments)
  output:     /tmp/tmp5vs645_s/utg000001c_polished.fasta (0 bp)

Removing empty segment: utg000001c

Running Racon on utg000002c:
  reads:      /tmp/tmp5vs645_s/utg000002c_reads.fastq (3 reads)
  input:      /tmp/tmp5vs645_s/utg000002c.fasta (7,167 bp)
  alignments: /tmp/tmp5vs645_s/utg000002c.paf (18 alignments)
  output:     /tmp/tmp5vs645_s/utg000002c_polished.fasta (0 bp)

Removing empty segment: utg000002c

Running Racon on utg000003c:
  reads:      /tmp/tmp5vs645_s/utg000003c_reads.fastq (2 reads)
  input:      /tmp/tmp5vs645_s/utg000003c.fasta (12,955 bp)
  alignments: /tmp/tmp5vs645_s/utg000003c.paf (19 alignments)
  output:     /tmp/tmp5vs645_s/utg000003c_polished.fasta (0 bp)

Removing empty segment: utg000003c

Running Racon on utg000004c:
  reads:      /tmp/tmp5vs645_s/utg000004c_reads.fastq (2 reads)
  input:      /tmp/tmp5vs645_s/utg000004c.fasta (2,869 bp)
  alignments: /tmp/tmp5vs645_s/utg000004c.paf (33 alignments)
  output:     /tmp/tmp5vs645_s/utg000004c_polished.fasta (0 bp)

Removing empty segment: utg000004c

Full polishing rounds
    The assembly graph is now polished using all of the reads. Multiple rounds of
polishing are done, and circular contigs are rotated between rounds.

Running Racon on round_1:
  reads:      read_subsets/sample_02.fastq (22,400 reads)
  input:      /tmp/tmp5vs645_s/round_1.fasta (0 bp)
  alignments: /tmp/tmp5vs645_s/round_1.paf (0 alignments)
  output:     /tmp/tmp5vs645_s/round_1_polished.fasta (0 bp)

Running Racon on round_2:
  reads:      read_subsets/sample_02.fastq (22,400 reads)
  input:      /tmp/tmp5vs645_s/round_2.fasta (0 bp)
  alignments: /tmp/tmp5vs645_s/round_2.paf (0 alignments)
  output:     /tmp/tmp5vs645_s/round_2_polished.fasta (0 bp)

Assign read depths
    The reads are aligned to the contigs one final time to calculate read depth
values.

Aligning reads:
  reads:      read_subsets/sample_02.fastq (22,400 reads)
  contigs:    /tmp/tmp5vs645_s/depths.fasta (0 bp)
  alignments: /tmp/tmp5vs645_s/depths.paf (0 alignments)
  mean depth: 0.000x

This is any2fasta 0.4.2
Opening 'assembly_02.gfa'
ERROR: The input appears to be empty
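In the log, Racon returned 0 bp for every segment, so no sequence was left for the final GFA. Because the bash script still exited successfully, the `&&` chain passed the result on to any2fasta, which then reported an empty input. Below is a minimal sketch of a guard that checks for segment sequence before converting. The function names `gfa_total_bp` and `convert_if_nonempty` are hypothetical, and the sketch assumes GFA1 layout, where a segment's sequence is the third tab-separated field of an `S` line and `*` means the sequence is omitted.

```shell
# Sum the sequence lengths of all segment (S) lines in a GFA file.
# Segments whose sequence field is "*" (sequence omitted) count as 0 bp.
gfa_total_bp() {
    awk -F'\t' '$1 == "S" && $3 != "*" { total += length($3) } END { print total + 0 }' "$1"
}

# Convert a GFA to FASTA with any2fasta only if it holds some sequence;
# otherwise report the problem and return non-zero.
convert_if_nonempty() {
    local gfa="$1" fasta="$2" bp
    bp=$(gfa_total_bp "$gfa")
    if [ "$bp" -eq 0 ]; then
        echo "skipping $gfa: no segment sequence (0 bp)" >&2
        return 1
    fi
    any2fasta "$gfa" > "$fasta"
}
```

Once both functions are defined in the shell, `&& any2fasta assembly_02.gfa > assemblies/assembly_02.fasta` in the original command could be replaced with `&& convert_if_nonempty assembly_02.gfa assemblies/assembly_02.fasta`. An empty assembly then fails with an explicit message instead of leaving an empty FASTA in `assemblies/`.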
rrwick commented 2 years ago

Does the unmodified miniasm_and_minipolish.sh produce output? I suspect PacBio reads will align just fine using the map-ont preset, so it may not matter.

Either way, you can always just skip the miniasm/Minipolish assemblies with Trycycler - maybe replace them with Redbean or NextDenovo/NextPolish assemblies. As long as you're giving Trycycler a nice variety of assemblies as input, it should work fine.

Ryan
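The suggestion above can be sketched as a dry-run planner that cycles assemblers across read subsets, with Redbean (wtdbg2) taking the place of miniasm/Minipolish alongside the Flye and Raven assemblies that already worked for the poster. This is a sketch under several assumptions: PacBio CLR reads (hence `flye --pacbio-raw` and the `wtdbg2 -x rs` preset; Sequel data would use `-x sq`), subsets named `read_subsets/sample_NN.fastq`, and a genome size supplied by the user. The function `plan_assemblies` is hypothetical, and it only prints commands so they can be reviewed before anything runs.

```shell
# Print (but do not run) one assembly command per read subset, cycling
# through Flye, Redbean (wtdbg2 + wtpoa-cns) and Raven for variety.
# Assumed: PacBio CLR reads and read_subsets/sample_NN.fastq naming.
plan_assemblies() {
    local threads="$1" genome_size="$2" n_subsets="${3:-12}"
    local assemblers=(flye redbean raven)
    local i nn reads out
    for ((i = 1; i <= n_subsets; i++)); do
        printf -v nn '%02d' "$i"
        reads="read_subsets/sample_${nn}.fastq"
        out="assemblies/assembly_${nn}"
        case "${assemblers[(i - 1) % 3]}" in
            flye)
                echo "flye --pacbio-raw $reads --threads $threads --out-dir assembly_${nn} && cp assembly_${nn}/assembly.fasta $out.fasta" ;;
            redbean)
                echo "wtdbg2 -x rs -g $genome_size -t $threads -i $reads -fo assembly_${nn} && wtpoa-cns -t $threads -i assembly_${nn}.ctg.lay.gz -fo $out.fasta" ;;
            raven)
                echo "raven --threads $threads $reads > $out.fasta" ;;
        esac
    done
}
```

For example, `plan_assemblies 16 2m > run_assemblies.sh` writes twelve commands (the `2m` genome size is a placeholder) that can be checked against each tool's `--help` and then run with `bash run_assemblies.sh`. Printing rather than executing keeps the flag choices visible, which matters here because the presets depend on the PacBio chemistry.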

fetyj commented 2 years ago

The unmodified miniasm_and_minipolish.sh also produces no output. I'll give NextDenovo/NextPolish a try. Thank you for the tips :)

Fety

alexweisberg commented 3 months ago

For what it's worth, I was getting identical errors using miniasm and Minipolish, but with Nanopore reads. Updating to a more recent version of Racon fixed it.
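The original log shows Minipolish using `/usr/local/bin/racon` (v1.4.12), while Minimap2 came from the `trycycler` conda environment. After updating, it is worth confirming which Racon is actually first on `PATH`. Below is a minimal sketch, assuming GNU `sort -V` is available and that `racon --version` prints the version on its first line (with or without a leading `v`). `version_gt` and `check_racon` are hypothetical helper names. The 1.4.12 comparison is used only because that is the version in this log, not because it is a known fixed threshold.

```shell
# Succeed if dotted version $1 is strictly newer than $2 (uses GNU sort -V).
version_gt() {
    [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Report which racon is first on PATH and whether it is newer than the
# v1.4.12 binary that produced the empty output in the log above.
check_racon() {
    local path version
    path=$(command -v racon) || { echo "racon not found on PATH" >&2; return 1; }
    version=$(racon --version 2>&1 | head -n 1)
    version=${version#v}    # strip a leading "v" if present
    echo "racon: $path (v$version)"
    if version_gt "$version" 1.4.12; then
        echo "newer than 1.4.12"
    else
        echo "not newer than 1.4.12: consider updating" >&2
        return 1
    fi
}
```

If the check fails, one option is installing Racon into the active environment (for example `conda install -c bioconda racon`) and running `check_racon` again. An older copy in `/usr/local/bin` is only used if its directory comes earlier on `PATH` than the environment's `bin` directory.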