xfengnefx / hifiasm-meta

hifiasm_meta - de novo metagenome assembler, based on hifiasm, a haplotype-resolved de novo assembler for PacBio HiFi reads.
MIT License

Segmentation fault with conda installed 0.3-r063.2 (hifiasm code base 0.13-r308) #26

Closed. nallsing-salk closed this issue 1 year ago.

nallsing-salk commented 1 year ago

Hello,

I am trying to assemble metagenomic reads from public SRA accessions using conda-installed hifiasm_meta 0.3-r063.2, and I often get a segmentation fault.

For example, I tried to assemble accession SRR13392911 ("human gut metagenome sequencing") and received this fault after most of the program had run, generating 5 ha_hist_line plots and 1 hist_readlength plot. It writes the .bin files but seems to fail while writing the .gfa files.

Writing reads to disk... 
wrote cmd of length 254: hamt version=0.3-r063.2, ha base version=0.13-r308, CMD= hifiasm_meta -o SRR13392911.v1.hifiasm_meta/SRR13392911.v1.hifiasm_meta -t 64 D1181.fastq.gz
Bin file was created on Tue Feb  7 23:26:52 2023
Hifiasm_meta 0.3-r063.2 (hifiasm code base 0.13-r308).
Reads has been written.
[hamt::write_All_reads] Writing per-read coverage info...
[hamt::write_All_reads] Finished writing.
Writing ma_hit_ts to disk... 
ma_hit_ts has been written.
Writing ma_hit_ts to disk... 
ma_hit_ts has been written.
bin files have been written.
[M::hamt_clean_graph] no debug gfa
[debug::hamt_normalize_ma_hit_t_single_side_advance] nb_batch: 2093
[M::hamt_normalize_ma_hit_t_single_side_advance] typeA 335 B 339, used 370.1s

[debug::hamt_normalize_ma_hit_t_single_side_advance] nb_batch: 2093
[M::hamt_normalize_ma_hit_t_single_side_advance] typeA 0 B 25, used 334.7s

[M::clean_weak_ma_hit_t] treated 0, used 0.1s
[M::ma_hit_sub] remained 8572776, deleted 0, used 0.1s
[M::detect_chimeric_reads_conservative] n_simple_remove: 0, n_complex_remove: 0/0, used 0.1
[M::ma_hit_cut] typeA 0 , typeB 8572488, used 0.0
[M::ma_hit_flt] typeA 0 , typeB 8572488, used 0.0s
[M::hamt_hit_contained_multi] treated roughly 0 spots, used 0.2s
[M::hamt_hit_contained_drop_singleton_multi] treated roughly 4 spots, used 0.1s
[debug::ma_hit_contained_advance] ret0: 32, used 0.02 s
[M::ma_hit_contained_advance] dropped 8572680 reads, used total of 0.15 s

[M::asg_arc_del_trans] reduced 0 arcs, used 0.1s
[M::asg_cut_tip] cut 65 tips, used 0.1 s
[M::hamt_clean_graph] ====== initial clean ======

**********0-th round drop: drop_ratio = 0.200000**********
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::hamt_asgarc_drop_tips_and_bubbles] did 0 rounds, dropped 0 spots, used 0.2 s

[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_false_node_meta] removed 0 single nodes, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_exact] removed 0 inexact overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_length] removed 0 short overlaps, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_false_link] removed 0 false overlaps, used 1.7s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_complex_false_link] removed 0 false overlaps, used 1.5s
[M::asg_cut_tip] cut 0 tips, used 0.1 s

**********1-th round drop: drop_ratio = 0.400000**********
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::hamt_asgarc_drop_tips_and_bubbles] did 0 rounds, dropped 0 spots, used 0.2 s

[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_false_node_meta] removed 0 single nodes, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_exact] removed 0 inexact overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_length] removed 0 short overlaps, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_false_link] removed 0 false overlaps, used 1.5s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_complex_false_link] removed 0 false overlaps, used 1.5s
[M::asg_cut_tip] cut 0 tips, used 0.1 s

**********2-th round drop: drop_ratio = 0.600000**********
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::hamt_asgarc_drop_tips_and_bubbles] did 0 rounds, dropped 0 spots, used 0.2 s

[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_false_node_meta] removed 0 single nodes, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_exact] removed 0 inexact overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_length] removed 0 short overlaps, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_false_link] removed 0 false overlaps, used 1.5s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_complex_false_link] removed 0 false overlaps, used 1.5s
[M::asg_cut_tip] cut 0 tips, used 0.1 s

**********3-th round drop: drop_ratio = 0.800000**********
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::hamt_asgarc_drop_tips_and_bubbles] did 0 rounds, dropped 0 spots, used 0.2 s

[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_false_node_meta] removed 0 single nodes, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_exact] removed 0 inexact overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_length] removed 0 short overlaps, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_false_link] removed 0 false overlaps, used 1.5s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_complex_false_link] removed 0 false overlaps, used 1.5s
[M::asg_cut_tip] cut 0 tips, used 0.1 s

********** last round **********
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.2s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::hamt_asgarc_drop_tips_and_bubbles] did 0 rounds, dropped 0 spots, used 0.3 s

[M::asg_arc_del_short_diploi_by_suspect_edge] removed 0 suspect overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_del_triangular_directly] removed 0 triangular overlaps, used 0.1s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_too_short_overlaps] removed 0 short overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.1s

********** checkpoint: r_utg **********
Writing raw unitig GFA to disk... 
[M::hamt_clean_graph] ======= preclean =======
[M::hamt_ug_pop_simpleInvertBubble] popped 0 locations
[M::hamt_ug_oneutgCircleCut] treated 0 spots
[M::hamt_clean_graph] round 0, dropped 0, used 0.9s

********** checkpoint: p_utg **********

[M::hamt_output_unitig_graph_advance] Writing GFA... 
[M::hamt_ug_drop_shorter_ovlp] cut 0
[M::hamt_asgarc_ugCovCutDFSCircle_aggressive] cut 0.

[M::hamt_clean_graph] time check #1, 1.2s
[M::hamt_ug_oneutgCircleCut] treated 0 spots
[M::hamt_ug_basic_topoclean_simple] total cut: 0
/bin/bash: line 3:    61 Segmentation fault      (core dumped) hifiasm_meta -o $BD/$BD -t 64 D1181.fastq.gz

Thanks for your help.

xfengnefx commented 1 year ago

> generating 5 ha_hist_line plots and 1 hist_readlength plot.

Do you see a total of 5 lines containing "ha_hist_line"? If so, the input might not be HiFi reads. SRR13392911 seems to be Sequel II data or subreads, which hifiasm(-meta) cannot handle. If not, could you paste the full log? Thanks.
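
For anyone wanting to run the same check on a saved log, here is a minimal sketch. It assumes the hifiasm_meta output was captured as text; the function name `count_marker_lines` is hypothetical and not part of hifiasm_meta. It only counts lines containing the marker string and does not interpret them.

```python
def count_marker_lines(log_text: str, marker: str = "ha_hist_line") -> int:
    """Count the lines of a captured log that contain the marker substring."""
    return sum(1 for line in log_text.splitlines() if marker in line)
```

To use it, read the captured log into a string (for example with `open(path).read()`, where `path` is wherever the log was saved) and pass that string to the function.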

nallsing-salk commented 1 year ago

Thank you very much for your response. I have discovered that some of the accessions, SRR13392911 included, do not have quality values in the FASTQ file (all values are "!"), so the reads are not being treated as HiFi. I will try some workarounds or only use accessions with quality scores.
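
In case it helps others screen accessions before assembling, below is a rough sketch of the check described above: it reports the fraction of reads whose quality string consists only of "!" (the lowest value in the usual Phred+33 encoding). It assumes plain 4-line FASTQ records with no wrapped sequence lines; `open_fastq` and `placeholder_fraction` are hypothetical helper names, and the `max_reads` limit is an arbitrary choice to keep the scan quick.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator


def open_fastq(path: str):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def quality_strings(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\r\n")


def placeholder_fraction(lines: Iterable[str], max_reads: int = 10000) -> float:
    """Fraction of the first max_reads reads whose qualities are all '!'."""
    total = 0
    placeholder = 0
    for qual in islice(quality_strings(lines), max_reads):
        total += 1
        if qual and set(qual) == {"!"}:
            placeholder += 1
    return placeholder / total if total else 0.0
```

For example, `with open_fastq("D1181.fastq.gz") as fh: print(placeholder_fraction(fh))` would print a value close to 1.0 for a file in which every quality value is "!".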