xfengnefx / hifiasm-meta

hifiasm_meta - de novo metagenome assembler, based on hifiasm, a haplotype-resolved de novo assembler for PacBio Hifi reads.

Extract “Tangled circles” #35

Open · Flooooooooooooower opened this issue 3 months ago

Flooooooooooooower commented 3 months ago
Then, the circular contigs were left alone, and each tangled “circle” was re-assembled by Hifiasm-meta with default parameters, using the fragmented contigs that constructed the “circle” as input reads, resulting in seven extra circular contigs.

Hi, I am using hifiasm-meta for metagenome assembly and have divided the assembled results into three types. However, I have run into the issue quoted above: because I am unfamiliar with the software, I don't know how to extract and reassemble the “tangled circles.” I would greatly appreciate any help you can provide. Best wishes
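For anyone landing here with the same question, a minimal sketch of one way the quoted step could be scripted (the contig list, GFA file name, output prefix, and thread count are all hypothetical; the contigs belonging to a tangled subgraph can be picked out by eye in a GFA viewer such as Bandage):

# contigs_in_circle.txt: one contig name per line, taken from the tangled subgraph
# pull those sequences out of the contig GFA into a FASTA
awk 'NR==FNR{want[$1]=1; next} $1=="S" && want[$2]{print ">"$2"\n"$3}' \
    contigs_in_circle.txt asm.p_ctg.gfa > circle_contigs.fa
# re-assemble the extracted contigs as if they were input reads, default parameters
hifiasm_meta -o circle_reasm -t 16 circle_contigs.fa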

Flooooooooooooower commented 2 months ago

Hi, I have resolved the problem.

xfengnefx commented 2 months ago

Sorry for the late reply. Glad it's resolved. Please feel free to drop a new issue if you have any questions :)

Flooooooooooooower commented 2 months ago

Hi, I'm sorry to bother you again. I encountered the following error while running hifiasm-meta, and I'm not sure what it means. It's also worth noting that memory usage can reach up to 600 GB during the run.

Writing reads to disk...
wrote cmd of length 387: hamt version=0.3-r073, ha base version=0.13-r308, CMD= /home/jiaoh/00.Software/hifiasm-meta/hifiasm_meta -o /home/jiaoh/10.meta-genome/04.Enviroment/01.PRJNA879921/02.Assembly/plant_gas -t 50 /home/jiaoh/10.meta-genome/04.Enviroment/01.PRJNA879921/01.host_remove/plant_gas.clean.fq.gz
Bin file was created on Mon Sep  9 09:59:05 2024
Hifiasm_meta 0.3-r073 (hifiasm code base 0.13-r308).
Reads has been written.
[hamt::write_All_reads] Writing per-read coverage info...
[hamt::write_All_reads] Finished writing.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
bin files have been written.
[M::hamt_clean_graph] (peak RSS so far: 620.3 GB)
[M::hamt_clean_graph] no debug gfa
[M::hamt_clean_graph] (peak RSS so far: 620.3 GB)
MGA.sh: line 27: 30568 Killed                  ~/00.Software/hifiasm-meta/hifiasm_meta -o ${path}/02.Assembly/${i} -t ${threads} ${fq_clean}
xfengnefx commented 2 months ago

Looks like an OOM kill, unless the job hit some time limit. Could you try resuming the run with ~/00.Software/hifiasm-meta/hifiasm_meta -o ${path}/02.Assembly/${i} -t ${threads} ${fq_clean} in the same directory, with all variables the same as before? The stage after the all-vs-all read ovec should use less memory and will hopefully finish.
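For example (a sketch; the dmesg and sacct checks assume you have permission to run them where the job ran):

# confirm it was the kernel OOM killer; on SLURM, sacct may also show OUT_OF_MEMORY for the job
dmesg -T | grep -iE 'killed process|out of memory' | tail
# resume in the same directory: the run reuses the .bin files written for the same -o prefix
~/00.Software/hifiasm-meta/hifiasm_meta -o ${path}/02.Assembly/${i} -t ${threads} ${fq_clean}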

If you have other samples to run, or runs killed before the log could say "bin files have been written", please try the meta_dev branch (commit f98f1ad, r74) instead, which tries to fix the high peak RSS issue and is otherwise identical to r73. I will merge r74 into master and update bioconda this week.
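For reference, building that branch from source looks roughly like this (standard make build; needs gcc and zlib):

git clone https://github.com/xfengnefx/hifiasm-meta.git
cd hifiasm-meta
git checkout f98f1ad    # or: git checkout meta_dev
make
./hifiasm_meta --version    # version check; if --version is unsupported, a no-argument run prints usage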

Flooooooooooooower commented 2 months ago

Hi, has the latest r74 version been released? Does this version address the high peak RSS issue and help reduce memory usage? Thank you!

xfengnefx commented 2 months ago

It is now merged into the master meta branch (the default branch here) and is in the current release. I also opened a PR at bioconda, which was waiting for review and is now merged. Thanks for your patience.
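For the record, a sketch of updating through bioconda (package name as on bioconda; a fresh environment avoids a stale pinned build):

conda create -n hifiasm-meta -c conda-forge -c bioconda hifiasm_meta
conda activate hifiasm-meta
hifiasm_meta --version    # if unsupported, a no-argument run prints the usage and version string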

Flooooooooooooower commented 2 months ago

Hi, I saw that you released an update, and I immediately installed it using conda. Unfortunately, I encountered the following error. Could you please help me understand what might be the cause?

[M::main] Start: Mon Sep 23 14:42:49 2024

[M::hamt_assemble] Skipped read selection.
/opt/gridview/slurm/spool_slurmd/job9804618/slurm_script: line 53: 112549 Segmentation fault      hifiasm_meta -o ${output}/02.Assembly/${i} -t ${threads} ${fq_clean}
xfengnefx commented 2 months ago

Are you using bin files from the OOM-killed run above? From the log I would guess not. If it is indeed a new run: could you try simply rerunning the failed job with everything unchanged? I remember seeing a segfault around this stage a very long time ago that disappeared upon rerun, and I failed to reproduce it afterwards, so whatever it was has gone unfixed...
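To check for leftovers, something like this (a sketch; the *.bin glob matches the bin files written next to the -o prefix, though exact names may differ):

# any bin files from the previous attempt will be reused by a new run with the same prefix
ls -lh ${output}/02.Assembly/${i}*.bin
# to force a completely fresh run instead, remove them first (this re-does the expensive ovec stage)
rm ${output}/02.Assembly/${i}*.bin
hifiasm_meta -o ${output}/02.Assembly/${i} -t ${threads} ${fq_clean}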

If a rerun does not resolve this segfault, I might need to roll HEAD back to r73.

I wanted to say "share the data if it can be shared and I will troubleshoot" as usual, but I do not have access to HPC clusters right now. Sorry.

Flooooooooooooower commented 2 months ago

Hello, I am still encountering the above error when rerunning on the HPC. The data is all of the Colorectum data in PRJNA748109.

hifiasm_meta -o 02.Assembly/PRJNA748109_Colorectum  -t 50 ../05.PRJNA748109_Colorectum/01.host_remove/PRJNA748109_Colorectum.clean.fq.gz
[M::main] Start: Tue Sep 24 11:20:12 2024

[M::hamt_assemble] Skipped read selection.
Segmentation fault
Flooooooooooooower commented 2 months ago

Hello. I'm sorry to bother you again. What should I do about the following log messages? Note that I still obtained the assembled results.

********** checkpoint: post-assembly **********

[M::hamt_clean_graph] (peak RSS so far: 126.1 GB)
[M::hamt_ug_opportunistic_elementary_circuits] collected 0 circuits, used 0.02s
[M::hamt_ug_opportunistic_elementary_circuits] wrote all rescued circles, used 0.00s
[T::hamt_ug_opportunistic_elementary_circuits_helper_deduplicate_minhash] got the sequences, used 0.0s
[T::hamt_minhash_mashdist] sketched - 0.0s.
[T::hamt_minhash_mashdist] compared - 0.0s.
[T::hamt_ug_opportunistic_elementary_circuits_helper_deduplicate_minhash] collected mash distances for 0 seqs, used 0.0s
[M::hamt_ug_opportunistic_elementary_circuits_helper_deduplicate_minhash] had 0 paths, 0 remained (0 dropped by length diff, 0 by length abs),used 0.0s after sketching.
[M::hamt_ug_opportunistic_elementary_circuits] deduplicated rescued circles, used 0.01s
[M::hamt_ug_opportunistic_elementary_circuits] wrote deduplicated rescued circles, used 0.00s
[M::hamt_simple_binning] Will try to bin on 101 contigs (skipped 0 because blacklist).
Using random seed: 42
Perplexity too large for the number of data points!
xfengnefx commented 2 months ago

Perplexity too large for the number of data points!

Sorry about the vague error. This run actually finished; both the assembly and the circle-finding results should have been produced. The error came from the built-in MAG binning, which failed to find any bins: either because the sample was simple and there was nothing to bin, or because the assembly was fragmented and there was nothing to bin with. I should've included catching of this signal in the latest patch, but forgot to...
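Since the assembly itself finished, the contig GFA can be turned into FASTA as usual (standard one-liner from the hifiasm docs; the asm prefix is a placeholder for your -o prefix):

awk '/^S/{print ">"$2; print $3}' asm.p_ctg.gfa | fold > asm.p_ctg.fa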

Is this the PRJNA748109 that segfaulted, or a different sample?

Flooooooooooooower commented 2 months ago

Thank you for clarifying my confusion. The segmentation fault occurred during the Colorectum run of PRJNA748109.

xfengnefx commented 2 months ago

So did PRJNA748109 somehow manage to have a run without the segfault and reach the "checkpoint: post-assembly" part in the log posted above? Or did this sample always trigger the segfault while your other samples were assembled?

Thanks for letting me know though; I will remember to test on PRJNA748109 when I have access to servers. Sorry for having no actual fix at the moment.

Flooooooooooooower commented 1 month ago

Hello, I'm not sure what the reason is, but when I switch to another server, it runs successfully. However, on some servers, including clusters, it hits a segmentation fault. All of them were installed via conda.
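If it would help debugging, here is a generic sketch for capturing a backtrace on one of the servers where it crashes (nothing hifiasm-meta-specific; the core file name and location depend on the system's core_pattern):

ulimit -c unlimited    # allow a core dump in this shell
hifiasm_meta -o 02.Assembly/PRJNA748109_Colorectum -t 50 \
    ../05.PRJNA748109_Colorectum/01.host_remove/PRJNA748109_Colorectum.clean.fq.gz
gdb -q -batch -ex bt $(which hifiasm_meta) core    # prints the stack at the crash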

xfengnefx commented 1 month ago

I see, thanks so much for the report. I will remember this when testing.