mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

Correcting bubbles step crash #702

Open mprous1 opened 1 month ago

mprous1 commented 1 month ago

For me, Flye 2.9.3 or 2.9.4 usually fails at the 'Correcting bubbles' step on Linux Mint, regardless of whether I compiled it or installed it through Miniconda (installation in the default Miniconda environment apparently does not work because of Python 3.11).

flye --nano-corr nuclear.fq -t 16 -o flye_out

[2024-05-19 22:31:22] root: INFO: Correcting bubbles
[2024-05-19 22:32:34] root: ERROR: Command '['flye-modules', 'polisher', '--bubbles', '/home/mprous/nanopore/flye_out/40-polishing/bubbles_1.fasta', '--subs-mat', '/home/mprous/miniconda3/envs/medaka/lib/python3.10/site-packages/flye/config/bin_cfg/nano_r94_substitutions.mat', '--hopo-mat', '/home/mprous/miniconda3/envs/medaka/lib/python3.10/site-packages/flye/config/bin_cfg/nano_r94_g36_homopolymers.mat', '--out', '/home/mprous/nanopore/flye_out/40-polishing/consensus_1.fasta', '--threads', '16']' died with <Signals.SIGSEGV: 11>.
[2024-05-19 22:32:34] root: ERROR: Pipeline aborted

It does not always happen. Smaller datasets usually work; I'm not sure where the threshold is, maybe around 20-30 GB of FASTQ data. The genomes are around 200-400 Mb with 20-60X coverage and are usually haploid.

No idea what the problem could be. The computer should have enough resources: 32 cores and 128 GB RAM (peak RAM usage is usually less than half of that).
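To double-check both the peak memory figure and the way the process ends, a small wrapper along these lines might help. This is only a sketch for Linux: `run_and_report` is a hypothetical helper name, not part of Flye, and it relies on Python's standard `subprocess`, `signal`, and `resource` modules.

```python
import resource
import signal
import subprocess
import sys


def run_and_report(argv):
    """Run argv, then report how it ended and the peak memory of child processes."""
    proc = subprocess.run(argv)
    # A negative return code means the child was killed by that signal number.
    killed_by = signal.Signals(-proc.returncode).name if proc.returncode < 0 else None
    # On Linux, ru_maxrss is reported in kilobytes. For RUSAGE_CHILDREN it is the
    # largest peak among the children (and their waited-for descendants) so far.
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "returncode": proc.returncode,
        "signal": killed_by,
        "peak_rss_gb": peak_kb / 1024**2,
    }


if __name__ == "__main__":
    # Pass the command to wrap after the script name.
    print(run_and_report(sys.argv[1:]))
```

Wrapping the `flye-modules polisher` command from the log should report `SIGSEGV` directly. Wrapping the top-level `flye` command would most likely show only a non-zero return code, since the log suggests the Python wrapper catches the crash and aborts the pipeline itself.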

mikolmogorov commented 3 weeks ago

Sorry for my late response! Could it be relevant to https://github.com/fenderglass/Flye/issues/584? Is it using error-corrected reads? Does the error ever happen with reads without error correction?

If the error does happen, is it reproducible? I.e., if it crashes and you restart using the --resume option, does it crash again?

I haven't observed similar errors on my end, so I'd need an example that consistently reproduces the error to work on that.
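One possible way to get a smaller reproducer is to rerun only the polisher call that the log records, for example with a single thread. Below is a minimal sketch, assuming the error line keeps the format shown in the log above (it is the text Python's `subprocess` module prints for a command killed by a signal). `extract_failed_command` and `as_shell_command` are hypothetical helper names, not Flye functions.

```python
import ast
import re
import shlex

# Matches "Command '[...]' died with <Signals.NAME: N>" as printed in the log.
_DIED = re.compile(r"Command '(\[.*?\])' died with <Signals\.(\w+): (\d+)>")


def extract_failed_command(log_text):
    """Return (argv, signal_name, signal_number), or None if nothing matches."""
    match = _DIED.search(log_text)
    if match is None:
        return None
    # The log prints the command as a Python list literal, so parse it as one.
    argv = ast.literal_eval(match.group(1))
    return argv, match.group(2), int(match.group(3))


def as_shell_command(argv, threads=None):
    """Quote argv for a shell, optionally replacing the value after --threads."""
    argv = list(argv)
    if threads is not None and "--threads" in argv:
        argv[argv.index("--threads") + 1] = str(threads)
    return shlex.join(argv)
```

Feeding the contents of the Flye log file to `extract_failed_command` and printing `as_shell_command(argv, threads=1)` would give a command line that can be pasted into a terminal. Whether a single-threaded run still crashes is an open question, not something the log establishes.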

mprous1 commented 3 weeks ago

Not sure if it is related to #584. It happens often on my computer, but never happened (for the same dataset) with Flye 2.9-b1768 on a server, also using --nano-corr. The reads have not been corrected in any way. They are R10.4.1 nanopore reads basecalled with Dorado 0.5 and now 0.7 (1-10% duplex reads). I'm using --nano-corr, which perhaps produces more contiguous assemblies (?).

--resume did not help; it died at the same step, I think... In one instance I remember (not sure if it died at a different step), with --resume it strangely produced a genome 2X the size without a decrease in coverage.

I need to check if with --nano-hq it would crash at the same step.

mprous1 commented 2 weeks ago

I did more tests with earlier Flye versions using the same dataset (2.9 and 2.9.2 installed from bioconda), but got the same result: a crash at the bubble correction step. There was no difference between --nano-corr and --nano-hq, and --resume did not help (it starts running minimap2 and then crashes at the same step). The error message is the same every time.

But the same dataset works on a server with Flye 2.9 in its own environment without problems, and I would assume the newer versions would as well. Could there be some software conflicts?

I can nevertheless send a dataset (14 GB) for testing, but I'm not sure there is much point if the issue is more likely the software environment.

mikolmogorov commented 6 days ago

I think it may be specific to your hardware / environment then. Could you give more info about both your personal machine and the server, e.g. processor type and OS? The output of cat /proc/cpuinfo from both systems would also be helpful.

On your personal machine, if you build from source instead of using bioconda, does it still crash?
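Once both cpuinfo outputs are available, comparing them by eye is tedious, so a short script could summarize each one and list the CPU flags that only one machine has. This is a rough sketch, assuming the usual x86 Linux layout of /proc/cpuinfo with `processor`, `model name`, and `flags` keys. `parse_cpuinfo` and `flag_differences` are hypothetical names chosen for illustration.

```python
def parse_cpuinfo(text):
    """Summarize /proc/cpuinfo text: model names, logical CPU count, flag set."""
    models, flags, count = set(), set(), 0
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            count += 1
        elif key == "model name":
            models.add(value)
        elif key == "flags":
            flags.update(value.split())
    return {"models": sorted(models), "logical_cpus": count, "flags": flags}


def flag_differences(first, second):
    """List the flags that appear on only one of the two machines."""
    return {
        "only_first": sorted(first["flags"] - second["flags"]),
        "only_second": sorted(second["flags"] - first["flags"]),
    }
```

For example, the text read from /proc/cpuinfo on the desktop and the text of the file saved on the server can each be passed to `parse_cpuinfo`, and the two results to `flag_differences`. A difference in instruction-set flags would not by itself explain the crash, but it narrows down what to look at.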

mprous1 commented 6 days ago

My computer where Flye crashes:
Operating system:
Kernel: 5.15.0-107-generic x86_64 bits: 64 compiler: gcc v: 11.4.0
Desktop: Cinnamon 6.0.4 tk: GTK 3.24.33 wm: muffin vt: 7 dm: LightDM 1.30.0
Distro: Linux Mint 21.3 Virginia base: Ubuntu 22.04 jammy

Hardware: 24-core (8-mt/16-st) model: Intel Core i9-14900KF bits: 64 type, 128 GB RAM, NVIDIA GeForce RTX 4090/PCIe/SSE2 v: 4.6.0 NVIDIA 535.171.04

I'm away for the next three weeks, so can't access the computer and check the output of "cat /proc/cpuinfo"

Yes, building flye from source on my computer produces the same error.

The computing cluster where it works: https://hpc.ut.ee/services/HPC-services/Rocket

The cpuinfo from the Rocket cluster is attached, although it might not be from the same node that ran Flye. cpu_UTHPC.txt