shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

flye "Looks like the system ran out of memory" #27

Closed mihinduk closed 3 years ago

mihinduk commented 3 years ago

I am trying to do the contig assembly for 523 samples and have run into this issue:

[2020-10-28 16:49:19] INFO: Simplifying the graph
[2020-10-28 19:21:09] ERROR: Looks like the system ran out of memory
[2020-10-28 19:21:09] ERROR: Command '['flye-modules', 'repeat', '--disjointigs', '/scratch/sahlab/RC2_IBD_virome/assembly/contig_dictionary/00-assembly/draft_assembly.fasta', '--reads', './assembly/contig_dictionary/all.mh.contigs_for_flye.fa', '--out-dir', '/scratch/sahlab/RC2_IBD_virome/assembly/contig_dictionary/20-repeat', '--config', '/opt/htcf/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/py-flye-2.7.1-36mvt7vew5klvjj37weoxusoqe4l33ka/lib/python3.6/site-packages/flye/config/bin_cfg/asm_subasm.cfg', '--log', '/scratch/sahlab/RC2_IBD_virome/assembly/contig_dictionary/flye.log', '--threads', '24', '--meta', '--min-ovlp', '1000', '--kmer', '31']' died with <Signals.SIGKILL: 9>.
[2020-10-28 19:21:09] ERROR: Pipeline aborted

This is what the next step SHOULD have been:
[2020-06-03 23:00:27] INFO: >>>STAGE: plasmids
[2020-06-03 23:00:27] INFO: Recovering short unassembled sequences

Here is the command that was running when it crashed:
flye --subassemblies $OUT/contig_dictionary/all.mh.contigs_for_flye.fa -t 24 --meta --plasmids -o $OUT/contig_dictionary -g 1g

Here are my memory and node requests:

#SBATCH --cpus-per-task=16
#SBATCH --mem=250G

This is my version of flye: module load py-flye/2.7.1-python-3.6.5

I have looked for this issue in the Flye issue tracker. One recommendation was to update to Flye 2.5, but we are more up to date than that (https://github.com/fenderglass/Flye/issues/142). The second recommendation was: "Hard to tell what is going on, because all cluster environments are usually configured very differently. I would suggest to try to resume run with an increased number of requested threads (maybe 25 in PBS, but use -t 20 in Flye) and specify maximum RAM (say ~500Gb should be sufficient)." (https://github.com/fenderglass/Flye/issues/138)

I would appreciate any suggestions on what to try next. I could try the flye command in an interactive session. I can't go above 250G of memory, but I could try increasing the CPUs (currently -t 24 in flye).
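The second recommendation quoted above (request more threads from the scheduler than Flye is told to use) could be sketched for SLURM roughly as follows. This is only an illustration under assumptions: the `--cpus-per-task=24` request and the headroom of 4 threads are made-up values that mirror the "25 in PBS, -t 20 in Flye" ratio, and the fallback of 24 applies only when the script runs outside a SLURM job. The script only prints the thread count it would pass to `flye -t`.

```shell
#!/bin/sh
#SBATCH --cpus-per-task=24
#SBATCH --mem=250G

# SLURM exports SLURM_CPUS_PER_TASK when --cpus-per-task is requested.
# Outside a job the variable is unset, so fall back to 24 (assumption).
ALLOCATED="${SLURM_CPUS_PER_TASK:-24}"

# Illustrative headroom: leave a few allocated CPUs unused by Flye.
HEADROOM=4
FLYE_THREADS=$((ALLOCATED - HEADROOM))

# Never pass fewer than one thread.
if [ "$FLYE_THREADS" -lt 1 ]; then
    FLYE_THREADS=1
fi

echo "allocated=$ALLOCATED flye_threads=$FLYE_THREADS"
```

Note that the original request above asks for 16 CPUs while Flye is run with -t 24, so deriving the thread count from the allocation keeps the two numbers consistent.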

shandley commented 3 years ago

This isn't anything we can fix. This many samples is not our, or anyone else's, normal sample size. You can try increasing -t to 24 or more (if allowed). However, I think this is just because your data set is so large. We will likely need to come up with a specific plan for this sample set that will not be applicable to the general hecatomb workflow.

mihinduk commented 3 years ago

I got a response from Mikhail: Looks strange, it is possible that your assembly graph became too complex to process for the simplification algorithm. I suggest to re-run without --meta option, as it will disable the simplification part that has crashed. In general, there is no need to use --meta with --subassemblies, because the assumptions about the input data for these modes are very different.

Mikhail
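Applied to the command posted earlier in this thread, Mikhail's suggestion amounts to dropping `--meta` and leaving the other options unchanged. A minimal sketch, assuming `OUT` is already set in the environment (the fallback path below is a hypothetical placeholder); the script only prints the command rather than running it:

```shell
#!/bin/sh
# OUT is the pipeline output directory; the default here is a placeholder.
OUT="${OUT:-/path/to/output}"
THREADS=24

# Same options as the original command, minus --meta.
FLYE_CMD="flye --subassemblies $OUT/contig_dictionary/all.mh.contigs_for_flye.fa -t $THREADS --plasmids -o $OUT/contig_dictionary -g 1g"

echo "$FLYE_CMD"
```

To run it, paste the printed line into a session where flye is loaded.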

mihinduk commented 3 years ago

Re-running without --meta finished in < 2 hours for 523 samples. This may be a good change for the pipeline.