nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/

Find-motifs error: memory allocation of 61203283984 bytes failed #225

Open hassebossenbroek opened 3 months ago

hassebossenbroek commented 3 months ago

Hi,

I've been trying to run find-motifs for a few days, but I keep running into errors. I've updated modkit to the newest version (0.3.1), but that doesn't resolve the issue. This specific sample keeps crashing with a core dump, and I can't work out why. I'm running this on an HPC cluster, but the cluster itself doesn't seem to be the problem, as the job doesn't use all of its allocated memory or walltime.

Error file:

memory allocation of 61203283984 bytes failed
/lmod/apps/modkit/0.3.1/bin/modkit: line 2: 1911422 Aborted                 (core dumped) apptainer exec /lmod/apps/modkit/0.3.1/modkit_v0.3.1.sif /opt/dist/modkit "$@"

Log file (containing command):

[src/logging.rs::60][2024-07-05 10:50:59][DEBUG] command line: /opt/dist/modkit find-motifs --threads 48 --known-motif CCNG 0 m --known-motif CCNG 1 m --known-motif CCNG 0 h --known-motif CCNG 1 h --log-filepath find_motifs_out_2/find_motifs_log_21_SRSF2.txt --in-bedmethyl modkit_out/21_SRSF2mut.bed --ref /lmod/dbdata/homo_sapiens/GRCh38_ensembl_110/Homo_sapiens.GRCh38.dna.primary_assembly.fa --min-sites 30 --min-frac-mod 0.5
[src/find_motifs/subcommand.rs::109][2024-07-05 10:50:59][INFO] loading references from "/lmod/dbdata/homo_sapiens/GRCh38_ensembl_110/Homo_sapiens.GRCh38.dna.primary_assembly.fa"
[src/find_motifs/subcommand.rs::140][2024-07-05 10:51:03][INFO] loaded 194 sequence(s)

Any ideas about what could be causing this would be much appreciated. Thank you!

ArtRand commented 3 months ago

Hello @hassebossenbroek,

Is it possible that there are some memory guards in your HPC environment that aren't visible to modkit? Can you increase the memory limit in apptainer? The failed allocation is about 61 GB, which isn't huge by HPC standards but could be higher than a default limit. How are you deploying modkit? It looks like you're running in a container; where did that distribution come from?
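One way to look for such guards is to print the limits the process actually sees from inside the job. The following is a minimal Python sketch, not part of modkit. It assumes a Linux node, and the cgroup file paths are assumptions that differ between systems (cgroup v2 versus v1, and how the scheduler sets up the job). It also converts the byte count from the error message into GB and GiB for comparison with the requested memory.

```python
import resource
from pathlib import Path

# Size of the failed allocation, copied from the error message above.
FAILED_ALLOC_BYTES = 61_203_283_984

# Assumed cgroup locations (v2 first, then v1); these vary between systems.
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)


def to_gb_gib(n_bytes):
    """Convert a byte count to (decimal GB, binary GiB)."""
    return n_bytes / 1e9, n_bytes / 2**30


def fmt_rlimit(value):
    """Format an rlimit value, which is a byte count or RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**30:.1f} GiB"


def cgroup_memory_limit():
    """Return the raw cgroup memory limit string, or None if no file is readable."""
    for name in CGROUP_LIMIT_FILES:
        try:
            return Path(name).read_text().strip()
        except OSError:
            continue
    return None


if __name__ == "__main__":
    gb, gib = to_gb_gib(FAILED_ALLOC_BYTES)
    print(f"failed allocation: {gb:.1f} GB ({gib:.1f} GiB)")
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    print(f"address-space limit: soft={fmt_rlimit(soft)}, hard={fmt_rlimit(hard)}")
    print(f"cgroup memory limit: {cgroup_memory_limit() or 'not found'}")
```

Running this in the same job allocation as modkit shows whether a per-process or per-job limit sits below the size of the failed allocation. On cgroup v2 the limit file contains the literal string "max" when no limit is set.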

ArtRand commented 2 months ago

@hassebossenbroek any luck?

hassebossenbroek commented 2 months ago

@ArtRand Resubmitting to a high-memory node did the job eventually. I'm just struggling to find out how much walltime I should allocate - my jobs run out of time with 48h on a high-mem node. I've not even sequenced very deeply (15x or less). Is there any guidance available anywhere as to how long this is expected to take?

ArtRand commented 2 months ago

Hello @hassebossenbroek,

If you have very non-specific methylation, the "exhaustive search" step can take quite a long time. The first thing I would try is --skip-search; there are other tips in the documentation.
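As a rough sketch of that suggestion, the Python snippet below takes the find-motifs command from the log earlier in this thread and adds --skip-search to it. The executable name "modkit" and the helper with_flag are illustrative assumptions; every other argument is copied from the logged command line. It only prints the resulting command so it can be pasted into a job script.

```python
import shlex

# Arguments copied from the logged command line; "modkit" as the executable
# name is an assumption (the log shows /opt/dist/modkit inside the container).
LOGGED_COMMAND = (
    "modkit find-motifs --threads 48 "
    "--known-motif CCNG 0 m --known-motif CCNG 1 m "
    "--known-motif CCNG 0 h --known-motif CCNG 1 h "
    "--log-filepath find_motifs_out_2/find_motifs_log_21_SRSF2.txt "
    "--in-bedmethyl modkit_out/21_SRSF2mut.bed "
    "--ref /lmod/dbdata/homo_sapiens/GRCh38_ensembl_110/"
    "Homo_sapiens.GRCh38.dna.primary_assembly.fa "
    "--min-sites 30 --min-frac-mod 0.5"
)


def with_flag(argv, flag):
    """Return a copy of argv with flag appended, unless it is already present."""
    return list(argv) if flag in argv else list(argv) + [flag]


# Split the logged command into arguments and add the suggested flag.
cmd = with_flag(shlex.split(LOGGED_COMMAND), "--skip-search")

if __name__ == "__main__":
    # Print a shell-quoted command line for use in a job script.
    print(shlex.join(cmd))
```

The list in cmd could also be passed to subprocess.run directly if the job is launched from Python.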