marcelauliano / MitoHiFi

Find, circularise and annotate mitogenome from PacBio assemblies
MIT License

Excessively long running time #62

Closed: lslochov closed this 10 months ago

lslochov commented 1 year ago

Hi, I'm running the example dataset from section 4.1 of the README on an HPC SLURM cluster. MitoHiFi runs for more than 24 hours without finishing, even when I request 16 CPU cores and 32 GB RAM. That shouldn't happen for a dataset of this size, so I suspect I've misconfigured something, but I'm not sure what. I'd appreciate any help in understanding and resolving this.

marcelauliano commented 11 months ago

Can you give all the details of your run, please?

lslochov commented 11 months ago

I started by downloading reference files for "Deilephila porcellus", per section 4.1 of the README.

python3 findMitoReference.py --species "Deilephila porcellus" --outfolder /home/llochovsky001202/data/deilephila --min_length 14000

Then I ran MitoHiFi itself using the Deilephila porcellus test data:

python3 mitohifi.py -r ../tests/ilDeiPorc1.reads.100.fa -f /home/llochovsky001202/data/deilephila/NC_079697.1.fasta -g /home/llochovsky001202/data/deilephila/NC_079697.1.gb -t 4 -o 5

I ran this on SLURM with the following config:

#!/bin/bash
#SBATCH --job-name=mitohifi_test
#
#SBATCH --nodes=1
#
#SBATCH --ntasks-per-node=1
#
#SBATCH --cpus-per-task=4
#
#SBATCH --mem=8G
#
#SBATCH --time=24:00:00

The job eventually reaches the 24-hour limit and is killed by SLURM. I tried increasing the CPUs and RAM in the hope that it would speed up the computation, but even with 16 CPU cores and 32 GB RAM, I see the same results.
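One thing worth checking in a setup like the one above: the command passes a fixed -t 4, so if that flag was left unchanged, requesting 16 cores from SLURM would not by itself give MitoHiFi more threads. The sketch below is one hedged way to keep the two in sync. It reuses the paths and flags from the commands above; the 16-core/32G values come from the retry described above, and the fallback of 4 threads, the cmd array, and the "mitohifi.py exists" guard are illustrative assumptions, not part of MitoHiFi. SLURM_CPUS_PER_TASK is the environment variable SLURM sets when --cpus-per-task is given.

```shell
#!/bin/bash
#SBATCH --job-name=mitohifi_test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=24:00:00

# Follow the SLURM allocation; fall back to 4 threads outside of SLURM (assumed default).
THREADS="${SLURM_CPUS_PER_TASK:-4}"

# Reference files downloaded by findMitoReference.py (paths as in the thread).
REF_DIR="/home/llochovsky001202/data/deilephila"

# Build the command as an array so it can be logged exactly as it will be run.
cmd=(python3 mitohifi.py
     -r ../tests/ilDeiPorc1.reads.100.fa
     -f "$REF_DIR/NC_079697.1.fasta"
     -g "$REF_DIR/NC_079697.1.gb"
     -t "$THREADS"
     -o 5)

echo "Working directory: $PWD"
echo "Command: ${cmd[*]}"

# The relative paths only resolve if the job starts in the directory holding mitohifi.py.
if [ -f mitohifi.py ]; then
    "${cmd[@]}"
else
    echo "mitohifi.py not found in $PWD; submit the job from the script's directory" >&2
fi
```

Logging the working directory and the exact command at the top of the job output makes it easier to tell a mis-resolved path from a genuinely slow run.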

marcelauliano commented 10 months ago

It should take only a few minutes. The problem is probably with your system or installation.
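A quick first check for an installation problem is whether the external programs the pipeline calls actually resolve on PATH inside the job environment. The following is a minimal sketch, not an official diagnostic: missing_tools is a hypothetical helper, and the tool names passed to it are an assumed, partial list that should be checked against the dependency list in the README.

```shell
#!/bin/bash
# Print the name of every given tool that does not resolve on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed tool names; replace with the dependencies listed in the README.
missing=$(missing_tools python3 minimap2 samtools blastn hifiasm)

if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All listed tools were found on PATH."
fi
```

Running this inside the same SLURM job (or the same conda environment or container) matters, because a login node and a compute node can see different PATH settings.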