hzi-bifo / RiboDetector

Accurate and rapid RiboRNA sequences Detector based on deep learning
GNU General Public License v3.0

ribodetector_cpu hangs with SLURM #31

Closed gianfilippo closed 1 year ago

gianfilippo commented 1 year ago

Hi,

I tried your package on an interactive SLURM session, and it worked.

I then submitted it as a job via SLURM, and it hangs at:

2023-03-09 16:13:36 : INFO Using high MCC model file: /home/conda_envs/ribodetector/lib/python3.9/site-packages/ribodetector/data/ribodetector_600k_variable_len70_101_epoch47.onnx on CPU

I already tried reinstalling, but nothing changed.

The command I issued in both sessions is:

ribodetector_cpu -t 8 -l 92 -i $FASTQ1.fq.gz $FASTQ1.fq.gz -e rrna -o $outFASTQ1.nonrrna.1.fq $outFASTQ2.nonrrna.2.fq

What can I do?

Thanks

dawnmy commented 1 year ago

Could you post the SLURM script or command you used to submit the job? You need to set --cpus-per-task to the number of CPU cores you need and set --threads-per-core to 1.
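
As a minimal sketch of that advice, the job script can request the cores with the two options named above and derive the thread count from the allocation instead of hard-coding it. The value 8 and the fallback of 1 are illustrative assumptions, not settings taken from this thread.

```shell
#!/usr/bin/env bash
#SBATCH --cpus-per-task=8
#SBATCH --threads-per-core=1

# SLURM exports SLURM_CPUS_PER_TASK inside the job when --cpus-per-task is given.
# Fall back to 1 (an assumed default) when the variable is unset, e.g. outside SLURM.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Report the value that would be passed to ribodetector_cpu via -t / --threads.
echo "Threads to pass to ribodetector_cpu: $THREADS"
```

Passing "$THREADS" to -t keeps the number of worker threads in line with what the scheduler actually granted.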

GeertvanGeest commented 1 year ago

I'm running into the same issue here. I submit it with sbatch, and it runs within a singularity container from here. At the start there are two active processes on the node, and after 5 minutes nothing is going on anymore.

This is my script:

#!/usr/bin/env bash

#SBATCH --time=1-00:00:00
#SBATCH --mem-per-cpu=4G
#SBATCH --cpus-per-task=12
#SBATCH --threads-per-core=1

cd /workdir

MEAN_READ_LENGTH=`zcat results/fastp/MP_35_R1_trimmed.fastq.gz | head -1000 | awk '{if(NR%4==2) {count++; bases += length} } END {print int(bases/count)}' || true`

echo "Estimated read length: $MEAN_READ_LENGTH" 

singularity exec containers/ribodetector_0.2.7-cpu.sif \
ribodetector_cpu \
--len "$MEAN_READ_LENGTH" \
--threads "$SLURM_CPUS_PER_TASK" \
--input results/fastp/MP_35_R1_trimmed.fastq.gz results/fastp/MP_35_R2_trimmed.fastq.gz \
--output results/ribodetector/MP_35_R1.fastq.gz results/ribodetector/MP_35_R2.fastq.gz \
--rrna results/ribodetector/MP_35_R1_rrna.fastq.gz results/ribodetector/MP_35_R2_rrna.fastq.gz \
--ensure rrna

GeertvanGeest commented 1 year ago

It works now. The issue was that I had not set --chunk_size, which led to memory issues.

RTFM.....
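
For anyone hitting the same hang, here is a hedged sketch of the command from the script above with --chunk_size set explicitly. The value 256 is only a placeholder (this thread does not state which value was used), the read length default of 100 stands in for the estimate computed in the script, and the singularity wrapper is left out for brevity. The DRY_RUN switch is a hypothetical convenience so the command can be inspected before it is executed.

```shell
#!/usr/bin/env bash
# Sketch only: CHUNK_SIZE=256 is an illustrative placeholder, not a tuned value.
THREADS="${SLURM_CPUS_PER_TASK:-12}"         # falls back to 12 outside SLURM
CHUNK_SIZE="${CHUNK_SIZE:-256}"
MEAN_READ_LENGTH="${MEAN_READ_LENGTH:-100}"  # placeholder for the estimated read length

# Build the command as positional parameters so it can be printed before running.
set -- ribodetector_cpu \
  --len "$MEAN_READ_LENGTH" \
  --threads "$THREADS" \
  --chunk_size "$CHUNK_SIZE" \
  --input results/fastp/MP_35_R1_trimmed.fastq.gz results/fastp/MP_35_R2_trimmed.fastq.gz \
  --output results/ribodetector/MP_35_R1.fastq.gz results/ribodetector/MP_35_R2.fastq.gz \
  --rrna results/ribodetector/MP_35_R1_rrna.fastq.gz results/ribodetector/MP_35_R2_rrna.fastq.gz \
  --ensure rrna

echo "Command: $*"

# Execute only when DRY_RUN=0 is exported; by default the command is just printed.
if [ "${DRY_RUN:-1}" = "0" ]; then
  "$@"
fi
```

A suitable chunk size depends on the memory requested for the job (here --mem-per-cpu times --cpus-per-task), so the placeholder should be adjusted to the allocation.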

dawnmy commented 1 year ago

It is great that you figured out the solution. This will be beneficial to other users. I will incorporate this into the FAQ in the README.