hariszaf / pema

PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S rRNA, ITS and COI marker genes

HPC job specifications #49

Closed marymcelroy closed 1 year ago

marymcelroy commented 1 year ago

Hi Haris! I'd like to use PEMA on some metabarcoding data for my graduate work. I successfully installed the Singularity image on my university's HPC environment, but I was hoping for some advice about how to estimate the HPC resources I would need to run PEMA in my job script (we use Slurm). Specifically, do you have any guidance for the #SBATCH specifications and values I should use?

For context, I have 96 samples that were PE sequenced for COI, 18S, and 16S amplicons (euk eDNA metabarcoding) on an Illumina MiSeq, so I have 576 fastq files as my raw sequencing data. I would like to use a custom ref db for COI, so I will follow your instructions about training the RDP classifier (I know this will likely affect computational load). Thank you!

cpavloud commented 1 year ago

Hi @marymcelroy

The answer depends on the partitions and resources available on your HPC. The more cores you assign, the faster your job will complete.

For example, on the Zorbas HPC, I would normally use the batch partition (1 node and 20 cores). With the #SBATCH specifications and the parameters I normally use, a job for 96 samples of 16S rRNA would take roughly a day. (You will need to run 3 separate jobs: you have 3 genes, so you will have 3 different parameters files.)

My shell script would be something like:

#!/bin/bash -l
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --mem=40G
#SBATCH --job-name="my_pema_job"
#SBATCH --output=my_pema_job.output
#SBATCH --requeue

module purge # unloads all previous loads
module load singularity/3.7.1 # loads singularity

singularity run -B /home1/christina/the_directory_where_the_mydata_folder_and_the_parameters_are/:/mnt/analysis /home1/christina/pema_v.2.1.4.sif

module unload singularity/3.7.1 # unloads singularity
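
Since three marker genes mean three separate jobs, one way to avoid maintaining three near-identical scripts is to generate them. The following is only a rough sketch under some assumptions: one analysis directory per marker (each holding its own mydata folder and parameters file), with `BASE_DIR`, the `16S`/`18S`/`COI` directory names, and the `SIF` path all being hypothetical placeholders. The #SBATCH values are simply copied from the Zorbas example above and will likely need adjusting for another cluster. It prints the sbatch commands instead of submitting anything.

```shell
#!/bin/bash
# Sketch: write one Slurm job script per marker gene (dry run, nothing is submitted).
# BASE_DIR, the marker directory names and SIF are hypothetical; adjust to your setup.
set -euo pipefail

BASE_DIR="${BASE_DIR:-$PWD/pema_runs}"     # hypothetical parent directory
SIF="${SIF:-/path/to/pema_v.2.1.4.sif}"    # hypothetical location of the PEMA image
MARKERS=(16S 18S COI)

for marker in "${MARKERS[@]}"; do
    # Assumed layout: this directory holds mydata/ and the parameters file for one marker.
    run_dir="$BASE_DIR/$marker"
    mkdir -p "$run_dir"
    job="$run_dir/pema_${marker}.sh"

    # Unquoted heredoc: ${marker}, ${run_dir} and ${SIF} are expanded now.
    cat > "$job" <<EOF
#!/bin/bash -l
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --mem=40G
#SBATCH --job-name="pema_${marker}"
#SBATCH --output=pema_${marker}.output

module purge
module load singularity/3.7.1
singularity run -B ${run_dir}/:/mnt/analysis ${SIF}
EOF

    # Print the submission command rather than running it.
    echo "sbatch $job"
done
```

Once the generated scripts look right, the printed sbatch lines can be run by hand, one per marker.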

hariszaf commented 1 year ago

Hi @marymcelroy.

@cpavloud is right. Just a few more comments from my side:

In any case, if I had to guess, I would say that you wouldn't need more than 3 days for any of your analyses, no matter what parameters you choose. One step that can really take some time if you have a large number of OTUs/ASVs is getNCBITaxId; I suggest setting this to No, at least until you have your first results.
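
To make the 3-day guess concrete, here is a small sketch, assuming your cluster has Slurm accounting enabled (that part is an assumption, not something specific to PEMA). It prints a `--time` directive you could add to the job script, and, if you pass the ID of a finished job as the first argument, asks `sacct` what that job actually used so later requests can be tightened.

```shell
#!/bin/bash
# Sketch: an upper bound on wall time, plus a look at what a finished job really used.

# Slurm accepts days-hours:minutes:seconds for --time; 3 days follows the guess above.
TIME_LIMIT="3-00:00:00"
echo "#SBATCH --time=${TIME_LIMIT}"

# Optional: pass the ID of a completed job as the first argument.
JOBID="${1:-}"
if [ -n "$JOBID" ] && command -v sacct >/dev/null 2>&1; then
    # Elapsed and MaxRSS show the real run time and peak memory;
    # ReqMem and AllocCPUS show what was requested.
    sacct -j "$JOBID" --format=JobID,JobName,Elapsed,MaxRSS,ReqMem,AllocCPUS,State
fi
```

If the first run finishes well under the limit or uses far less memory than requested, the values for the remaining marker genes can be lowered accordingly.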

Good luck! :100:

marymcelroy commented 1 year ago

Thank you both very much for the suggestions!

hariszaf commented 1 year ago

@marymcelroy I am closing this issue now. If you'd like to give us any feedback, please feel free to open a "new discussion".

Thanks again for your interest in PEMA.