Hi rcorfixs,
Thank you for reporting this issue and for the detailed information about your setup and the steps you followed. Here are a few things to check:
**Check Memory Allocation:** Although you allocated 40GB of RAM, `STAR` typically requires a significant amount of memory, especially for large genomes like GRCh38. Try setting `--mem` to a higher value, such as 60GB or even 80GB, depending on the resources available on your HPC.
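Note that with nf-core pipelines, per-task resources are set by Nextflow rather than by the submitting job's SLURM allocation alone. Below is a minimal sketch using the nf-core `--max_memory`/`--max_cpus` conventions; their availability in your pipeline revision is an assumption, and a per-process override in `nextflow.config` may also be needed for the STAR step:

```bash
# Sketch: raise the pipeline-wide resource ceiling via nf-core params.
# Confirm --max_memory/--max_cpus exist in the nf-core/rnaseq revision
# you are running before relying on them.
nextflow run nf-core/rnaseq -r 1.5.0 \
    --input ./samplesheet.csv \
    --genome GRCh38 \
    --max_memory '80.GB' \
    --max_cpus 8 \
    -profile singularity \
    -c ./nextflow.config
```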
**STAR Index Files:** Ensure that the STAR index files are correctly generated and located in the specified directory. If the index files were not properly created, or are corrupted, `STAR` may fail to map the reads.
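One quick sanity check is to confirm that the files `STAR --runMode genomeGenerate` normally produces are present and non-empty. A sketch, with a hypothetical index path:

```bash
# Sketch: verify a STAR index directory looks complete.
# INDEX_DIR is a placeholder; point it at your actual index location.
INDEX_DIR="$HOME/rnaseq/reference/star_index"
for f in SA SAindex Genome genomeParameters.txt chrNameLength.txt; do
    [ -s "$INDEX_DIR/$f" ] || echo "Missing or empty: $INDEX_DIR/$f"
done
```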
**SLURM Configuration:** Verify that your SLURM configuration on the MSU HPCC is correct and that there are no restrictions or limits on memory usage that could be causing the job to fail. You can check the SLURM documentation or consult your HPC support team.
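Standard SLURM commands can reveal partition limits and what a failed job actually consumed (`<jobid>` is a placeholder for your failed job):

```bash
# Sketch: inspect partition memory limits and a job's actual usage.
scontrol show partition                          # MaxMemPerNode and friends
sacct -j <jobid> --format=JobID,State,ReqMem,MaxRSS,Elapsed
seff <jobid>                                     # summary, where installed
```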
**Check Log Files:** Examine the log files generated by `STAR` for any specific error messages or warnings. These logs can provide more insight into why the job is failing. You can find them in the directory where the pipeline is running.
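With Nextflow, each task runs in its own work directory containing `.command.log`/`.command.err`, and STAR writes `Log.out` and `Log.final.out` alongside them. A sketch, assuming the work directory from your submission script:

```bash
# Sketch: locate STAR and Nextflow task logs under the work directory.
WORKDIR="$SCRATCH/rnaseq_work"        # matches -work-dir in run_rnaseq.sb
find "$WORKDIR" -name "Log.final.out" -o -name "Log.out"
find "$WORKDIR" -name ".command.err" -exec grep -il 'error\|killed' {} +
```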
**Nextflow Configuration:** Ensure that the `nextflow.config` file is correctly configured with the appropriate paths to the genome and GTF files. Double-check the paths and make sure there are no typos or incorrect references.
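Nextflow can print the fully resolved configuration, which makes wrong paths and typos easy to spot:

```bash
# Sketch: dump the configuration Nextflow actually resolves.
cd $HOME/rnaseq
nextflow config nf-core/rnaseq -profile singularity | less
```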
**Update Pipeline and Dependencies:** Make sure you are using the latest version of the `nf-core/rnaseq` pipeline and all its dependencies. Sometimes issues are resolved in newer versions of the pipeline or its components.
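Both steps can be done from Nextflow itself:

```bash
# Sketch: refresh the cached pipeline and list available releases.
nextflow pull nf-core/rnaseq
nextflow info nf-core/rnaseq    # lists the available revisions/tags
```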
An updated `run_rnaseq.sb` script:

```bash
#!/bin/bash
#SBATCH --job-name=rnaseq_preprocessing
#SBATCH --time=24:00:00
#SBATCH --mem=80GB              # Increased memory allocation
#SBATCH --cpus-per-task=8

cd $HOME/rnaseq
module load Nextflow/23.10.0
nextflow run nf-core/rnaseq -r 1.5.0 --input ./samplesheet.csv --genome GRCh38 -profile singularity -work-dir $SCRATCH/rnaseq_work -c ./nextflow.config
```
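Then resubmit and monitor as usual:

```bash
sbatch run_rnaseq.sb
squeue -u $USER        # watch the job's state while it runs
```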
The key change is the larger `--mem` request, which should better accommodate `STAR`'s memory usage. Please try these suggestions and let us know if the issue persists. We're here to help!
Best regards, John
---

**Original issue:**
**Title:** Issue with Running Bulk RNA-seq Pipeline on MSU HPCC

**Description:** I encountered an issue while running the Bulk RNA-seq pipeline as described in the README. The pipeline is based on the `nf-core/rnaseq` pipeline and is intended to be executed on the MSU HPCC using SLURM as the job executor.
**Steps to Reproduce:**
1. Created the project directory (`$HOME/rnaseq`).
2. Created the `samplesheet.csv` file for pre-processing.
3. Created the `nextflow.config` file.
4. Created the SLURM submission script (`run_rnaseq.sb`) to run the pre-processing pipeline using SLURM.
5. Submitted the job with `sbatch run_rnaseq.sb`.
**Observed Behavior:** The pipeline failed at the step where `STAR` maps FASTQ reads to the reference genome. The error message indicated insufficient memory allocation.

**Expected Behavior:** The pipeline should have successfully mapped the FASTQ reads to the reference genome using `STAR`.
**Configuration Details:**
- `samplesheet.csv`:
- `nextflow.config`:
- `run_rnaseq.sb`:

**Environment:**
- `Nextflow` version 23.10.0
- `nf-core/rnaseq` version 1.5.0

**Additional Context:** I have followed all steps mentioned in the README, including creating the necessary files and directories and loading the required modules. Despite allocating 40GB of RAM, the pipeline still fails at the `STAR` mapping step. I suspect there might be a configuration issue or a potential bug in the pipeline script.

**Request:** I would appreciate any guidance on resolving this memory allocation issue or any suggestions for improving the pipeline's performance on the MSU HPCC.