johnvusich / bulk-rnaseq

Pipeline to analyze bulk RNASeq data on HPCC using SLURM

Issue running bulk-rnaseq on MSU HPCC #1

Closed rcorfixs closed 3 months ago

rcorfixs commented 3 months ago



Title: Issue with Running Bulk RNA-seq Pipeline on MSU HPCC

Description: I encountered an issue while running the Bulk RNA-seq pipeline as described in the README. The pipeline is based on the nf-core/rnaseq pipeline and is intended to be executed on the MSU HPCC using SLURM as the job executor.

Steps to Reproduce:

  1. Created a directory for RNA-seq analysis in my home directory ($HOME/rnaseq).
  2. Created a samplesheet.csv file for pre-processing (the expected format is sketched after this list).
  3. Created a nextflow.config file.
  4. Downloaded the reference genome and GTF files into the directory.
  5. Wrote a bash script (run_rnaseq.sb) to run the pre-processing pipeline using SLURM.
  6. Ran the pre-processing pipeline using the command: sbatch run_rnaseq.sb.
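
For reference, a samplesheet.csv for nf-core/rnaseq typically looks like the sketch below; the sample names and FASTQ paths are placeholders, and the exact columns depend on the pipeline version (recent releases expect sample, fastq_1, fastq_2 and strandedness):

sample,fastq_1,fastq_2,strandedness
sample1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,unstranded
sample2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz,unstranded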

Observed Behavior: The pipeline failed at the step where STAR maps FASTQ reads to the reference genome. The error message indicated insufficient memory allocation.

Expected Behavior: The pipeline should have successfully mapped the FASTQ reads to the reference genome using STAR.

Configuration Details:

Environment:

Additional Context: I have followed all steps mentioned in the README, including creating the necessary files and directories, and loading the required modules. Despite allocating 40GB of RAM, the pipeline still fails at the STAR mapping step. I suspect there might be a configuration issue or a potential bug in the pipeline script.

Request: I would appreciate any guidance on resolving this memory allocation issue or any suggestions for improving the pipeline's performance on the MSU HPCC.

johnvusich commented 3 months ago

Hi rcorfixs,

Thank you for reporting this issue and providing detailed information about your setup and the steps you followed. Your setup looks correct, so let's work through the most likely causes of the STAR memory failure.

Steps to Diagnose and Resolve the Issue:

  1. Check Memory Allocation: Although you allocated 40GB of RAM, STAR typically needs substantially more for large genomes like GRCh38; the human genome index alone usually occupies around 30GB in memory. Try setting --mem to a higher value, such as 60GB or even 80GB, depending on the resources available on the HPCC, and make sure the Nextflow process limits allow the same amount (see the config sketch after this list).

  2. STAR Index Files: Ensure that the STAR index files were generated correctly and are located in the specified directory. If the index is incomplete or corrupted, STAR may fail to map the reads.

  3. SLURM Configuration: Verify that your SLURM configuration on the MSU HPCC is correct and that there are no restrictions or limits on memory usage that could be causing the job to fail. You can check the SLURM documentation or consult with your HPC support team.

  4. Check Log Files: Examine the log files generated by STAR for specific error messages or warnings; these often show whether the process was killed for exceeding its memory limit. The logs for the failed task are in its Nextflow work directory, whose path is reported in the Nextflow error message, and the overall run log is in .nextflow.log in the launch directory.

  5. Nextflow Configurations: Ensure that the nextflow.config file is correctly configured with the appropriate paths to the genome and GTF files. Double-check the paths and make sure there are no typos or incorrect references.

  6. Update Pipeline and Dependencies: Make sure that you are using the latest version of the nf-core/rnaseq pipeline and all its dependencies. Sometimes, issues are resolved in newer versions of the pipeline or its components.
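
As a concrete starting point for points 1 and 5, here is a minimal nextflow.config sketch for a SLURM cluster. The process selector, resource values, and limits below are assumptions to adjust for the MSU HPCC partitions and for the pipeline version you are running (process names differ between nf-core/rnaseq releases):

// nextflow.config (illustrative sketch; adjust for your cluster and pipeline version)
process {
    executor = 'slurm'

    // Give the STAR alignment step extra memory; this process name pattern
    // is an assumption and depends on the nf-core/rnaseq version in use.
    withName: '.*STAR_ALIGN' {
        cpus   = 8
        memory = '80 GB'
        time   = '24h'
    }
}

params {
    // Upper bounds the pipeline is allowed to request per task.
    max_memory = '80.GB'
    max_cpus   = 8
    max_time   = '24.h'
}

singularity {
    enabled    = true
    autoMounts = true
}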

Example Adjusted run_rnaseq.sb Script:

#!/bin/bash
#SBATCH --job-name=rnaseq_preprocessing
#SBATCH --time=24:00:00
#SBATCH --mem=80GB            # Increased memory allocation for the STAR step
#SBATCH --cpus-per-task=8

cd $HOME/rnaseq
module load Nextflow/23.10.0

# Run the nf-core/rnaseq pipeline with Singularity, keeping intermediate work files on scratch
nextflow run nf-core/rnaseq -r 1.5.0 --input ./samplesheet.csv --genome GRCh38 -profile singularity -work-dir $SCRATCH/rnaseq_work -c ./nextflow.config
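
If the run still fails, the commands below are one way to confirm whether memory was the limiting factor (point 3) and to locate the STAR logs (point 4). The job ID and work-directory hash are placeholders reported by sbatch and by the Nextflow error message, respectively:

# Submit the job and note the job ID printed by sbatch
sbatch run_rnaseq.sb

# After the job ends, compare requested vs. used memory
sacct -j <jobid> --format=JobID,JobName,State,ReqMem,MaxRSS,Elapsed

# Inspect the overall run log and the failed task's STAR output
less .nextflow.log
less $SCRATCH/rnaseq_work/<hash>/.command.log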

Additional Suggestions:

Please try these suggestions and let us know if the issue persists. We're here to help!

Best regards, John