nf-core / differentialabundance

Differential abundance analysis for feature/observation matrices from platforms such as RNA-seq
https://nf-co.re/differentialabundance
MIT License

Error with `ch_gene_sets` when running gprofiler analysis without gmt file #360

Open Mathias-Boulanger opened 4 days ago

Mathias-Boulanger commented 4 days ago

Description of the bug

Hi,

I encountered an issue while running the pipeline on data generated with the nf-core/rnaseq pipeline. When I run the gprofiler2 analysis without providing a GMT file, I get the following error at line 489 of differentialabundance.nf:

ERROR ~ No such variable: ch_gene_sets 

However, when I supplied a gmt file (e.g., gene_sets_files: ref/2024-11-18_Danio_rerio_GO_annotation_quickGO.gmt), the pipeline executed smoothly.

Root Cause: The issue seems to originate from lines 64-84 of differentialabundance.nf. When params.gprofiler2_run is true but no gene_sets_files value is given, the channel ch_gene_sets is never initialized, so the later reference to it fails with the error above.

Proposed Fix: To address this, I suggest modifying the relevant section of the code as follows:

// Proposed update for lines 65-80 in differentialabundance.nf
if (run_gene_set_analysis) {
    ch_gene_sets = []    // For methods that can run without gene sets
    if (params.gene_sets_files) {
        gene_sets_files = params.gene_sets_files.split(",")
        ch_gene_sets = Channel.of(gene_sets_files).map { file(it, checkIfExists: true) }
        if (params.gprofiler2_run && (!params.gprofiler2_token && !params.gprofiler2_organism) && gene_sets_files.size() > 1) {
            error("gprofiler2 can currently only work with a single gene set file")
        }
    } else if (params.gsea_run) {
        error("GSEA activated but gene set file not specified!")
    } else if (params.gprofiler2_run) {
        if (!params.gprofiler2_token && !params.gprofiler2_organism) {
            error("To run gprofiler2, please provide a run token, GMT file, or organism!")
        }
    }
}

Outcome: This resolves the initial error, but a new one arises at line 489: ch_gene_sets.first() fails because ch_gene_sets is now an empty list rather than a channel.
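
One possible way around this second error, offered as a minimal sketch and not as the fix that actually landed in the pipeline: make the fallback a channel that emits a single empty list, so that ch_gene_sets.first() still has a value to return. The standalone script below only illustrates that idea. It assumes the downstream process accepts an empty list as an optional path input (a common nf-core convention), and the file name demo.nf is hypothetical.

```groovy
// demo.nf (hypothetical file name); run with: nextflow run demo.nf
// Sketch only, not the pipeline's actual code.
params.gene_sets_files = null   // e.g. "a.gmt,b.gmt" when GMT files are supplied

workflow {
    // With GMT files: one channel item per file.
    // Without: a channel emitting one empty list, so first() has something to return.
    def ch_gene_sets = params.gene_sets_files
        ? Channel.fromList(params.gene_sets_files.tokenize(',')).map { file(it, checkIfExists: true) }
        : Channel.of([])

    // Mirrors the ch_gene_sets.first() call reported at line 489.
    ch_gene_sets.first().view { "gene sets input: ${it}" }
}
```

Unlike the plain `[]` in the proposed update, Channel.of([]) is a real channel, so operators such as first() can be applied to it. Whether the gprofiler2 step then behaves correctly with an empty gene set input would still need to be tested.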

Could this issue be addressed in the next release? Let me know if additional information or testing is needed to resolve this!

Thank you for your work on this project.

Command used and terminal output

nextflow run nf-core/differentialabundance -profile singularity -params-file parameters.yaml -c custom.config                                                                                               

 N E X T F L O W   ~  version 24.10.0                                                                                   

Launching `https://github.com/nf-core/differentialabundance` [goofy_lorenz] DSL2 - revision: 3dd360fed0 [master]  
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`                                                                                                               

------------------------------------------------------                                                                  
                                        ,--./,-.                                                                        
        ___     __   __   __   ___     /,-._.--~'                                                                       
  |\ | |__  __ /  ` /  \ |__) |__         }  {                                                                          
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,                                                                       
                                        `._,._,'                                                                        
  nf-core/differentialabundance v1.5.0-g3dd360f                                                                         
------------------------------------------------------                                                                  
Core Nextflow options                                                                                                   
  revision                    : master                                                                                  
  runName                     : goofy_lorenz                                                                            
  containerEngine             : singularity                                                                             
  container                   : [RMARKDOWNNOTEBOOK:biocontainers/r-shinyngs:1.8.8--r43hdfd78af_0]                       
  launchDir                   : /scratch/boulanger/nfWD/2024_24s000570_fasano_RNAseq/nfcore_DA_GRCz11                   
  workDir                     : /scratch/boulanger/nfWD/2024_24s000570_fasano_RNAseq/nfcore_DA_GRCz11/work              
  projectDir                  : /home/boulange/.nextflow/assets/nf-core/differentialabundance                           
  userName                    : boulange                                                                                
  profile                     : singularity                                                                             
  configFiles                 :                                                                                         

Input/output options                                                                                                    
  study_name                  : 24s000570_Fasano_DE_analysis_GRCz11                                                     
  input                       : inputs/input.csv                                                                        
  contrasts                   : inputs/contrast.csv                                                                     
  outdir                      : 20241118_DEA_GRCz11

Abundance values                                                                                                        
  matrix                      : inputs/salmon.merged.gene_counts.tsv                                                    
  transcript_length_matrix    : inputs/salmon.merged.gene_lengths.tsv                                                   
  affy_cel_files_archive      : null                                                                                    
  querygse                    : null                                                                                    

Affy input options                                                                                                      
  affy_cdfname                : null                                                                                    

Exploratory analysis                                                                                                    
  exploratory_log2_assays     : raw,normalised                                                                          

Limma specific options (microarray only)                                                                                
  limma_spacing               : null                                                                                    
  limma_block                 : null                                                                                    
  limma_correlation           : null                                                                                    

gprofiler2                                                                                                              
  gprofiler2_run              : true                                                                                    
  gprofiler2_organism         : drerio                                                                                  
  gprofiler2_correction_method: gSCS                                                                                    
  gprofiler2_background_file  : auto                                                                                    

Shiny app settings                                                                                                      
  shinyngs_shinyapps_account  : null                                                                                    
  shinyngs_shinyapps_app_name : 24s000570_Fasano_DE_analysis_GRCz11                                                     

Options related to gene set analysis
  gene_sets_files             : null

Reporting options
  email                       : mathias.boulanger@embl.de
  logo_file                   : GC_resources/logo/genecorelogo-01.png
  report_title                : 18.11.2024 24s000570 Fasano Differential Expression Analysis GRCz11 
  report_author               : Mathias Boulanger                                                                       
  report_description          : null                                                                                    

Reference genome options                                                                                                
  genome                      : GRCz11                                                                                  
  gtf                         : ref/Danio_rerio.GRCz11.113.gtf.gz                                                       
  igenomes_ignore             : true                                                                                    

Generic options                                                                                                         
  email_on_fail               : mathias.boulanger@embl.de                                                               

!! Only displaying parameters that differ from the pipeline defaults !!                                                 
------------------------------------------------------                                                                  
If you use nf-core/differentialabundance for your analysis please cite:                                                 

* The pipeline                                                                                                          
  https://doi.org/10.5281/zenodo.7568000                                                                                

* The nf-core framework                                                                                                 
  https://doi.org/10.1038/s41587-020-0439-x                                                                             

* Software dependencies                                                                                                 
  https://github.com/nf-core/differentialabundance/blob/master/CITATIONS.md                                             
------------------------------------------------------                                                                  
[-        ] NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GUNZIP_GTF       -                                       
[-        ] NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GTF_TO_TABLE     -                                       
[-        ] NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:VALIDATOR        -                                       
[-        ] NFC…_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_MATRIXFILTER -
ERROR ~ No such variable: ch_gene_sets                                                                                  

 -- Check script '/home/boulange/.nextflow/assets/nf-core/differentialabundance/./workflows/differentialabundance.nf' at line: 489 or see '.nextflow.log' file for more details

Relevant files

parameters.yaml.zip

System information

Nextflow 24.10.0 (as reported in the terminal output above), HPC, Singularity, CentOS, nf-core/differentialabundance 1.5.0

WackerO commented 1 day ago

Hello @Mathias-Boulanger, this is fixed on the dev branch. Could you rerun the pipeline with the flag -r dev added and see if it works?