nf-core / rnasplice

rnasplice is a bioinformatics pipeline for RNA-seq alternative splicing analysis
https://nf-co.re/rnasplice
MIT License

Error in SUPPA: Clustergroups are assigned incorrectly #131

Closed spraeger closed 5 months ago

spraeger commented 6 months ago

Description of the bug

Hi, for one of my contrasts (pdx_41-control_don), the --groups parameter for CLUSTEREVENTS_IOI (suppa_clusterevents.nf) is assigned incorrectly, which causes the following error: ERROR:lib.cluster_tools:Invalid index. Index 6 is smaller than the number of columns in the file (7). For all other contrasts, the clustergroup assignment works fine.

The related pdx_41-control_don_transcript_diffsplice.psivec file contains seven sample columns, so the correct grouping would be --groups 1-4,5-7:

transcript_pdx_41_1     transcript_pdx_41_2     transcript_pdx_41_3     transcript_pdx_41_4     transcript_control_don_1        transcript_control_don_2        transcript_control_don_3
ENSG00000290825.1;ENST00000456328.2     1.0     1.0     1.0     1.0     nan     nan     nan
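
For reference, the expected grouping can be worked out from the psivec header alone. Below is a minimal Python sketch of that derivation. It is not the pipeline's CLUSTERGROUPS implementation: the function name derive_groups is hypothetical, and it assumes that every column name ends in _<replicate> and that replicates of the same condition sit in adjacent columns.

```python
from itertools import groupby


def derive_groups(columns):
    """Build a 1-based, comma-separated range string from psivec sample columns.

    Assumption (not taken from the pipeline): each column name ends in
    "_<replicate>", and replicates of one condition are adjacent.
    """
    # Drop the trailing replicate number to get one label per condition.
    labels = [name.rsplit("_", 1)[0] for name in columns]
    ranges = []
    start = 1
    # Each run of identical labels becomes one "first-last" range.
    for _, run in groupby(labels):
        size = len(list(run))
        ranges.append(f"{start}-{start + size - 1}")
        start += size
    return ",".join(ranges)


header = (
    "transcript_pdx_41_1 transcript_pdx_41_2 transcript_pdx_41_3 "
    "transcript_pdx_41_4 transcript_control_don_1 "
    "transcript_control_don_2 transcript_control_don_3"
).split()

print(derive_groups(header))  # 1-4,5-7
```

For the header shown above this gives 1-4,5-7, i.e. four pdx_41 replicates followed by three control_don replicates.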

It seems that the derivation of the clustergroups for this contrast was never started. The related work directory 0d/4f97644516640eda4e35d88e4dab59 is empty.

(base) -bash-4.2$ grep pdx_41-control_don .nextflow.log | grep CLUSTERGROUPS
~> TaskHandler[jobId: null; id: 164; name: NFCORE_RNASPLICE:RNASPLICE:SUPPA_SALMON:CLUSTERGROUPS_IOI (pdx_41-control_don); status: NEW; exit: -; error: -; workDir: XXX/rnasplice_pdx/work/0d/4f97644516640eda4e35d88e4dab59 started: -; exited: -; ]

However, CLUSTEREVENTS_IOI is executed with --groups 1-3,4-6, which causes the error. Could you please help me understand why, and at which point, the assignment --groups 1-3,4-6 is made, given that CLUSTERGROUPS does not seem to run? Thank you in advance!
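
A quick way to see why 1-3,4-6 cannot fit this file is to compare the ranges against the number of sample columns. The snippet below is only a sketch: check_groups is a hypothetical helper, not SUPPA's internal validation, and it assumes the ranges are 1-based, contiguous, and must end at the last sample column, which is what the error message suggests.

```python
def check_groups(groups, n_columns):
    """Return True if the range string covers columns 1..n_columns exactly.

    Assumption: ranges look like "1-3,4-6", are 1-based and contiguous.
    """
    expected_start = 1
    for part in groups.split(","):
        start, end = (int(x) for x in part.split("-"))
        # Each range must begin right after the previous one ends.
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    # The last range must end on the final sample column.
    return expected_start - 1 == n_columns


print(check_groups("1-3,4-6", 7))  # False
print(check_groups("1-4,5-7", 7))  # True
```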

Command used and terminal output

Command used:
nextflow run \
$(RNASPLICE_DIR) \
--input config/samplesheet_pdx_group.csv \
--contrasts config/contrastsheet_pdx_group.csv \
--outdir workspace/rnasplice_pdx_group_results \
-c config/XXX.config \
--fasta $(GENOMEDIR)/GRCh38.primary_assembly.genome.fa \
--gtf $(GENOMEDIR)/gencode.v43.annotation.gtf \
--star_index $(GENOMEDIR)/genome/index/star \
--salmon_index $(GENOMEDIR)/genome/index/salmon \
--gencode \
--save_reference \
--save_unaligned \
--min_samps_gene_expr 0 \
--min_samps_feature_expr 0 \
--min_samps_feature_prop 0 \
--min_feature_expr 0 \
--min_feature_prop 0 \
--min_gene_expr 0 \
--miso_genes "ENSG00000211899.10, ENSG00000171862.14, ENSG00000004961.15, ENSG00000005302.19, ENSG00000147403.18"

Output:
-[nf-core/rnasplice] Pipeline completed with errors-

ERROR ~ Error executing process > 'NFCORE_RNASPLICE:RNASPLICE:SUPPA_SALMON:CLUSTEREVENTS_IOI (pdx_41-control_don)'

Caused by:
  Process `NFCORE_RNASPLICE:RNASPLICE:SUPPA_SALMON:CLUSTEREVENTS_IOI (pdx_41-control_don)` terminated with an erro$
 exit status (1)

Command executed:

  suppa.py \
      clusterEvents \
      --dpsi pdx_41-control_don_transcript_diffsplice.dpsi \
      --psivec pdx_41-control_don_transcript_diffsplice.psivec \
      --dpsi-threshold 0.05 \
      --eps 0.05 \
      --metric euclidean \
      --min-pts 20 \
      --groups 1-3,4-6 \
      --clustering DBSCAN \
       -o pdx_41-control_don_transcript_cluster

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASPLICE:RNASPLICE:SUPPA_SALMON:CLUSTEREVENTS_IOI":
      suppa: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('suppa').version)")
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  WARNING: Skipping mount /var/apptainer/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  ERROR:lib.cluster_tools:Invalid index. Index 6 is smaller than the number of columns in the file (7).

Work dir:
 XXX/rnasplice_pdx/work/df/888c5d2dcf001bf157d5ddbaffafbb

Relevant files

contrastsheet_pdx_group.csv samplesheet_pdx_group.csv

System information

CentOS Linux release 7.9.2009 (Core), LSF cluster
Nextflow version 23.10.1
$(RNASPLICE_DIR) in the pipeline call refers to a fork of nf-core/rnasplice v1.0.2 that increases alignment resources (https://github.com/dkoppstein/rnasplice/tree/increase_sam)

jma1991 commented 6 months ago

Hey @spraeger

Thanks for reporting your issue. The error arises because the channel that carries the groups parameter is out of sync with the channels that feed the dpsi and psivec parameters. This happens because Nextflow processes are not guaranteed to emit results in the same order in which their inputs arrive from the input channel, so the groups value computed for one contrast can end up paired with the files of another. This is easily overlooked, and I can only apologise. I've prototyped a solution and will try to get it posted tomorrow as a hotfix for you to test with your data.
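
To illustrate the failure mode, here is a small Python analogy rather than pipeline code. It contrasts pairing two result streams by position with matching them on the contrast name. The function names, the second contrast (contrast_b), and its file name are hypothetical; the actual fix in the pull request is not reproduced here. In Nextflow itself, matching channel items on a shared key is what the join operator is for.

```python
def pair_by_position(files, groups):
    """Pair items purely by arrival order (fragile if order differs)."""
    return [(contrast, psivec, value)
            for (contrast, psivec), (_, value) in zip(files, groups)]


def pair_by_key(files, groups):
    """Pair items on the contrast name, independent of arrival order."""
    lookup = dict(groups)
    return [(contrast, psivec, lookup[contrast]) for contrast, psivec in files]


# Files arrive in one order ...
files = [
    ("pdx_41-control_don", "pdx_41-control_don_transcript_diffsplice.psivec"),
    ("contrast_b", "contrast_b_transcript_diffsplice.psivec"),  # hypothetical
]
# ... while the group strings finish in a different order.
groups = [
    ("contrast_b", "1-3,4-6"),
    ("pdx_41-control_don", "1-4,5-7"),
]

print(pair_by_position(files, groups)[0][2])  # 1-3,4-6 (wrong for this contrast)
print(pair_by_key(files, groups)[0][2])       # 1-4,5-7
```

Positional pairing hands pdx_41-control_don the groups string that belongs to the other contrast, which matches the symptom reported above; keyed matching is unaffected by completion order.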

James

jma1991 commented 6 months ago

Hello @spraeger,

I've submitted a pull request with the proposed fix. Could you please test it and let me know if it resolves the issue for you? I'll need to hold off on merging and releasing it until it undergoes a second code review. Thanks!

spraeger commented 5 months ago

Hi @jma1991,

Thanks a lot for the clarification and the prompt action! I have tested your patch, and it resolves the issue for my data.

jma1991 commented 5 months ago

Hey @spraeger

We have just released 1.0.4, which has this issue fixed. Thanks for bringing it to our attention.