nf-core / differentialabundance

Differential abundance analysis for feature/observation matrices from platforms such as RNA-seq
https://nf-co.re/differentialabundance
MIT License

Errors thrown out in NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:DESEQ2_DIFFERENTIAL #215

Closed CrazyHsu closed 11 months ago

CrazyHsu commented 11 months ago

Description of the bug

Hello, my experimental design aims to find DEGs between different treatments within two tissues using the nf-core/differentialabundance pipeline, but I get errors with the following command: nextflow run nf-core/differentialabundance -r 1.4.0 --input samplesheet.csv --contrasts sample_contrast_file.csv --matrix star_salmon/salmon.merged.gene_counts.tsv --transcript_length_matrix star_salmon/salmon.merged.transcript_lengths.tsv --gtf Zea_mays.gtf --outdir deg_analysis -profile rnaseq,docker. How can I fix this? Thanks!

A snapshot of the error follows, and the full .nextflow.log is attached in the Relevant files section:

Dec-06 02:23:58.547 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Dec-06 02:23:58.547 [Task submitter] INFO  nextflow.Session - [f1/103041] Submitted process > NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:DESEQ2_DIFFERENTIAL ([id:MP_ck_pld3_knl2, variable:treatment, reference:ck_MP, target:mt_MP_pld3_knl2, blocking:])
Dec-06 02:23:58.549 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:DESEQ2_DIFFERENTIAL ([id:MP_ck_pld3_mtl_knl2, variable:treatment, reference:ck_MP, target:mt_MP_pld3_mtl_knl2, blocking:]); work-dir=/data2/work/4b/71c2b327e7e3899e712551b109b818
  error [nextflow.exception.ProcessFailedException]: Process `NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:DESEQ2_DIFFERENTIAL ([id:MP_ck_pld3_mtl_knl2, variable:treatment, reference:ck_MP, target:mt_MP_pld3_mtl_knl2, blocking:])` terminated with an error exit status (1)
Dec-06 02:23:58.574 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:DESEQ2_DIFFERENTIAL ([id:MP_ck_pld3_mtl_knl2, variable:treatment, reference:ck_MP, target:mt_MP_pld3_mtl_knl2, blocking:])'

Caused by:
  Process `NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:DESEQ2_DIFFERENTIAL ([id:MP_ck_pld3_mtl_knl2, variable:treatment, reference:ck_MP, target:mt_MP_pld3_mtl_knl2, blocking:])` terminated with an error exit status (1)

Command executed [/home/crazyhsu/.nextflow/assets/nf-core/differentialabundance/./workflows/../modules/nf-core/deseq2/differential/templates/deseq_de.R]:

......

  converting counts to integer mode
  Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
    duplicate 'row.names' are not allowed
  Calls: read_delim_flexible -> read.delim -> read.table
  Execution halted

Work dir:
  /data2/work/4b/71c2b327e7e3899e712551b109b818

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
Dec-06 02:23:58.589 [Task monitor] INFO  nextflow.Session - Execution cancelled -- Finishing pending tasks before exit
Dec-06 02:23:58.610 [main] DEBUG nextflow.Session - Session await > all processes finished
Dec-06 02:23:58.642 [Actor Thread 30] DEBUG nextflow.sort.BigSort - Sort completed -- entries: 2; slices: 1; internal sort time: 0.024 s; external sort time: 0.002 s; total time: 0.026 s

......
My sample sheet is as follows:
sample fastq_1 fastq_2 treatment
ck_anther_1 ck-1mm-anther-1_R1.fq.gz ck-1mm-anther-1_R2.fq.gz ck_anther
ck_anther_2 ck-1mm-anther-2_R1.fq.gz ck-1mm-anther-2_R2.fq.gz ck_anther
ck_MP_1 ck-mature-pollen-1_R1.fq.gz ck-mature-pollen-1_R2.fq.gz ck_MP
ck_MP_2 ck-mature-pollen-2_R1.fq.gz ck-mature-pollen-2_R2.fq.gz ck_MP
mt_anther_knl2_1 knl2-1mm-anther-1_R1.fq.gz knl2-1mm-anther-1_R2.fq.gz mt_anther_knl2
mt_anther_knl2_2 knl2-1mm-anther-2_R1.fq.gz knl2-1mm-anther-2_R2.fq.gz mt_anther_knl2
mt_anther_pld3_knl2_1 pld3-knl2-1mm-anther-1_R1.fq.gz pld3-knl2-1mm-anther-1_R2.fq.gz mt_anther_pld3_knl2
mt_anther_pld3_knl2_2 pld3-knl2-1mm-anther-2_R1.fq.gz pld3-knl2-1mm-anther-2_R2.fq.gz mt_anther_pld3_knl2
mt_anther_mtl_knl2_1 mtl-knl2-1mm-anther-1_R1.fq.gz mtl-knl2-1mm-anther-1_R2.fq.gz mt_anther_mtl_knl2
mt_anther_mtl_knl2_2 mtl-knl2-1mm-anther-2_R1.fq.gz mtl-knl2-1mm-anther-2_R2.fq.gz mt_anther_mtl_knl2
mt_MP_knl2_1 knl2-mature-pollen-1_R1.fq.gz knl2-mature-pollen-1_R2.fq.gz mt_MP_knl2
mt_MP_knl2_2 knl2-mature-pollen-2_R1.fq.gz knl2-mature-pollen-2_R2.fq.gz mt_MP_knl2
mt_MP_pld3_1 pld3-mature-pollen-1_R1.fq.gz pld3-mature-pollen-1_R2.fq.gz mt_MP_pld3
mt_MP_pld3_2 pld3-mature-pollen-2_R1.fq.gz pld3-mature-pollen-2_R2.fq.gz mt_MP_pld3
mt_MP_mtl_1 mtl-mature-pollen-1_R1.fq.gz mtl-mature-pollen-1_R2.fq.gz mt_MP_mtl
mt_MP_mtl_2 mtl-mature-pollen-2_R1.fq.gz mtl-mature-pollen-2_R2.fq.gz mt_MP_mtl
mt_MP_pld3_knl2_1 pld3-knl2-mature-pollen-1_R1.fq.gz pld3-knl2-mature-pollen-1_R2.fq.gz mt_MP_pld3_knl2
mt_MP_pld3_knl2_2 pld3-knl2-mature-pollen-2_R1.fq.gz pld3-knl2-mature-pollen-2_R2.fq.gz mt_MP_pld3_knl2
mt_MP_mtl_knl2_1 mtl-knl2-mature-pollen-1_R1.fq.gz mtl-knl2-mature-pollen-1_R2.fq.gz mt_MP_mtl_knl2
mt_MP_mtl_knl2_2 mtl-knl2-mature-pollen-2_R1.fq.gz mtl-knl2-mature-pollen-2_R2.fq.gz mt_MP_mtl_knl2
mt_MP_pld3_mtl_knl2_1 pld3-mtl-knl2-mature-pollen-1_R1.fq.gz pld3-mtl-knl2-mature-pollen-1_R2.fq.gz mt_MP_pld3_mtl_knl2
mt_MP_pld3_mtl_knl2_2 pld3-mtl-knl2-mature-pollen-2_R1.fq.gz pld3-mtl-knl2-mature-pollen-2_R2.fq.gz mt_MP_pld3_mtl_knl2
My contrast file is as follows:
id variable reference target
anther_ck_knl2 treatment ck_anther mt_anther_knl2
anther_ck_mtl_knl2 treatment ck_anther mt_anther_mtl_knl2
anther_ck_pld3_knl2 treatment ck_anther mt_anther_pld3_knl2
MP_ck_knl2 treatment ck_MP mt_MP_knl2
MP_ck_pld3 treatment ck_MP mt_MP_pld3
MP_ck_mtl treatment ck_MP mt_MP_mtl
MP_ck_pld3_knl2 treatment ck_MP mt_MP_pld3_knl2
MP_ck_mtl_knl2 treatment ck_MP mt_MP_mtl_knl2
MP_ck_pld3_mtl_knl2 treatment ck_MP mt_MP_pld3_mtl_knl2
The first 10 lines of my salmon.merged.gene_counts.tsv, generated using the nf-core/rnaseq pipeline, are as follows:
gene_id gene_name ck_anther_1 ck_anther_2 ck_MP_1 ck_MP_2 mt_anther_knl2_1 mt_anther_knl2_2 mt_anther_mtl_knl2_1 mt_anther_mtl_knl2_2 mt_anther_pld3_knl2_1 mt_anther_pld3_knl2_2 mt_MP_knl2_1 mt_MP_knl2_2 mt_MP_mtl_1 mt_MP_mtl_2 mt_MP_mtl_knl2_1 mt_MP_mtl_knl2_2 mt_MP_pld3_1 mt_MP_pld3_2 mt_MP_pld3_knl2_1 mt_MP_pld3_knl2_2 mt_MP_pld3_mtl_knl2_1 mt_MP_pld3_mtl_knl2_2
ENSRNA049437471 tRNA-Asn 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSRNA049437473 tRNA-Thr 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSRNA049437518 tRNA-Asn 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSRNA049437607 tRNA-Met 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSRNA049437614 tRNA-Gly 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSRNA049437658 tRNA-Ala 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSRNA049437881 tRNA-Ser 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSRNA049437912 tRNA-Pro 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSRNA049437967 tRNA-Lys 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Command used and terminal output

nextflow run nf-core/differentialabundance -r 1.4.0 --input samplesheet.csv --contrasts sample_contrast_file.csv --matrix star_salmon/salmon.merged.gene_counts.tsv --transcript_length_matrix star_salmon/salmon.merged.transcript_lengths.tsv --gtf Zea_mays.gtf --outdir deg_analysis -profile rnaseq,docker

Relevant files

A full .nextflow.log is attached here. nextflow.log

System information

No response

CrazyHsu commented 11 months ago

@pinin4fjords Hi, Manning. Could you help me figure out what problem I'm running into? Any help would be highly appreciated! Thanks.

pinin4fjords commented 11 months ago

The string "converting counts to integer mode" tells me that the count matrix was read correctly. So the error is coming from https://github.com/nf-core/differentialabundance/blob/a3d664c12c4050bae2acc83b1c636dcc3546b9a5/modules/nf-core/deseq2/differential/templates/deseq_de.R#L347.

Since the count matrix and the gene length matrix are read in exactly the same way, this suggests that the identifiers in your gene length matrix differ from those in the count matrix. Please check that your gene lengths file has the same values in its first two columns (gene_id, gene_name) as the count matrix.
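
One way to run that check before launching the pipeline is sketched below in Python. This is only an illustration, not part of the pipeline: the function names are made up for this example, and it assumes both files are tab-separated and carry a gene_id column. It reports identifiers that are repeated in the lengths file (which is the kind of thing that would trigger "duplicate 'row.names' are not allowed") and identifiers present in only one of the two files.

```python
import csv
import sys
from collections import Counter


def read_id_column(path, column="gene_id"):
    """Return the values of one column from a tab-separated file, in file order."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"{path} has no '{column}' column")
        return [row[column] for row in reader]


def compare_identifiers(counts_path, lengths_path, column="gene_id"):
    """Summarise how the identifier column differs between the two matrices."""
    counts_ids = read_id_column(counts_path, column)
    lengths_ids = read_id_column(lengths_path, column)
    # Identifiers that occur more than once in the lengths file.
    duplicated = sorted(i for i, n in Counter(lengths_ids).items() if n > 1)
    return {
        "duplicated_in_lengths": duplicated,
        "only_in_counts": sorted(set(counts_ids) - set(lengths_ids)),
        "only_in_lengths": sorted(set(lengths_ids) - set(counts_ids)),
    }


# Hypothetical usage: python check_ids.py counts.tsv lengths.tsv
if __name__ == "__main__" and len(sys.argv) == 3:
    report = compare_identifiers(sys.argv[1], sys.argv[2])
    for key, ids in report.items():
        print(f"{key}: {len(ids)} (first few: {ids[:5]})")
```

If all three lists come back empty, the two files agree on gene_id; any entries point at the rows worth inspecting by hand.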

CrazyHsu commented 11 months ago

Hi @pinin4fjords, thanks for your quick reply. I have now set --transcript_length_matrix to star_salmon/salmon.merged.gene_lengths.tsv instead of star_salmon/salmon.merged.transcript_lengths.tsv, and everything runs fine. Thank you! :smile: