zhxiaokang / RASflow

RNA-Seq analysis workflow
MIT License
103 stars 58 forks source link

DEA error and visualisation. #25

Closed AmrSaadeldin closed 2 years ago

AmrSaadeldin commented 3 years ago

Hello,

I am facing this error for 2 weeks and could not find any solution. I am running 3 group analyses (Control, Treat1, and Treat2). and during the DEA I face this error.

Start mapping using genome as reference! Building DAG of jobs... Nothing to be done. Complete log: /ebio/abt6/asaadeldin/RASflow/RASflow/.snakemake/log/2021-07-12T160020.335751.snakemake.log Start doing DEA! Building DAG of jobs... Using shell: /bin/bash Provided cores: 1 Rules claiming more threads will be scaled down. Job counts: count jobs 1 DEA 1 all 2

[Mon Jul 12 16:00:21 2021] rule DEA: input: output/Arap11/genome/dea/countGroup/Emoy2_gene_count.tsv, output/Arap11/genome/dea/countGroup/H2O_gene_count.tsv, output/Arap11/genome/dea/countGroup/Waco9_gene_count.tsv output: output/Arap11/genome/dea/countGroup/H2O_gene_norm.tsv, output/Arap11/genome/dea/countGroup/Emoy2_gene_norm.tsv, output/Arap11/genome/dea/DEA/dea_H2O_Emoy2.tsv, output/Arap11/genome/dea/DEA/deg_H2O_Emoy2.tsv jobid: 1

Loading required package: limma Loading required package: S4Vectors Loading required package: stats4 Loading required package: BiocGenerics Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:limma’:

plotMA

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, basename, cbind, colMeans, colSums, colnames,
dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
intersect, is.unsorted, lapply, lengths, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
rowMeans, rowSums, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which, which.max, which.min

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

expand.grid

Loading required package: IRanges Loading required package: GenomicRanges Loading required package: GenomeInfoDb Loading required package: SummarizedExperiment Loading required package: Biobase Welcome to Bioconductor

Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DelayedArray Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:

anyMissing, rowMedians

Loading required package: BiocParallel

Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following objects are masked from ‘package:base’:

aperm, apply

Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.

Please read the vignette section 'Model matrix not full rank':

vignette('DESeq2') Calls: DEA ... DESeqDataSetFromMatrix -> DESeqDataSet -> checkFullRank Execution halted [Mon Jul 12 16:00:27 2021] Error in rule DEA: jobid: 1 output: output/Arap11/genome/dea/countGroup/H2O_gene_norm.tsv, output/Arap11/genome/dea/countGroup/Emoy2_gene_norm.tsv, output/Arap11/genome/dea/DEA/dea_H2O_Emoy2.tsv, output/Arap11/genome/dea/DEA/deg_H2O_Emoy2.tsv

RuleException: CalledProcessError in line 38 of /ebio/abt6/asaadeldin/RASflow/RASflow/workflow/dea_genome.rules: Command ' set -euo pipefail; Rscript scripts/dea_genome.R ' returned non-zero exit status 1. File "/ebio/abt6/asaadeldin/RASflow/RASflow/workflow/dea_genome.rules", line 38, in __rule_DEA File "/ebio/abt6/asaadeldin/miniconda3/envs/rasflow/lib/python3.6/concurrent/futures/thread.py", line 56, in run Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: /ebio/abt6/asaadeldin/RASflow/RASflow/.snakemake/log/2021-07-12T160021.273441.snakemake.log DEA is done! Start visualization of DEA results! Building DAG of jobs... Using shell: /bin/bash Provided cores: 1 Rules claiming more threads will be scaled down. Job counts: count jobs 1 end 1 plot 2

[Mon Jul 12 16:00:28 2021] rule plot: input: output/Arap11/genome/dea/countGroup, output/Arap11/genome/dea/DEA output: output/Arap11/genome/dea/visualization/volcano_plot_H2O_Emoy2.pdf, output/Arap11/genome/dea/visualization/heatmap_H2O_Emoy2.pdf jobid: 1

Loading required package: plotscale hash-3.0.1 provided by Decision Patterns

Loading required package: GenomicFeatures Loading required package: BiocGenerics Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, basename, cbind, colMeans, colSums, colnames,
dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
intersect, is.unsorted, lapply, lengths, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
rowMeans, rowSums, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors Loading required package: stats4

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:hash’:

values, values<-

The following object is masked from ‘package:base’:

expand.grid

Loading required package: IRanges Loading required package: GenomeInfoDb Loading required package: GenomicRanges Loading required package: AnnotationDbi Loading required package: Biobase Welcome to Bioconductor

Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: ‘AnnotationDbi’

The following objects are masked from ‘package:hash’:

keys, keys<-

Loading required package: ggplot2 Loading required package: ggrepel Error in file(file, "rt") : cannot open the connection Calls: plot.volcano.heatmap -> read.table -> file In addition: Warning message: In file(file, "rt") : cannot open file 'output/Arap11/genome/dea/DEA/dea_H2O_Emoy2.tsv': No such file or directory Execution halted [Mon Jul 12 16:00:34 2021] Error in rule plot: jobid: 1 output: output/Arap11/genome/dea/visualization/volcano_plot_H2O_Emoy2.pdf, output/Arap11/genome/dea/visualization/heatmap_H2O_Emoy2.pdf

RuleException: CalledProcessError in line 53 of /ebio/abt6/asaadeldin/RASflow/RASflow/workflow/visualize.rules: Command ' set -euo pipefail; Rscript scripts/visualize.R output/Arap11/genome/dea/countGroup output/Arap11/genome/dea/DEA output/Arap11/genome/dea/visualization ' returned non-zero exit status 1. File "/ebio/abt6/asaadeldin/RASflow/RASflow/workflow/visualize.rules", line 53, in __rule_plot File "/ebio/abt6/asaadeldin/miniconda3/envs/rasflow/lib/python3.6/concurrent/futures/thread.py", line 56, in run Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: /ebio/abt6/asaadeldin/RASflow/RASflow/.snakemake/log/2021-07-12T160028.139519.snakemake.log Visualization is done! RASflow is done!

AmrSaadeldin commented 3 years ago

I have many replicates in each group, for example, the first group has 24 replicates, the second group has 12, and the third group has 12 replicates.

zhxiaokang commented 3 years ago

Hi, the error lies in your formula design in DEA (since the error happens while doing DEA, so the downstream visualization is lacking input files). Please refer to the explanation in DESeq2's vignette: DESeq2 error: model-matrix-not-full-rank

AmrSaadeldin commented 3 years ago

Hi, thanks for the helpful reply. I have another inquiry; my analysis is running well using edgeR, however, I face this error while using DESeq2:

Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, : every gene contains at least one zero, cannot compute log geometric means Calls: DEA ... estimateSizeFactors -> .local -> estimateSizeFactorsForMatrix Execution halted [Mon Jul 19 15:09:09 2021] Error in rule DEA: jobid: 1 output: output/harAnalysis/genome/dea/countGroup/H2O_gene_norm.tsv, output/harAnalysis/genome/dea/countGroup/Waco9_gene_norm.tsv, output/harAnalysis/genome/dea/DEA/dea_H2O_Waco9.tsv, output/harAnalysis/genome/dea/DEA/deg_H2O_Waco9.tsv

RuleException: CalledProcessError in line 38 of /ebio/abt6/asaadeldin/RASflow/RASflow/workflow/dea_genome.rules: Command ' set -euo pipefail; Rscript scripts/dea_genome.R ' returned non-zero exit status 1. File "/ebio/abt6/asaadeldin/RASflow/RASflow/workflow/dea_genome.rules", line 38, in __rule_DEA File "/ebio/abt6/asaadeldin/miniconda3/envs/rasflow/lib/python3.6/concurrent/futures/thread.py", line 56, in run Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: /ebio/abt6/asaadeldin/RASflow/RASflow/.snakemake/log/2021-07-19T150902.527946.snakemake.log

I believe the error is due to the countrmatrix, if it is true, could you please let me know how to fix this error?

Thanks!

zhxiaokang commented 3 years ago

Hi, by searching the error message, I found this post potentially helpful to your case: Error: Every gene contains at least one zero, cannot compute log geometric means. As suggested there, you should check your count table to detect any problematic sample, or can either add a pseudo-count value of '1' to your data.