I am trying to do a differential abundance analysis with microbiome sequencing data suing DESeq2 package, but I keep getting errors after trying different approaches within the package. The data is a 50 by 501 matrix with each row being a sample and the first column being the group indicator and the other 500 columns are sequencing reads for 500 taxa. No missing data or zero-valued sequencing reads in the data set (see attached data set
data.xlsx
). Below is the R script for the analysis. Please help. Thanks a lot!
When I dig into the error message, it says "simpleError in estimateDispersionsFit(object, fitType = fitType, quiet = quiet): all gene-wise dispersion estimates are within 2 orders of magnitude from the minimum value, and so the standard curve fitting techniques will not work. One can instead use the gene-wise estimates as final estimates: dds <- estimateDispersionsGeneEst(dds) dispersions(dds) <- mcols(dds)$dispGeneEst ...then continue with testing using nbinomWaldTest or nbinomLRT".
but it gave me another error message: "Error in .local(object, ...) : first calculate size factors, add normalizationFactors, or set normalized=FALSE".
Here is some session info:
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)
Hi Dr. Love,
I am trying to do a differential abundance analysis with microbiome sequencing data suing DESeq2 package, but I keep getting errors after trying different approaches within the package. The data is a 50 by 501 matrix with each row being a sample and the first column being the group indicator and the other 500 columns are sequencing reads for 500 taxa. No missing data or zero-valued sequencing reads in the data set (see attached data set data.xlsx ). Below is the R script for the analysis. Please help. Thanks a lot!
class(data) [1] "data.frame" countsData=t(data[,-1])+1 rownames(countsData)=colnames(data)[-1] class(countsData) [1] "matrix" print(dim(countsData)) [1] 500 50 print(countsData[1:5,1:5] [,1] [,2] [,3] [,4] [,5] rawCount1 8 8 10 11 9 rawCount2 8 9 8 7 8 rawCount3 36 34 30 36 33 rawCount4 7 8 6 5 8 rawCount5 10 7 63 13 64
get the binary predictor data and annotate the predictor data
xData=as.data.frame(data[,1]) colnames(xData)=colnames(data)[1] xData[,"x"]=as.factor(xData[,"x"]) class(xData) [1] "data.frame" print(dim(xData)) [1] 50 1 print(xData[1:5,]) [1] 0 0 1 0 1 Levels: 0 1
prepare for analysis
dds <- DESeqDataSetFromMatrix(countData=countsData,colData=xData,design=~x)
run analysis
suppressWarnings(runDeseq<- try(DESeq(dds, quiet = TRUE), silent = TRUE)) if (inherits(runDeseq, "try-error")) {
If the parametric fit failed, try the local.
suppressWarnings(runDeseq<- try(DESeq(dds, fitType = "local", quiet = TRUE), silent = TRUE)) if (inherits(runDeseq, "try-error")) {
If local fails, try the mean
suppressWarnings(runDeseq<- try(DESeq(dds, fitType = "mean", quiet = TRUE), silent = TRUE)) } if (inherits(runDeseq, "try-error")) {
If still bad, quit with error.
}
When I dig into the error message, it says "simpleError in estimateDispersionsFit(object, fitType = fitType, quiet = quiet): all gene-wise dispersion estimates are within 2 orders of magnitude from the minimum value, and so the standard curve fitting techniques will not work. One can instead use the gene-wise estimates as final estimates: dds <- estimateDispersionsGeneEst(dds) dispersions(dds) <- mcols(dds)$dispGeneEst ...then continue with testing using nbinomWaldTest or nbinomLRT".
So I tried
dds <- DESeqDataSetFromMatrix(countData=countsData,colData=xData,design=~x) dds <- estimateDispersionsGeneEst(dds)
but it gave me another error message: "Error in .local(object, ...) : first calculate size factors, add normalizationFactors, or set normalized=FALSE".
Here is some session info:
sessionInfo() R version 3.6.1 (2019-07-05) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 16299)
packageVersion("DESeq2") [1] ‘1.24.0’