lgeistlinger / EnrichmentBrowser

Seamless navigation through combined results of set-based and network-based enrichment analysis
19 stars 11 forks source link

Running deAna() using decimal (non-integer) counts with DESeq2 method #40

Closed haghshenas closed 3 years ago

haghshenas commented 3 years ago

Thank you so much for this package!

My goal is to do analysis based on expression estimates obtained from Salmon and imported using tximeta. I'm using EnrichmentBrowser_2.20.7 installed via Bioconductor. When I run deAna() function with DESeq2 method like below:

se <- normalize(se, norm.method="deseq2", filter.by.expr=FALSE)
res <- deAna(se, de.method="DESeq2", padj.method="BH", stat.only=FALSE, filter.by.expr=FALSE)

I get the following error message:

Error in deAna(se, de.method = "DESeq2", padj.method = "BH", stat.only = FALSE,  : 
  DESeq2 only applicable to integer read counts

Following the suggestion in #11, I also tried to set the dataType by metadata(se)$dataType <- "rseq" before running deAna() but that doesn't change the results and I still get the error message. It seems .detectDataType() function is being kicked in regardless of whether metadata(se)$dataType is set or not.

https://github.com/lgeistlinger/EnrichmentBrowser/blob/339bc563adb09bd491fbeb7187c97ac0273a557e/R/deAna.R#L123

Do you have any suggestions?

lgeistlinger commented 3 years ago

What expression units are you getting back from Salmon and trying to import here? are those TPMs?

haghshenas commented 3 years ago

When I import result of Salmon using tximeta, the SummarizedExperiment has two assays, counts and abundance (TPM) which both contain non-integer values. When running deAna without normalize() function, I tried passing both assay = "counts" and assay = "abundance" and I got the same error (DESeq2 only applicable to integer read counts) If I do run normalize() function like above, it seems the content of counts assay is copied into the raw in the resulting SummarizedExperiment .

lgeistlinger commented 3 years ago

Thanks, I think you have two possible routes:

  1. If you are primarily interested in gene set enrichment analysis, the easiest way to proceed would be to just use logTPMs as input based on the findings described here (esp Section 3.3). Then you could just setup your SummarizedExperiment se with a single assay containing the logTPMs and start off with:
se <- deAna(se)

which will carry out limma on the logTPMs as suggested in limma's user's guide.

and then proceed with your enrichment analysis of choice.

  1. if you are primarily interested in carrying out differential expression analysis and for some reason would like to stick to using DESeq2, then I'd suggest to follow the tximport and DESeq2 vignette, and proceed with EnrichmentBrowser::import on the resulting DESeq2DataSet for subsequent enrichment analyses.

Hope that helps.

haghshenas commented 3 years ago

Thanks for the suggestions. I would go with the second route. Just out of curiosity, considering that integer counts is not an actual requirement for DESeq2, isn't it worth to allow for that in EnrichmentBrowser's wrapper too? I think users won't mind to always specify the dataType of their experiments.

lgeistlinger commented 3 years ago

Well, it's not that easy, integer counts are actually a requirement for DESeq2 (https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-un-normalized-counts) - it just found a way to reconstruct the raw counts from the TPMs via tximport.

And then it becomes a design decision whether you want to wrap around the whole complexity of DESeq2 (different input types, extended experimental designs, ...) and start to depend on the corresponding dependencies such as tximport - which is obviously not the focus of the EnrichmentBrowser, which focuses - well - on enrichment analysis. The goal of deAna is providing a lightweight wrapper around the three major DE methods for standard scenarios. More specific scenarios upstream of the enrichment analysis are best taken care of by the specific packages for that purpose.

Note that the dataType argument is not primarily important for telling EnrichmentBrowser which kind of experiment has been carried out. It's rather used for telling EnrichmentBrowser whether to execute enrichment methods as originally developed for microarray data. In that sense, a logTPM input would correspond to a dataType = "ma" scenario. And only a raw integer read count input would correspond to a dataType = "rseq" scenario - where enrichment methods would then be executed in an accordingly adapted manner.

Hope that makes sense.