stemangiola / tidybulk

Brings bulk and pseudobulk transcriptomics to the tidyverse
https://stemangiola.github.io/tidybulk/
164 stars 25 forks source link

The first element of assays in SE might be a matrix when using DESeq2 method #269

Open xiangpin opened 1 year ago

xiangpin commented 1 year ago

The related issue

> library(tidybulk)
========================================
tidybulk version 1.12.0
If you use TIDYBULK in published research, please cite:

Mangiola et al. tidybulk: an R tidy framework for modular
transcriptomic data analysis. Genome Biology 2021.

This message can be suppressed by:
  suppressPackageStartupMessages(library(tidybulk))
========================================
> library(SummarizedExperiment) %>% suppressPackageStartupMessages()
> assays(se_mini)[[1]] <- assays(se_mini)[[1]] %>% data.frame()
> se_mini %>% identify_abundant() %>% test_differential_abundance(.formula=~condition, method='DESeq2')
No group or design set. Assuming all samples belong to one group.
renaming the first element in assays to 'counts'
Error in DESeq2::DESeqDataSet(., design = .formula) :
  counts matrix should be numeric, currently it has mode: list
>

when convert it to matrix, it works

> assays(se_mini)[[1]] <- assays(se_mini)[[1]] %>% as.matrix()
> se_mini %>% identify_abundant() %>% test_differential_abundance(.formula=~condition, method='DESeq2')
No group or design set. Assuming all samples belong to one group.
renaming the first element in assays to 'counts'
converting counts to integer mode
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates
fitting model and testing
tidybulk says: to access the raw results (fitted GLM) do `attr(..., "internals")$DESeq2`
class: SummarizedExperiment
dim: 527 5
metadata(0):
assays(1): count
rownames(527): ABCB4 ABCB9 ... ZNF324 ZNF442
rowData names(8): entrez .abundant ... pvalue padj
colnames(5): SRR1740034 SRR1740035 SRR1740043 SRR1740058 SRR1740067
colData names(5): Cell.type time condition days dead
>
stemangiola commented 1 year ago

@mikelove in case you have any opinion

mikelove commented 1 year ago

Great!

stemangiola commented 1 year ago

Thanks for the PR @xiangpin !

Maybe I'm missing the obvious but why should we add a assay that is a dataframe where DESeq2 needs counts. I fail to see the use-case could you be more explicit?

mikelove commented 1 year ago

I would say that having a data.frame instead of matrix in the SummarizedExperiment is not typical.

xiangpin commented 1 year ago

Yes, a data.frame in the SummarizedExperiment is not typical. But for many users, they might also import a data frame to assays when using the SummarizedExperiment function to build a SummarizedExperiment object, since the help information of SummarizedExperiment show that assays: A ‘list’ or ‘SimpleList’ of matrix-like elements, or a matrix-like object (e.g. an ordinary matrix, a data frame, a DataFrame object from the ‘S4Vectors’ package, a sparseMatrix derivative from the ‘Matrix’ package, a DelayedMatrix object from the ‘DelayedArray’ package, etc...). and data frame might be more similar to other objects for many users.

stemangiola commented 1 year ago

As the original SE is not changed (but only just the DESeq2 object, passed to the internals for reproducibility, is), I think this PR is great.

@xiangpin The only thing I ask before pushing is a unit test that would fail before PR (current master) and produce the expected result with the PR.

Thanks for this PR!