Closed by mctseng2 6 years ago
Correct, in this case the data type would be detected as "ma".
Note that this distinction between data types mainly serves to decide which differential expression (DE) and enrichment analysis (EA) methods to apply.
The main concern when providing raw RNA-seq read count data (for which transcript length and library size have not been taken into account) is that DE and EA methods for microarray data are not per se applicable.
I think what you have here are TPMs, which typically allow the application of methods developed for microarray data, except for maybe an additional variance-stabilizing transformation.
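As a rough illustration of the transformation step mentioned above, here is a minimal sketch in base R. It uses a plain log2(x + 1) transform as a simple stand-in for a variance-stabilizing transformation; that choice, the toy values, and the object names (tpm, log_tpm) are assumptions for illustration only and not something the package prescribes.

```r
# Toy TPM-like matrix: 3 genes x 4 samples (values are made up for illustration).
tpm <- matrix(
  c(0,      10.5,  250.2,  3.1,
    1200.7, 980.3, 1500.9, 1100.0,
    5.5,    0,     7.25,   6.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
)

# log2(x + 1): the pseudocount of 1 keeps zeros finite, since log2(0 + 1) = 0.
# Assumption: a plain log transform stands in for a variance-stabilizing
# transformation here; it is not the package's own procedure.
log_tpm <- log2(tpm + 1)

# Per-gene variances before and after the transform.
raw_var <- apply(tpm, 1, var)
log_var <- apply(log_tpm, 1, var)
print(round(raw_var, 2))
print(round(log_var, 2))
```

The transformed matrix is still non-integer, so it would presumably still be treated as "ma", which is consistent with the recommendation above.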
Note also that you can always set the data type for your SummarizedExperiment named se by:
metadata(se)$dataType <- "rseq"
However, depending on what you intend to do, "ma" would indeed be the more appropriate choice here, at least for the enrichment analysis.
Thanks for the great package! This is not a bug, just a personal concern. I was curious how deAna() knows which data type (microarray or RNA-seq) it receives, and found the small function .detectDataType() inside deAna(), which tends to classify the data as RNA-seq if the matrix is integer and as microarray otherwise. I am using salmon and tximport, which generate a non-integer count matrix (due to scaling to transcript length and library size), so it seems that deAna() will go for the microarray procedure instead of correctly selecting "rseq".
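To make the behavior described above concrete, here is a minimal base R sketch of an integer-based heuristic. The function guess_data_type() and the toy matrices are hypothetical and written only from the description above; this is not the actual source of .detectDataType().

```r
# Hypothetical helper, NOT the package's .detectDataType(): returns "rseq" when
# every non-missing value is a whole number, and "ma" otherwise.
guess_data_type <- function(expr) {
  stopifnot(is.numeric(expr))
  vals <- expr[!is.na(expr)]
  if (length(vals) > 0 && all(vals == round(vals))) "rseq" else "ma"
}

# Raw read counts: whole numbers only.
raw_counts <- matrix(c(10, 0, 250, 3, 1200, 980), nrow = 2)

# Length- and library-size-scaled values, similar in spirit to tximport output.
scaled <- matrix(c(10.4, 0, 250.7, 3.2, 1200.1, 980.9), nrow = 2)

print(guess_data_type(raw_counts))  # "rseq"
print(guess_data_type(scaled))      # "ma"
```

Under such a heuristic, any scaled, non-integer matrix ends up in the microarray branch, regardless of the technology that produced it.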
This is my count matrix:
This is what I got when I hacked into .detectDataType() to check whether the function detects the data type correctly:
I am wondering whether a data.type argument could be added to deAna(), because some users might not be aware of this behavior and might get tripped up. Thanks!