lmweber / diffcyt

R package for differential discovery analyses in high-dimensional cytometry data
MIT License
19 stars 12 forks source link

differential state analysis fails #9

Closed ramay closed 4 years ago

ramay commented 4 years ago

Hi Lukas, Congratulations on getting the paper published!

I was trying my data with the new SCE object in CATALYST and diffcyt package. For some reason, when I do the differential state analysis either with limma or LMM, they both fail in slightly different ways.

In case of LMM I get all p-values as NA and in case of limma I get the following message:

Error in contrasts.fit(fit, contrast) : Number of rows of contrast matrix must match number of coefficients in fit

Contrast that I used is :

contrast <- createContrast(c(0, 1,0,0))

Formula is: ds_formula1 <- createFormula(ei, cols_fixed = "condition")

The command used is:

ds_res3 <- diffcyt(sce, formula = ds_formula1, contrast = contrast, analysis_type = "DS", method_DS = "diffcyt-DS-limma", clustering_to_use = "meta14", verbose = TRUE,subsampling = 1000,transform = F)

Output: using SingleCellExperiment object from CATALYST as input using cluster IDs from clustering stored in column 'meta14' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST calculating features... calculating DS tests using method 'diffcyt-DS-limma'... Error in contrasts.fit(fit, contrast) : Number of rows of contrast matrix must match number of coefficients in fit

I have four groups in this dataset with 16 files in total.

Here is the link to the SCE data object in dropbox:

https://www.dropbox.com/s/ceqc28hcq0j8wz5/sce_dbglgata.rda?dl=0

I am sure I am making a silly mistake somewhere but I cannot figure it out even after looking at the source code.

Thanks! Hena

` sessionInfo() R version 3.6.2 (2019-12-12) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Catalina 10.15.1

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding

locale: [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] DT_0.11 diffcyt_1.6.1 ggplot2_3.2.1
[4] flowCore_1.52.1 SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.1 [7] DelayedArray_0.12.2 BiocParallel_1.20.1 matrixStats_0.55.0
[10] Biobase_2.46.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.0
[13] IRanges_2.20.2 S4Vectors_0.24.3 BiocGenerics_0.32.0
[16] CATALYST_1.10.1 limma_3.42.0

loaded via a namespace (and not attached): [1] shinydashboard_0.7.1 R.utils_2.9.2 ks_1.11.6
[4] lme4_1.1-21 tidyselect_0.2.5 htmlwidgets_1.5.1
[7] grid_3.6.2 Rtsne_0.15 munsell_0.5.0
[10] codetools_0.2-16 withr_2.1.2 colorspace_1.4-1
[13] flowViz_1.50.0 knitr_1.27 rstudioapi_0.10
[16] flowClust_3.24.0 robustbase_0.93-5 openCyto_1.24.0
[19] GenomeInfoDbData_1.2.2 mnormt_1.5-5 flowWorkspace_3.34.1
[22] vctrs_0.2.1 TH.data_1.0-10 xfun_0.12
[25] R6_2.4.1 ggbeeswarm_0.6.0 clue_0.3-57
[28] rsvd_1.0.2 locfit_1.5-9.1 bitops_1.0-6
[31] assertthat_0.2.1 promises_1.1.0 scales_1.1.0
[34] multcomp_1.4-12 beeswarm_0.2.3 gtable_0.3.0
[37] sandwich_2.5-1 rlang_0.4.2 zeallot_0.1.0
[40] GlobalOptions_0.1.1 splines_3.6.2 lazyeval_0.2.2
[43] hexbin_1.28.0 shinyBS_0.61 yaml_2.2.0
[46] reshape2_1.4.3 abind_1.4-5 backports_1.1.5
[49] httpuv_1.5.2 IDPmisc_1.1.20 RBGL_1.62.1
[52] tools_3.6.2 ellipsis_0.3.0 RColorBrewer_1.1-2
[55] ggridges_0.5.2 Rcpp_1.0.3 plyr_1.8.5
[58] base64enc_0.1-3 zlibbioc_1.32.0 purrr_0.3.3
[61] RCurl_1.98-1.1 FlowSOM_1.18.0 GetoptLong_0.1.8
[64] viridis_0.5.1 cowplot_1.0.0 zoo_1.8-7
[67] haven_2.2.0 ggrepel_0.8.1 cluster_2.1.0
[70] fda_2.4.8 magrittr_1.5 ncdfFlow_2.32.0
[73] data.table_1.12.8 openxlsx_4.1.4 circlize_0.4.8
[76] mvtnorm_1.0-12 hms_0.5.3 shinyjs_1.1
[79] mime_0.8 xtable_1.8-4 XML_3.99-0.3
[82] rio_0.5.16 jpeg_0.1-8.1 mclust_5.4.5
[85] readxl_1.3.1 gridExtra_2.3 shape_1.4.4
[88] ggcyto_1.14.0 compiler_3.6.2 scater_1.14.6
[91] ellipse_0.4.1 tibble_2.1.3 flowStats_3.44.0
[94] KernSmooth_2.23-16 crayon_1.3.4 minqa_1.2.4
[97] R.oo_1.23.0 htmltools_0.4.0 corpcor_1.6.9
[100] pcaPP_1.9-73 later_1.0.0 tidyr_1.0.0
[103] rrcov_1.5-2 RcppParallel_4.4.4 ComplexHeatmap_2.2.0
[106] MASS_7.3-51.5 boot_1.3-24 Matrix_1.2-18
[109] car_3.0-6 R.methodsS3_1.7.1 igraph_1.2.4.2
[112] forcats_0.4.0 pkgconfig_2.0.3 foreign_0.8-75
[115] plotly_4.9.1 vipor_0.4.5 XVector_0.26.0
[118] drc_3.0-1 stringr_1.4.0 digest_0.6.23
[121] tsne_0.1-3 ConsensusClusterPlus_1.50.0 graph_1.64.0
[124] cellranger_1.1.0 edgeR_3.28.0 DelayedMatrixStats_1.8.0
[127] curl_4.3 shiny_1.4.0 gtools_3.8.1
[130] nloptr_1.2.1 rjson_0.2.20 nlme_3.1-143
[133] lifecycle_0.1.0 jsonlite_1.6 carData_3.0-3
[136] BiocNeighbors_1.4.1 viridisLite_0.3.0 pillar_1.4.3
[139] lattice_0.20-38 fastmap_1.0.1 httr_1.4.1
[142] plotrix_3.7-7 DEoptimR_1.0-8 survival_3.1-8
[145] glue_1.3.1 zip_2.0.4 png_0.1-7
[148] Rgraphviz_2.30.0 stringi_1.4.5 nnls_1.4
[151] BiocSingular_1.2.1 CytoML_1.12.0 latticeExtra_0.6-29
[154] dplyr_0.8.3 irlba_2.3.3 `

lmweber commented 4 years ago

Hi Hena,

Thanks for your message, and sorry for the delay.

I tried this just now, and I think the problem may be that method diffcyt-DS-limma (i.e. the default method for DS testing) expects a design matrix instead of a model formula.

When I create a design matrix and run it as follows, it seems to work:

ds_design <- createDesignMatrix(ei, cols_design = "condition")
contrast <- createContrast(c(0, 1, 0, 0))
ds_res3 <- diffcyt(
    sce, 
    design = ds_design, 
    contrast = contrast, 
    analysis_type = "DS", 
    clustering_to_use = "meta14", 
    subsampling = 1000, 
    verbose = TRUE
)
topTable(ds_res3, format_vals = TRUE)

I also noticed you had transform = FALSE -- just checking this is intended, e.g. if you have already applied a transformation previously.

Does this solve your problem? I'll also try the other method (diffcyt-DS-LMM), but just wanted to let you know this for now.

I should probably also add a more useful error message in this case, since it wasn't very clear from the error message what the problem was.

lmweber commented 4 years ago

For method diffcyt-DS-LMM it also seems to work for me -- I don't get all NAs.

This method uses a formula instead of a design matrix, so I used code similar to yours above:

ei <- metadata(sce)$experiment_info
ds_formula1 <- createFormula(ei, cols_fixed = "condition")
contrast <- createContrast(c(0, 1, 0, 0))
ds_res3 <- diffcyt(
    sce, 
    formula = ds_formula1, 
    contrast = contrast, 
    analysis_type = "DS", 
    method_DS = "diffcyt-DS-LMM", 
    clustering_to_use = "meta14", 
    subsampling = 1000, 
    transform = FALSE, 
    verbose = TRUE
)
topTable(ds_res3)
ramay commented 4 years ago

Thanks ! I was using the wrong formula for diffcyt-DS-LMM. I used transform=F because I had already transformed the data when I made the SCE object using the CATALYST package. Is that Ok?

lmweber commented 4 years ago

Yes that should be fine. A transformation (e.g. the standard asinh(x/5) transform) just needs to be applied at some point in the pipeline, for the clustering to work correctly.