lmweber / diffcyt

R package for differential discovery analyses in high-dimensional cytometry data
MIT License
19 stars 12 forks source link

Mixed Model and Contrast #17

Open vivek-verma202 opened 4 years ago

vivek-verma202 commented 4 years ago

Hello Lukas, Thank-you for Diffcyt. I had a quick question about setting up contrast matrix for mixed model. I hypothesize that the cell cluster proportions depend on the condition (case-status, binary) when corrected for age and sex (fixed effect), and the date of batch processing (categorical, 4 values, random effect)

ei <- metadata(sce)$experiment_info
> head(ei)
  sample_id condition    sex age        date n_cells
1       E23   Control Female  46 16-AUG-2019  150000
2       E25   Control   Male  42 28-AUG-2019  150000
3       E26   Control Female  45 16-AUG-2019  150000
4       E28   Control Female  52 16-AUG-2019  150000
5       E29      Case Female  40 28-AUG-2019  150000
6       E30      Case Female  42 28-AUG-2019  150000
> str(ei)
'data.frame':   102 obs. of  6 variables:
 $ sample_id: Factor w/ 102 levels "E23","E25","E26",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ condition: Factor w/ 2 levels "Control","Case": 1 1 1 1 2 2 2 1 1 2 ...
 $ sex      : Factor w/ 2 levels "Female","Male": 1 2 1 1 1 1 1 1 1 1 ...
 $ age      : num  46 42 45 52 40 42 52 52 46 62 ...
 $ date     : Factor w/ 4 levels "08-OCT-2019",..: 2 4 2 2 4 4 1 2 2 4 ...
 $ n_cells  : num  150000 150000 150000 150000 150000 150000 150000 150000 150000 150000 ...
da_formula1 <- createFormula(ei,
                             cols_fixed = c("condition", "age", "sex"),
                             cols_random = "date"
)
> da_formula1[["formula"]]
y ~ condition + age + sex + (1 | date)
contrast <- createContrast(c(0, 1, 0, 0))
> contrast
     [,1]
[1,]    0
[2,]    1
[3,]    0
[4,]    0

da_res1 <- diffcyt(sce,
                   formula = da_formula1, contrast = contrast,
                   analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
                   clustering_to_use = "meta10", verbose = T
)
topTable(da_res1, format_vals = T)
> topTable(da_res1, format_vals = T)
DataFrame with 10 rows and 3 columns
   cluster_id     p_val     p_adj
     <factor> <numeric> <numeric>
1          1   0.00e+00  0.00e+00
2          2   0.00e+00  0.00e+00
3          3   0.00e+00  0.00e+00
4          4   0.00e+00  0.00e+00
5          5   0.00e+00  0.00e+00
6          6   0.00e+00  0.00e+00
7          7   0.00e+00  0.00e+00
8          8   0.00e+00  0.00e+00
10         10  1.78e-15  1.97e-15
9          9   7.95e-01  7.95e-01

Either I am about to publish a Science / Nature paper or I messed up (the probability of latter is close to 1). Could you please let me know where am I messing up? Also, like Toptable, is there a quick & easy way to check the effect size? Thanks a lot! Vivek

vivek-verma202 commented 4 years ago

Just an update, I tried running the same using testDA_GLMM(), which requires using calcCounts(), which needs running generateClusters(). However,

> sce <-  generateClusters(sce)
Error: Argument 'exprs' must be numeric matrix with colnames attribute set
> typeof(sce@assays@data@listData[["exprs"]])
[1] "double"
> dim(sce@assays@data@listData[["exprs"]])
[1]       14 15257202
> sce@assays@data@listData[["exprs"]] <- as.matrix(sce@assays@data@listData[["exprs"]])
> typeof(sce@assays@data@listData[["exprs"]])
[1] "double"
> dim(sce@assays@data@listData[["exprs"]])
[1]       14 15257202
> sce <-  generateClusters(sce)
Error: Argument 'exprs' must be numeric matrix with colnames attribute set

Tried with normalize = T:

> da_res1 <- diffcyt(sce,
+                    formula = da_formula1, contrast = contrast,
+                    analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
+                    normalize = T, norm_factors = "TMM",
+                    clustering_to_use = "meta20", verbose = T
+ )
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta20' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DA tests using method 'diffcyt-DA-GLMM'...
There were 20 warnings (use warnings() to see them)
> topTable(da_res1, format_vals = T)
DataFrame with 20 rows and 3 columns
    cluster_id     p_val     p_adj
      <factor> <numeric> <numeric>
1            1         0         0
2            2         0         0
3            3         0         0
4            4         0         0
5            5         0         0
...        ...       ...       ...
19          19  0.00e+00  0.00e+00
20          20  0.00e+00  0.00e+00
14          14  1.20e-13  1.33e-13
18          18  7.59e-09  7.99e-09
10          10  1.09e-02  1.09e-02
> warnings()
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  ... :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?

Identical warning message for 20 times.

Using the most minimalistic model didn't help either:

> da_formula1 <- createFormula(ei,
+                              cols_fixed = c("condition"),
+                              cols_random = "date"
+ )
> contrast <- createContrast(c(0, 1))
> da_res1 <- diffcyt(sce,
+                    formula = da_formula1, contrast = contrast,
+                    analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
+                    normalize = T, norm_factors = "TMM",
+                    clustering_to_use = "meta20", verbose = T
+ )
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta20' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DA tests using method 'diffcyt-DA-GLMM'...
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
> topTable(da_res1, format_vals = T)
DataFrame with 20 rows and 3 columns
    cluster_id     p_val     p_adj
      <factor> <numeric> <numeric>
1            1         0         0
2            2         0         0
3            3         0         0
4            4         0         0
5            5         0         0
...        ...       ...       ...
17          17  0.00e+00  0.00e+00
19          19  0.00e+00  0.00e+00
20          20  0.00e+00  0.00e+00
18          18  5.02e-05  5.28e-05
10          10  7.44e-04  7.44e-04