Closed LazDaria closed 1 year ago
Really hard to tell without more information. Probably there is a difference in the proportions from plotAbundances
and in topTable
because a normalization is applied to the latter. I would advise to take the (cluster/celltype-sample) count table and run the DA analyses direct on that, but more or less following the steps of diffcyt does. At least, that way, you can troubleshoot all the intermediate steps.
Thank you for your quick reply!
I will try to provide more information. In the figure below, which I generated according to the cytof workflow, you can see the cluster proportions per sample for two conditions (i.e. timepoints):
plotAbundances(sce, k = "celltype", by = "sample")
Same thing shown in boxplots:
plotAbundances(sce, k = "celltype", by = "cluster_id", shape_by = "sample_id")
It looks like there is a problem with the design or contrast matrix or with the experiment_info. Because everything is correct until I get to the DA testing:
metadata(sce)$experiment_info <- as.data.frame(colData(sce)) %>% group_by(sample_id,condition) %>%
summarise(n_cells=n(),.groups = 'drop') %>%
as.data.frame()
ei <- metadata(sce)$experiment_info
design <- createDesignMatrix(ei, cols_design = "condition")
contrast <- createContrast(c(0,1))
da_res1 <- diffcyt(sce,
design=design,
contrast = contrast,
analysis_type = "DA", method_DA = "diffcyt-DA-edgeR",
clustering_to_use = "celltype",
min_cells=1, min_samples=1,
verbose = T)
rowData(da_res1$res)
This returns:
For some reason, the p value for clusters 7 to 13 is NA. And interestingly, the function topTable() return zero for the proportions of exactly those clusters across all samples, which, as can be seen in the figures above, is not correct.
topTable(da_res1, show_props = TRUE, format_vals = TRUE, digits = 2)
Here is a link to the SCE object.
Maybe I am overseeing something obvious...
Hope this was informative enough. Thank you so much for your help!
Thanks for the extra info; quite a lot of variability there!
If it were me, I would want to step through the testDA_edgeR
function and just double check that everything going in is ok .. design matrix, count table, etc. .. check the intermediate objects.
I'm going on vacation now for 2 weeks, but could check when I get back.
I now tested it with edgeR:
library(edgeR)
abundances <- table(metadata(sce)$cluster_codes$celltype, sce$sample_id)
abundances <- unclass(abundances)
extra.info <- colData(sce)[match(colnames(abundances), sce$sample_id),]
y.ab <- DGEList(abundances, samples=extra.info)
design <- model.matrix(~factor(condition), y.ab$samples)
y.ab <- estimateDisp(y.ab, design, trend="none")
fit.ab <- glmQLFit(y.ab, design, robust=TRUE, abundance.trend=FALSE)
res <- glmQLFTest(fit.ab, coef=ncol(design))
topTags(res)
With edgeR I get reasonable results:
So, it seems to be related to diffcyt or how I input my data to diffcyt. I will just use edgeR directly for now and maybe ask in the diffcyt github repo in parallel.
Have a nice vacation! Thank you for your support!
The count per cellType is incorrect, which impacts the next calculations.
# From initial object
> table(sce$cluster_id)
celltype_1 celltype_10 celltype_11 celltype_12 celltype_13 celltype_2 celltype_3
8684 632 670 129 7292 3193 1765
celltype_4 celltype_5 celltype_6 celltype_7 celltype_8 celltype_9
19786 3071 1065 6426 2505 1188
# From diffcyt output
> rowSums(da_res1$d_counts@assays@data$counts)
celltype_1 celltype_10 celltype_11 celltype_12 celltype_13 celltype_2 celltype_3
9316 0 0 0 0 670 18154
celltype_4 celltype_5 celltype_6 celltype_7 celltype_8 celltype_9
7292 19786 1188 0 0 0
Found results similar to @LazDaria edgeR results when running code line by line but excluding the following line. I excluded it because I did not catch what it is for. Maybe very specific to the use of FlowSOM. Excluding it results in no zero count per celltype (table above) and ensures that code_id == cluster_id. HTH https://github.com/lmweber/diffcyt/blob/39348fd4ebba6a2ca5c38325e587d5c9047f3fda/R/diffcyt_wrapper.R#L373
> rowData(res)
DataFrame with 13 rows and 6 columns
cluster_id logFC logCPM LR p_val p_adj
<factor> <numeric> <numeric> <numeric> <numeric> <numeric>
celltype_1 celltype_1 -0.993162 18.5858 4.343090 3.71594e-02 1.55842e-01
celltype_10 celltype_10 -2.377343 14.3412 3.777920 5.19332e-02 1.55842e-01
celltype_11 celltype_11 -0.203035 15.6122 0.061779 8.03706e-01 8.56278e-01
celltype_12 celltype_12 4.067186 16.2602 24.631513 6.94074e-07 9.02296e-06
celltype_13 celltype_13 -1.527985 14.4500 1.195690 2.74185e-01 4.45550e-01
... ... ... ... ... ... ...
celltype_5 celltype_5 0.246238 16.4777 0.1939437 0.659654 0.856278
celltype_6 celltype_6 0.835572 16.0875 2.4257423 0.119357 0.258606
celltype_7 celltype_7 -1.807111 14.9109 1.8789839 0.170450 0.316549
celltype_8 celltype_8 -0.172904 15.6040 0.0671972 0.795462 0.856278
celltype_9 celltype_9 -0.523879 14.6839 0.3293972 0.566014 0.817576
> rowSums(d_counts@assays@data$counts)
celltype_1 celltype_10 celltype_11 celltype_12 celltype_13 celltype_2 celltype_3
8684 632 670 129 7292 3193 1765
celltype_4 celltype_5 celltype_6 celltype_7 celltype_8 celltype_9
19786 3071 1065 6426 2505 1188
Hi,
first I want to thank you for this very useful package! I am working with spatial experiment objects comprising IMC data and was struggling a bit to transform my SPE into a catalyst compatible SCE object. I managed by creating a new SCE object based on my count matrix, colData, clusters and reducedDims and adding experiment_info to the metadata slot. Using this newly created SCE, I worked through the CyTOF workflow and everything worked fine until the differential abundance section. For some reason, I obtain NA values for my p-values and the proportions returned by
differ from the true proportions which I obtained by:
Here is how I performed the DA testing:
I initially thought that the reason for this might be that I only have very few cells for some samples per cluster and conditon, as I am comparing clusters within a metaclusters. However, setting min_cells and min_samples to a lower value did not fix the problem. I am new to DA testing, but I compared the outputs using HDCytoData to my data and found no difference. I also sent you the generated SCE object per email. Thanks so much already!