Closed VivianQM closed 1 year ago
I have the same issue. get_DE_info rejected all my cell types. Some cell types have symbols (+ or /), or space. But even cell type named "CD4" was also excluded. Was that only bcz not enough cells for a given cell type in each group?
Hi, I am getting the same error following the same steps. As you, I am confident that the groups and cell-types I selected pass the threshold of the minimum amount of cells required. I also ensured that all the variables are syntactically valid (make.names
).
Thank you all for raising this issue. We will try to fix it as soon as possible (hopefully next week).
Hi @VivianQM @mimisikai @SergioRodLla
I could replicate this issue by putting extremely high thresholds for genes to be expressed - as done in this example:
get_DE_info(sce, sample_id, group_id, celltype_id, batches, covariates, contrasts_oi, min_cells = 10, assay_oi_pb = "counts", fun_oi_pb = "sum", de_method_oi = "edgeR", findMarkers = FALSE, contrast_tbl = NULL, filterByExpr.min.count = 100000, filterByExpr.min.total.count = 100000, filterByExpr.large.n = 5, filterByExpr.min.prop = 0.7)
This might indicate that cell types will be removed if no genes pass the default gene filtering.
Could you try to put filterByExpr.min.count = 0
and filterByExpr.min.total.count = 0
as an experiment to see whether the issue concerns gene filtering. Since these parameter settings will not perform gene filtering, the analysis should run if the issue is due to gene filtering.
If the issue seems to concern gene filtering: read the documentation of filterByExpr
from edgeR to understand these filtering criteria, and think about whether this makes sense for your data. Could you then provide me with some info of this? If none of your cell types would pass this filtering criterion I would be very worried.
If the issue remains: could you share the following plot: abundance_expression_info$abund_plot_sample
?
Hi @browaeysrobin
Thank you for the fast reply. I tried to use filterByExpr.min.count = 0
and filterByExpr.min.total.count = 0
and it is giving me the same error.
I forgot to mention that for my case, not all the groups in my grouping variable (group_id
) reach the condition of >=2 samples with >=10 cells. However, the 3 contrasts I specified in contrasts_oi
reach this requirement. Does this mean that I should remove those groups from my sce
object even though they are not indicated as contrasts of interest? If I do this get_DE_info()
runs without errors.
Hi @SergioRodLla
Does this mean that I should remove those groups from my
sce
object even though they are not indicated as contrasts of interest? If I do thisget_DE_info()
runs without errors.
Indeed. Cell types should have enough cells in >=2 samples in each group in your data. So even groups that are not directly compared in your contrast are considered during the DE model analysis. (that's also mentioned in the error message normally ... - Also for groups not included in your contrasts!
.
To solve this, you can merge groups that are not of interest for your contrasts, or subset your data and leaving out these groups.
Thank you!
I wanted to make sure this was OK since at Step 3 of the introductory vignette (https://github.com/saeyslab/multinichenetr/blob/main/vignettes/basic_analysis_steps_MISC.md) I am getting a ligand_activities_targets_DEgenes$ligand_activities
dataframe with rows with NA values in the "target" and "ligand_target_weighted" columns. Specially if I use adjusted p-value instead of normal p-value, which happens in nearly half of the rows. Is this because of gene names missing from the ligand_target_matrix
? I am using the mouse Nichenet v2 networks.
Thank you again.
Hi @browaeysrobin,
Thanks for your reply. The issue remains in my case after trying to put filterByExpr.min.count = 0 and filterByExpr.min.total.count = 0. Here is my plot for abundance_expression_info$abund_plot_sample:
I also made sure that cell types have enough cells in >=2 samples in each group in sce (I subsetted the dataset)
Any clue why this is still happening? Any help is appreciated. Thank you!
Hi,
I am encountering the same error:
> DE_info = get_DE_info(sce = sce, sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, batches = batches, covariates = covariates, contrasts_oi = contrasts_oi, min_cells = min_cells)
[1] "excluded cell types are:"
[1] "MAC" "SMACs" "VSM" "Tc" "Ductal.KCs" "Classical.MECs"
[7] "SerACs" "Oral.Mucosal.KCs" "ILCs" "Fibroblast.1" "Fibroblast.2" "Non.classical.MECs"
[13] "Non.Classical.Monocytes" "Classical.Monocytes" "Macrophage" "cDC1" "Th" "Treg"
[19] "MAIT" "IgM.Plasma.Cells" "IgG.Plasma.Cells" "CD8..IEL" "IgA.Plasma.Cells" "Plasmablasts"
[25] "CD16..NK.Cells" "VECs" "Transitional.Ductal.KCs" "Fibroblast.3" "Pericytes" "LECs"
[1] "These celltypes are not considered in the analysis. After removing samples that contain less cells than the required minimal, some groups don't have 2 or more samples anymore. As a result the analysis cannot be run. To solve this: decrease the number of min_cells or change your group_id and pool all samples that belong to groups that are not of interest! "
[1] "None of the cell types passed the check. This might be due to 2 reasons. 1) no cell type has enough cells in >=2 samples per group. 2) problem in batch definition: not all levels of your batch are in each group - Also for groups not included in your contrasts!"
Error in `dplyr::group_by()`:
! Must group by variables found in `.data`.
Column `contrast` is not found.
Column `cluster_id` is not found.
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
Unknown or uninitialised column: `cluster_id`.
I expected a number of cell types to be excluded but not all. Here is the plot for abundance_expression_info$abund_plot_sample:
I then subset the sce object to include only the cell types that passed the criteria of >= 2 samples per group with 10 cells or more and I get the same error. I also tried using filterByExpr.min.count = 0
and filterByExpr.min.total.count = 0
as suggested and nothing changed. I did not set any batches.
Any help would be much appreciated. Thanks!
Hi @catsargent, Hi @VivianQM
To help me further diagnose the problem:
1) Can you run get_DE_info
with the argument: findMarkers = TRUE
instead the default findMarkers = FALSE
. Do you then get the same error or not?
2) Can you merge your groups/samples so you have only groups with 3 or more samples per group (just for a test, not for your analysis). Do you then get the same error or not?
3) Can you show how you defined the contrasts, batches and covariates?
Thanks for the quick reply. I tried both of those suggestions and unfortunately they both resulted in the same original error.
@catsargent
Thanks for the quick reply. I tried both of those suggestions and unfortunately they both resulted in the same original error.
and can you show how you defined the contrasts, batches and covariates?
Sure! Here you go..
sample_id = "Sample"
group_id = "Niche"
celltype_id = "cell_type"
covariates = NA
batches = NA
contrasts_oi = c('Parotid.Gland-(Minor.Salivary.Gland+Submandibular.Gland)/2','Minor.Salivary.Gland-(Parotid.Gland+Submandibular.Gland)/2','Submandibular.Gland-(Minor.Salivary.Gland+Parotid.Gland)/2')
contrast_tbl = tibble(contrast =
c("Parotid.Gland-(Minor.Salivary.Gland+Submandibular.Gland)/2","Minor.Salivary.Gland-(Parotid.Gland+Submandibular.Gland)/2", "Submandibular.Gland-(Minor.Salivary.Gland+Parotid.Gland)/2"),
group = c("Parotid.Gland","Minor.Salivary.Gland","Submandibular.Gland"))
@catsargent
Everything looks fine at first sight. I have no idea what is going wrong and it's very hard when I cannot reproduce your errors myself.
To better see where exactly things might go wrong, can you run the following snippet of code in the place where you would normally run get_DE_info()
?
celltypes = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique()
contrasts = contrasts_oi
perform_muscat_de_analysis_test = function(sce, sample_id, celltype_id, group_id, batches, covariates, contrasts, assay_oi_pb = "counts", fun_oi_pb = "sum", de_method_oi = "edgeR", min_cells = 10, filterByExpr.min.count = 7, filterByExpr.min.total.count = 15, filterByExpr.large.n = 4, filterByExpr.min.prop = 0.7){
# conditions of interest in the contrast should be present in the in the group column of the metadata
groups_oi = SummarizedExperiment::colData(sce)[,group_id] %>% unique()
conditions_oi = stringr::str_split(contrasts, "'") %>% unlist() %>% unique() %>%
# stringr::str_split("[:digit:]") %>% unlist() %>% unique() %>%
stringr::str_split("\\)") %>% unlist() %>% unique() %>%
stringr::str_split("\\(") %>% unlist() %>% unique() %>%
stringr::str_split("-") %>% unlist() %>% unique() %>%
stringr::str_split("\\+") %>% unlist() %>% unique() %>%
stringr::str_split("\\*") %>% unlist() %>% unique() %>%
stringr::str_split("\\/") %>% unlist() %>% unique() %>% generics::setdiff(c("",","," ,", ", ")) %>% unlist() %>% unique()
conditions_oi = conditions_oi[is.na(suppressWarnings(as.numeric(conditions_oi)))]
# conditions of interest in the contrast should be present in the in the contrast_tbl
contrasts_simplified = stringr::str_split(contrasts, "'") %>% unlist() %>% unique() %>%
stringr::str_split(",") %>% unlist() %>% unique() %>% generics::setdiff(c("",",")) %>% unlist() %>% unique()
# prepare sce for the muscat pseudobulk analysis
sce$id = sce[[sample_id]]
sce = muscat::prepSCE(sce,
kid = celltype_id, # subpopulation assignments
gid = group_id, # group IDs (ctrl/stim)
sid = "id", # sample IDs (ctrl/stim.1234)
drop = FALSE) # drop all other SummarizedExperiment::colData columns ----------------- change to false
print("sce after prepSCE")
print(sce)
pb = muscat::aggregateData(sce,
assay = assay_oi_pb, fun = fun_oi_pb,
by = c("cluster_id", "sample_id"))
print("pb after aggregateData")
print(pb)
# prepare the experiment info (ei) table if batches present
if(length(batches) > 1){
batches_present = TRUE
} else {
if(!is.na(batches)){
batches_present = TRUE
} else {
batches_present = FALSE
}
}
if(batches_present){
extra_metadata = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id), all_of(batches)) %>% dplyr::distinct() %>% dplyr::mutate_all(factor)
} else {
extra_metadata = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id)) %>% dplyr::distinct() %>% dplyr::mutate_all(factor)
}
if('sample_id' != sample_id){
extra_metadata$sample_id = extra_metadata[[sample_id]]
}
ei = S4Vectors::metadata(sce)$experiment_info
ei = ei %>% dplyr::inner_join(extra_metadata, by = "sample_id")
# prepare the experiment info (ei) table if covariates present
if(length(covariates) > 1){
covariates_present = TRUE
} else {
if(!is.na(covariates)){
covariates_present = TRUE
} else {
covariates_present = FALSE
}
}
if(covariates_present){
extra_metadata = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id), all_of(covariates)) %>% dplyr::distinct() ## no factorization of the continuous covariates!
} else {
extra_metadata = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id)) %>% dplyr::distinct() %>% dplyr::mutate_all(factor)
}
if('sample_id' != sample_id){
extra_metadata$sample_id = extra_metadata[[sample_id]]
}
if(is.null(ei)){
ei = S4Vectors::metadata(sce)$experiment_info
}
ei = ei %>% dplyr::inner_join(extra_metadata, by = "sample_id")
# prepare the design and contrast matrix for the muscat DE analysis
if(batches_present == TRUE & covariates_present == FALSE){
batches_string = paste0("ei$",batches) %>% paste(collapse = " + ")
design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id + ", batches_string, " ) ",sep="")))
dimnames(design) = list(ei$sample_id, c(levels(ei$group_id), make.names(colnames(design)[(length(levels(ei$group_id))+1):length(colnames(design))])) )
} else if (batches_present == FALSE & covariates_present == TRUE) {
covariates_string = paste0("ei$",covariates) %>% paste(collapse = " + ")
design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id + ", covariates_string, " ) ",sep="")))
dimnames(design) = list(ei$sample_id, c(levels(ei$group_id), make.names(colnames(design)[(length(levels(ei$group_id))+1):length(colnames(design))])) )
} else if (batches_present == TRUE & covariates_present == TRUE) {
batches_string = paste0("ei$",batches) %>% paste(collapse = " + ")
covariates_string = paste0("ei$",covariates) %>% paste(collapse = " + ")
design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id + ", batches_string, " + ", covariates_string, " ) ",sep="")))
dimnames(design) = list(ei$sample_id, c(levels(ei$group_id), make.names(colnames(design)[(length(levels(ei$group_id))+1):length(colnames(design))])) )
} else if (batches_present == FALSE & covariates_present == FALSE) {
design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id) ",sep="")))
dimnames(design) = list(ei$sample_id, c(levels(ei$group_id)))
}
contrast = eval(parse(text=paste("limma::makeContrasts(", contrasts, ",levels=design)",sep="")))
# filter genes
celltype = SummarizedExperiment::assayNames(pb) %>% .[1]
pb = pb[edgeR::filterByExpr(y = SummarizedExperiment::assay(pb, celltype), design = design, min.count = filterByExpr.min.count, min.total.count = filterByExpr.min.total.count, large.n = filterByExpr.large.n, min.prop = filterByExpr.min.prop), ]
print("pb after gene filtering")
print(pb)
# run DS analysis
res = muscat::pbDS(pb, method = de_method_oi , design = design, contrast = contrast, min_cells = min_cells, verbose = FALSE, filter = "none")
print("res")
print(head(res[[1]][[1]][[1]]))
de_output_tidy = muscat::resDS(sce, res, bind = "row", cpm = FALSE, frq = FALSE) %>% tibble::as_tibble() %>% dplyr::rename(p_adj = p_adj.glb)
print("de_output_tidy")
print(head(de_output_tidy))
output = list(de_output = res, de_output_tidy = de_output_tidy)
return(output)
}
DE_list = celltypes %>% lapply(function(celltype_oi, sce){
sce_oi = sce[, SummarizedExperiment::colData(sce)[,celltype_id] == celltype_oi]
DE_result = perform_muscat_de_analysis_test(sce = sce_oi,
sample_id = sample_id,
celltype_id = celltype_id,
group_id = group_id,
batches = batches,
covariates = covariates,
contrasts = contrasts_oi)
}, sce)
Can you send me the output and error/warnings? (You can do that via e-mail if you would prefer that)
Hi @browaeysrobin ,
Tried all of the suggestions you gave to @catsargent . None of them made a difference. This is how I defined my contrast and variables.
And I ran the block of code that you gave. No errors or warnings were given. I printed: "sce after prepSCE" "pb after aggregateData" - describes a singlecellexperiment object, assays "Granulosa" "pb after gene filtering" - describes a singlecellexperiment object, assays "Granulosa" "res" - a table of DEA results "de_output_tidy" - a table of DEA results (with negative folds marked red) And all the above again but with the assay being "Theca" (Theca and Granulosa are my two cell types).
Thanks for the help
Hi @VivianQM Since you did not encounter any error now: can you run this as well
celltype_de = list(
de_output = c(DE_list %>% purrr::map("de_output")),
de_output_tidy = DE_list %>% purrr::map("de_output_tidy") %>% bind_rows()
)
excluded_celltypes = celltypes %>% generics::setdiff(celltype_de$de_output_tidy$cluster_id) %>% unique()
if (length(excluded_celltypes) > 0) {
print("excluded cell types are:")
print(excluded_celltypes)
print("These celltypes are not considered in the analysis. After removing samples that contain less cells than the required minimal, some groups don't have 2 or more samples anymore. As a result the analysis cannot be run. To solve this: decrease the number of min_cells or change your group_id and pool all samples that belong to groups that are not of interest! ")
}
if (length(excluded_celltypes) == length(celltypes)) {
print("None of the cell types passed the check. This might be due to 2 reasons. 1) no cell type has enough cells in >=2 samples per group. 2) problem in batch definition: not all levels of your batch are in each group - Also for groups not included in your contrasts!")
}
hist_pvals = celltype_de$de_output_tidy %>% dplyr::inner_join(celltype_de$de_output_tidy %>% dplyr::group_by(contrast,cluster_id) %>% dplyr::count(), by = c("cluster_id","contrast")) %>%
dplyr::mutate(cluster_id = paste0(cluster_id, "\nnr of genes: ", n)) %>% dplyr::mutate(`p-value <= 0.05` = p_val <= 0.05) %>%
ggplot(aes(x = p_val, fill = `p-value <= 0.05`)) +
geom_histogram(binwidth = 0.05,boundary=0, color = "grey35") + scale_fill_manual(values = c("grey90", "lightsteelblue1")) +
facet_grid(contrast~cluster_id) + ggtitle("P-value histograms") + theme_bw()
Can you send me the exact output that you get on your console for the previous and this part of code? (can be done via mail if you don't want to share it here). I need to see the exact output in order to be able to pinpoint the problem.
Hi @VivianQM Since you did not encounter any error now: can you run this as well
celltype_de = list( de_output = c(DE_list %>% purrr::map("de_output")), de_output_tidy = DE_list %>% purrr::map("de_output_tidy") %>% bind_rows() ) excluded_celltypes = celltypes %>% generics::setdiff(celltype_de$de_output_tidy$cluster_id) %>% unique() if (length(excluded_celltypes) > 0) { print("excluded cell types are:") print(excluded_celltypes) print("These celltypes are not considered in the analysis. After removing samples that contain less cells than the required minimal, some groups don't have 2 or more samples anymore. As a result the analysis cannot be run. To solve this: decrease the number of min_cells or change your group_id and pool all samples that belong to groups that are not of interest! ") } if (length(excluded_celltypes) == length(celltypes)) { print("None of the cell types passed the check. This might be due to 2 reasons. 1) no cell type has enough cells in >=2 samples per group. 2) problem in batch definition: not all levels of your batch are in each group - Also for groups not included in your contrasts!") } hist_pvals = celltype_de$de_output_tidy %>% dplyr::inner_join(celltype_de$de_output_tidy %>% dplyr::group_by(contrast,cluster_id) %>% dplyr::count(), by = c("cluster_id","contrast")) %>% dplyr::mutate(cluster_id = paste0(cluster_id, "\nnr of genes: ", n)) %>% dplyr::mutate(`p-value <= 0.05` = p_val <= 0.05) %>% ggplot(aes(x = p_val, fill = `p-value <= 0.05`)) + geom_histogram(binwidth = 0.05,boundary=0, color = "grey35") + scale_fill_manual(values = c("grey90", "lightsteelblue1")) + facet_grid(contrast~cluster_id) + ggtitle("P-value histograms") + theme_bw()
Can you send me the exact output that you get on your console for the previous and this part of code? (can be done via mail if you don't want to share it here). I need to see the exact output in order to be able to pinpoint the problem.
that block ran without printing anything. I will send you those results asap, just wanted to check with my PI first, sorry
Hi @VivianQM
So that block of code also ran without errors? It is very weird that both blocks of code would run without errors, since those are the lines of code in get_DE_info
. Are you sure get_DE_info
is still giving errors?
I will send you those results asap, just wanted to check with my PI first, sorry
I understand (but just want to see console output, not the data objects of course)
Hi @VivianQM
So that block of code also ran without errors? It is very weird that both blocks of code would run without errors, since those are the lines of code in
get_DE_info
. Are you sureget_DE_info
is still giving errors?I will send you those results asap, just wanted to check with my PI first, sorry
I understand (but just want to see console output, not the data objects of course)
Yes it is still giving out this error:
Warning message:
“Unknown or uninitialised column: cluster_id
.”
[1] "excluded cell types are:"
[1] "Granulosa" "Theca"
[1] "These celltypes are not considered in the analysis. After removing samples that contain less cells than the required minimal, some groups don't have 2 or more samples anymore. As a result the analysis cannot be run. To solve this: decrease the number of min_cells or change your group_id and pool all samples that belong to groups that are not of interest! "
[1] "None of the cell types passed the check. This might be due to 2 reasons. 1) no cell type has enough cells in >=2 samples per group. 2) problem in batch definition: not all levels of your batch are in each group - Also for groups not included in your contrasts!"
Error in dplyr::group_by()
:
! Must group by variables found in .data
.
Column contrast
is not found.
Column cluster_id
is not found.
Traceback:
p-value <= 0.05
= p_val <=
. 0.05) %>% ggplot(aes(x = p_val, fill = p-value <= 0.05
))p-value <= 0.05
))p-value <= 0.05
= p_val <= 0.05)Very strange indeed...
@VivianQM
- get_DE_info(sce = sce, sample_id = sample_id, group_id = group_id, . celltype_id = celltype_id, covariates = covariates, contrasts_oi = contrasts_oi, . min_cells = 10, findMarkers = TRUE)
I see you have put findMarkers = TRUE here, can you set it to FALSE ? (probably will be the same error message, but you never know)
@VivianQM
- get_DE_info(sce = sce, sample_id = sample_id, group_id = group_id, . celltype_id = celltype_id, covariates = covariates, contrasts_oi = contrasts_oi, . min_cells = 10, findMarkers = TRUE)
I see you have put findMarkers = TRUE here, can you set it to FALSE ? (probably will be the same error message, but you never know)
Same error. Sorry :(
I'm having the same problems - can send over my output from the get_DE_info() replacement function if that would help?
Hi @ekawaler
I'm having the same problems - can send over my output from the get_DE_info() replacement function if that would help?
Yes that would help. The more similarities I see between all your output that are not present in the output of the several datasets I tested on will help.
@VivianQM
- get_DE_info(sce = sce, sample_id = sample_id, group_id = group_id, . celltype_id = celltype_id, covariates = covariates, contrasts_oi = contrasts_oi, . min_cells = 10, findMarkers = TRUE)
I see you have put findMarkers = TRUE here, can you set it to FALSE ? (probably will be the same error message, but you never know)
Email sent!
More info: when I run the same dataset using the pairwise wrapper it does work!
Hi Robin,
I did this and it ended with Error in x[[1]] : subscript out of bounds
.
I have emailed you the output.
Many thanks for looking into this!
Catherine
@catsargent
Everything looks fine at first sight. I have no idea what is going wrong and it's very hard when I cannot reproduce your errors myself.
To better see where exactly things might go wrong, can you run the following snippet of code in the place where you would normally run
get_DE_info()
?celltypes = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique() contrasts = contrasts_oi perform_muscat_de_analysis_test = function(sce, sample_id, celltype_id, group_id, batches, covariates, contrasts, assay_oi_pb = "counts", fun_oi_pb = "sum", de_method_oi = "edgeR", min_cells = 10, filterByExpr.min.count = 7, filterByExpr.min.total.count = 15, filterByExpr.large.n = 4, filterByExpr.min.prop = 0.7){ # conditions of interest in the contrast should be present in the in the group column of the metadata groups_oi = SummarizedExperiment::colData(sce)[,group_id] %>% unique() conditions_oi = stringr::str_split(contrasts, "'") %>% unlist() %>% unique() %>% # stringr::str_split("[:digit:]") %>% unlist() %>% unique() %>% stringr::str_split("\\)") %>% unlist() %>% unique() %>% stringr::str_split("\\(") %>% unlist() %>% unique() %>% stringr::str_split("-") %>% unlist() %>% unique() %>% stringr::str_split("\\+") %>% unlist() %>% unique() %>% stringr::str_split("\\*") %>% unlist() %>% unique() %>% stringr::str_split("\\/") %>% unlist() %>% unique() %>% generics::setdiff(c("",","," ,", ", ")) %>% unlist() %>% unique() conditions_oi = conditions_oi[is.na(suppressWarnings(as.numeric(conditions_oi)))] # conditions of interest in the contrast should be present in the in the contrast_tbl contrasts_simplified = stringr::str_split(contrasts, "'") %>% unlist() %>% unique() %>% stringr::str_split(",") %>% unlist() %>% unique() %>% generics::setdiff(c("",",")) %>% unlist() %>% unique() # prepare sce for the muscat pseudobulk analysis sce$id = sce[[sample_id]] sce = muscat::prepSCE(sce, kid = celltype_id, # subpopulation assignments gid = group_id, # group IDs (ctrl/stim) sid = "id", # sample IDs (ctrl/stim.1234) drop = FALSE) # drop all other SummarizedExperiment::colData columns ----------------- change to false print("sce after prepSCE") print(sce) pb = muscat::aggregateData(sce, assay = assay_oi_pb, fun = fun_oi_pb, by = c("cluster_id", "sample_id")) print("pb after aggregateData") print(pb) # prepare the experiment info (ei) table if batches present if(length(batches) > 1){ batches_present = TRUE } else { if(!is.na(batches)){ batches_present = TRUE } else { batches_present = FALSE } } if(batches_present){ extra_metadata = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id), all_of(batches)) %>% dplyr::distinct() %>% dplyr::mutate_all(factor) } else { extra_metadata = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id)) %>% dplyr::distinct() %>% dplyr::mutate_all(factor) } if('sample_id' != sample_id){ extra_metadata$sample_id = extra_metadata[[sample_id]] } ei = S4Vectors::metadata(sce)$experiment_info ei = ei %>% dplyr::inner_join(extra_metadata, by = "sample_id") # prepare the experiment info (ei) table if covariates present if(length(covariates) > 1){ covariates_present = TRUE } else { if(!is.na(covariates)){ covariates_present = TRUE } else { covariates_present = FALSE } } if(covariates_present){ extra_metadata = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id), all_of(covariates)) %>% dplyr::distinct() ## no factorization of the continuous covariates! } else { extra_metadata = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id)) %>% dplyr::distinct() %>% dplyr::mutate_all(factor) } if('sample_id' != sample_id){ extra_metadata$sample_id = extra_metadata[[sample_id]] } if(is.null(ei)){ ei = S4Vectors::metadata(sce)$experiment_info } ei = ei %>% dplyr::inner_join(extra_metadata, by = "sample_id") # prepare the design and contrast matrix for the muscat DE analysis if(batches_present == TRUE & covariates_present == FALSE){ batches_string = paste0("ei$",batches) %>% paste(collapse = " + ") design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id + ", batches_string, " ) ",sep=""))) dimnames(design) = list(ei$sample_id, c(levels(ei$group_id), make.names(colnames(design)[(length(levels(ei$group_id))+1):length(colnames(design))])) ) } else if (batches_present == FALSE & covariates_present == TRUE) { covariates_string = paste0("ei$",covariates) %>% paste(collapse = " + ") design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id + ", covariates_string, " ) ",sep=""))) dimnames(design) = list(ei$sample_id, c(levels(ei$group_id), make.names(colnames(design)[(length(levels(ei$group_id))+1):length(colnames(design))])) ) } else if (batches_present == TRUE & covariates_present == TRUE) { batches_string = paste0("ei$",batches) %>% paste(collapse = " + ") covariates_string = paste0("ei$",covariates) %>% paste(collapse = " + ") design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id + ", batches_string, " + ", covariates_string, " ) ",sep=""))) dimnames(design) = list(ei$sample_id, c(levels(ei$group_id), make.names(colnames(design)[(length(levels(ei$group_id))+1):length(colnames(design))])) ) } else if (batches_present == FALSE & covariates_present == FALSE) { design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id) ",sep=""))) dimnames(design) = list(ei$sample_id, c(levels(ei$group_id))) } contrast = eval(parse(text=paste("limma::makeContrasts(", contrasts, ",levels=design)",sep=""))) # filter genes celltype = SummarizedExperiment::assayNames(pb) %>% .[1] pb = pb[edgeR::filterByExpr(y = SummarizedExperiment::assay(pb, celltype), design = design, min.count = filterByExpr.min.count, min.total.count = filterByExpr.min.total.count, large.n = filterByExpr.large.n, min.prop = filterByExpr.min.prop), ] print("pb after gene filtering") print(pb) # run DS analysis res = muscat::pbDS(pb, method = de_method_oi , design = design, contrast = contrast, min_cells = min_cells, verbose = FALSE, filter = "none") print("res") print(head(res[[1]][[1]][[1]])) de_output_tidy = muscat::resDS(sce, res, bind = "row", cpm = FALSE, frq = FALSE) %>% tibble::as_tibble() %>% dplyr::rename(p_adj = p_adj.glb) print("de_output_tidy") print(head(de_output_tidy)) output = list(de_output = res, de_output_tidy = de_output_tidy) return(output) } DE_list = celltypes %>% lapply(function(celltype_oi, sce){ sce_oi = sce[, SummarizedExperiment::colData(sce)[,celltype_id] == celltype_oi] DE_result = perform_muscat_de_analysis_test(sce = sce_oi, sample_id = sample_id, celltype_id = celltype_id, group_id = group_id, batches = batches, covariates = covariates, contrasts = contrasts_oi) }, sce)
Can you send me the output and error/warnings? (You can do that via e-mail if you would prefer that)
More info: when I run the same dataset using the pairwise wrapper it does work!
Hi @ekawaler: So you mean no errors are thrown when you follow the steps in: https://github.com/saeyslab/multinichenetr/blob/main/vignettes/pairwise_analysis_MISC.md ?
For @VivianQM and @catsargent: can you try the same as @ekawaler and run the steps in this vignette? Does it work that way as in @ekawaler's case?: https://github.com/saeyslab/multinichenetr/blob/main/vignettes/threewise_analysis_MISC.md
For @ekawaler @VivianQM @catsargent: Can you share your sessionInfo?
Hi @ekawaler @VivianQM @catsargent and others with this problem:
I adapted the get_DE_info
function. The previous function threw the wrong error message sometimes (not only when cell types were really missing, but also when another error occurred during the DE analysis), this should be fixed now and allow us to better identify each of your problems.
You can try using the new function after reinstalling multinichenetr.
Please send me the new error message you get. If the error message would already be clear for you to fix, try to do so as well, and let me know if that fix worked.
Ok thanks. Like @ekawaler, it also worked for me when using the wrapper function... using that it only complained about the actual cell types not meeting the criteria for DE analysis and then moved on using the cell types which did.
I shared my session_info in an email to you Robin.
@browaeysrobin Reinstalled, re-ran everything, and the error went away. HORAYYY! Thank you so much for your help and patience. And thanks to all who helped! Seems like everyone is doing fine, so I will go ahead and close this issue.
Hi,
Thank you so much for this amazing update. I am trying to use it for a 3-way comparison and have been following the tutorial: https://github.com/saeyslab/multinichenetr/blob/main/vignettes/basic_analysis_steps_MISC.md
When performing the DE analysis for each cell type, however, I ran into this
Warning message: “Unknown or uninitialised column:
cluster_id
.” [1] "excluded cell types are:" [1] "Granulosa" "Theca"[1] "These celltypes are not considered in the analysis. After removing samples that contain less cells than the required minimal, some groups don't have 2 or more samples anymore. As a result the analysis cannot be run. To solve this: decrease the number of min_cells or change your group_id and pool all samples that belong to groups that are not of interest! " [1] "None of the cell types passed the check. This might be due to 2 reasons. 1) no cell type has enough cells in >=2 samples per group. 2) problem in batch definition: not all levels of your batch are in each group - Also for groups not included in your contrasts!" Error in
dplyr::group_by()
: ! Must group by variables found in.data
. Columncontrast
is not found. Columncluster_id
is not found. Traceback:p-value <= 0.05
= p_val <= . 0.05) %>% ggplot(aes(x = p_val, fill =p-value <= 0.05
))p-value <= 0.05
))p-value <= 0.05
= p_val <= 0.05)I am very confident that I have enough cells in every group and cell type. Any suggestion on what might solve the problem? Any help is highly appreciated! Thank you so much.