Error !is.null(pbs[["group_id"]]) is not TRUE

Hello,

I am getting the following error on every cell type when calling get_DE_info:

perform_muscat_de_analysis errored for celltype: CD8+ T cells
Here's the original error message:
!is.null(pbs[["group_id"]]) is not TRUE
<simpleError in .check_pbs(pb, check_by = TRUE): !is.null(pbs[["group_id"]]) is not TRUE>
perform_muscat_de_analysis errored for celltype: CD8+ T cells

This is my code

#make sure that the gene symbols used in the expression data are also updated and made syntactically valid.
sce = alias_to_symbol_SCE(sce, "human") %>% makenames_SCE()

#Define in which metadata columns we can find the group, sample and cell type IDs
sample_id = "patient"
group_id = "disease"
celltype_id = "level_2"
covariates = NA
batches = NA 

senders_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique()
receivers_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique()

min_cells=10

abundance_expression_info$abund_plot_sample

contrasts_oi = c("'H-P','P-N','N-P'")

contrast_tbl = tibble(contrast = 
                        c('H-P','P-N','N-P'), 
                      group = c("H","P","N"))

DE_info = get_DE_info(sce = sce, sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, batches = batches, covariates = covariates, contrasts_oi = contrasts_oi, min_cells = min_cells)

I have tried running it via the wrapper, as well as altering the group_ID variable and the contrasts_oi, with the same problem. I have checked my contrasts_oi are set up properly. I have 300000 cells, so plenty in each group, and have also had the same error when I have downsampled to test on a smaller object of 5000 cells.

Do you have any ideas how I can overcome this?

Thanks,

Caroline

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /gpfs3/apps/eb/2020b/skylake/software/OpenBLAS/0.3.9-GCC-9.3.0/lib/libopenblas_skylakexp-r0.3.9.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] patchwork_1.1.2             future_1.30.0               clustree_0.5.0              ggraph_2.1.0                viridis_0.6.2              
 [6] viridisLite_0.4.1           forcats_0.5.2               stringr_1.5.0               purrr_1.0.1                 readr_2.1.3                
[11] tidyr_1.3.0                 tibble_3.2.1                tidyverse_1.3.2             SeuratObject_4.1.3          Seurat_4.3.0               
[16] multinichenetr_1.0.0        ggplot2_3.4.0               dplyr_1.1.2                 SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
[21] Biobase_2.54.0              GenomicRanges_1.46.1        GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.4           
[26] BiocGenerics_0.40.0         MatrixGenerics_1.6.0        matrixStats_0.63.0

Hi @carolinemorrell

Are you sure your input satisfies this condition stated in the vignettes:

Important: the column names of group, sample, cell type, batches and covariates should be syntactically valid (make.names) Important: All group, sample, cell type, batch and covariate names should be syntactically valid as well (make.names)

(some of the packages under the hood of MultiNicheNet require this)

"CD8+ T cells", the example you provided, is not a R syntactically valid cell type name. Can you try to rerun the analysis after satisfying this condition?

Thanks so much for getting back to me.

I've run make.names across all included variables now (and now trying a different celltype_ID column which never had any special characters), and am still getting the same error -this is it in full

> DE_info = get_DE_info(sce = sce, sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, batches = batches, covariates = covariates, contrasts_oi = contrasts_oi, min_cells = min_cells)
perform_muscat_de_analysis errored for celltype: Stroma
Here's the original error message:
!is.null(pbs[["group_id"]]) is not TRUE
<simpleError in .check_pbs(pb, check_by = TRUE): !is.null(pbs[["group_id"]]) is not TRUE>
perform_muscat_de_analysis errored for celltype: Stroma

[1] "In case: Error in x[[1]]: subscript out of bounds: this likely means that there are not enough samples per group with sufficient cells of this cell type. This cell type will thus be ignored for further analyses, other cell types will still be considered."
perform_muscat_de_analysis errored for celltype: Immune
Here's the original error message:
!is.null(pbs[["group_id"]]) is not TRUE
<simpleError in .check_pbs(pb, check_by = TRUE): !is.null(pbs[["group_id"]]) is not TRUE>
perform_muscat_de_analysis errored for celltype: Immune

[1] "In case: Error in x[[1]]: subscript out of bounds: this likely means that there are not enough samples per group with sufficient cells of this cell type. This cell type will thus be ignored for further analyses, other cell types will still be considered."
perform_muscat_de_analysis errored for celltype: Epithelium
Here's the original error message:
!is.null(pbs[["group_id"]]) is not TRUE
<simpleError in .check_pbs(pb, check_by = TRUE): !is.null(pbs[["group_id"]]) is not TRUE>
perform_muscat_de_analysis errored for celltype: Epithelium

[1] "In case: Error in x[[1]]: subscript out of bounds: this likely means that there are not enough samples per group with sufficient cells of this cell type. This cell type will thus be ignored for further analyses, other cell types will still be considered."
perform_muscat_de_analysis errored for celltype: Endothelium
Here's the original error message:
!is.null(pbs[["group_id"]]) is not TRUE
<simpleError in .check_pbs(pb, check_by = TRUE): !is.null(pbs[["group_id"]]) is not TRUE>
perform_muscat_de_analysis errored for celltype: Endothelium

[1] "In case: Error in x[[1]]: subscript out of bounds: this likely means that there are not enough samples per group with sufficient cells of this cell type. This cell type will thus be ignored for further analyses, other cell types will still be considered."
[1] "DE analysis is done:"
[1] "included cell types are:"
NULL
[1] "excluded cell types are:"
[1] "Stroma"      "Immune"      "Epithelium"  "Endothelium"
[1] "These celltypes are not considered in the analysis. After removing samples that contain less cells than the required minimal, some groups don't have 2 or more samples anymore (also relevant for groups not included in your contrasts!). As a result the analysis cannot be run. To solve this: decrease the number of min_cells or change your group_id and pool all samples that belong to groups that are not of interest! "
[1] "DE analysis did error for all cell types. This might be because of several reasons - check the original error message for this. Here are 2 common reasons in case no cell type past the filtering criteria: 1) no cell type has enough cells in >=2 samples per group. 2) problem in batch definition: not all levels of your batch are in each group - Also for groups not included in your contrasts!"
Error in `dplyr::group_by()`:
! Must group by variables found in `.data`.
Column `contrast` is not found.
Column `cluster_id` is not found.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: Unknown or uninitialised column: `cluster_id`. 
2: Unknown or uninitialised column: `cluster_id`.

and the traceback

Backtrace:
     ▆
  1. ├─multinichenetr::get_DE_info(...)
  2. │ └─... %>% ggplot(aes(x = p_val, fill = `p-value <= 0.05`))
  3. ├─ggplot2::ggplot(., aes(x = p_val, fill = `p-value <= 0.05`))
  4. ├─dplyr::mutate(., `p-value <= 0.05` = p_val <= 0.05)
  5. ├─dplyr::mutate(...)
  6. ├─dplyr::inner_join(...)
  7. ├─dplyr:::inner_join.data.frame(...)
  8. │ └─dplyr::auto_copy(x, y, copy = copy)
  9. │   ├─dplyr::same_src(x, y)
 10. │   └─dplyr:::same_src.data.frame(x, y)
 11. │     └─base::is.data.frame(y)
 12. ├─... %>% dplyr::count()
 13. ├─dplyr::count(.)
 14. ├─dplyr::group_by(., contrast, cluster_id)
 15. └─dplyr:::group_by.data.frame(., contrast, cluster_id)
 16.   └─dplyr::group_by_prepare(.data, ..., .add = .add, error_call = current_env())
 17.     └─rlang::abort(bullets, call = error_call)

Which I think refers to the later dpylr error.

I am able to run the vignettes (step by step misc and the lung atlas one) on the zenodo data without problems, so I am sure the issue is my data, but I am new to SCE (usually I work within seurat) and am struggling to understand where the problem has come from/ how to fix it. If you have any other ideas I would massively appreciate it as I really want to get the package to work :)

Hi @carolinemorrell

Your issue seems to occur during the pseudobulk count matrix generation.

Could you provide the output of SummarizedExperiment::colData(sce)[,c(sample_id, group_id, celltype_id)] %>% head() and also share how you created your SCE ?

Thanks so much

I used

sce = Seurat::as.SingleCellExperiment(seurat_obj, assay = "RNA")

to generate my SCE.

Then for the code output

> SummarizedExperiment::colData(sce)[,c(sample_id, group_id, celltype_id)] %>% head()
> SummarizedExperiment::colData(sce)[,c(sample_id, group_id, celltype_id)] %>% head()
DataFrame with 6 rows and 3 columns
                            patient     disease              level_2
                        <character> <character>          <character>
OP10_AAACCCAAGAACGTGC-1        OP10           N           Fibroblast
OP10_AAACCCAAGATGCGAC-1        OP10           N Vascular.Endothelium
OP10_AAACCCAAGCCATGCC-1        OP10           N           Fibroblast
OP10_AAACCCAAGGAGCAAA-1        OP10           N           Epithelium
OP10_AAACCCACAAATAAGC-1        OP10           N Vascular.Endothelium
OP10_AAACCCACAACCAGAG-1        OP10           N              B.cells

Does that give you any clues? Thanks again, Caroline

Hi @carolinemorrell

This looks OK.

Can you run following snippet of code line by line

# prepare pseudobulking
sce$id = sce[[sample_id]]
sce = muscat::prepSCE(sce,
                      kid = celltype_id, # subpopulation assignments
                      gid = group_id,  # group IDs (ctrl/stim)
                      sid = "id",   # sample IDs (ctrl/stim.1234)
                      drop = FALSE)  # drop all other SummarizedExperiment::colData columns ----------------- change to false

# perform pseudobulking
pb = muscat::aggregateData(sce,
                           assay = "counts", fun = "sum",
                           by = c("cluster_id", "sample_id"))

# prepare design matrix
extra_metadata = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id)) %>% dplyr::distinct() %>% dplyr::mutate_all(factor) 
if('sample_id' != sample_id){
  extra_metadata$sample_id = extra_metadata[[sample_id]]
}
ei = S4Vectors::metadata(sce)$experiment_info
ei = ei %>%  dplyr::inner_join(extra_metadata, by = "sample_id")
design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id) ",sep="")))
dimnames(design) = list(ei$sample_id, c(levels(ei$group_id)))

# filter out insufficient expressed genes  
celltype = SummarizedExperiment::assayNames(pb) %>% .[1]
pb2 = pb[edgeR::filterByExpr(y = SummarizedExperiment::assay(pb, celltype), design = design, min.count = 7, min.total.count = 15, large.n = 4, min.prop = 0.7), ]
print(pb)
print(pb2)

# run DE analysis
contrast = eval(parse(text=paste("limma::makeContrasts(", contrasts, ",levels=design)",sep="")))
res = muscat::pbDS(pb, method = "edgeR" , design = design, contrast = contrast, min_cells = 10, verbose = TRUE, filter = "none")
res2 = muscat::pbDS(pb2, method = "edgeR" , design = design, contrast = contrast, min_cells = 10, verbose = TRUE, filter = "none")
print(res)
print(res2)

Can you provide me the output of this, and let me know which lines of code did not run?

Thanks again

This is the code as run and the error

> # prepare pseudobulking
> sce$id = sce[[sample_id]]
> sce = muscat::prepSCE(sce,
+                       kid = celltype_id, # subpopulation assignments
+                       gid = group_id,  # group IDs (ctrl/stim)
+                       sid = "id",   # sample IDs (ctrl/stim.1234)
+                       drop = FALSE)  # drop all other SummarizedExperiment::colData columns ----------------- change to false
> # perform pseudobulking
> pb = muscat::aggregateData(sce,
+                            assay = "counts", fun = "sum",
+                            by = c("cluster_id", "sample_id"))
> # prepare design matrix
> extra_metadata = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id)) %>% dplyr::distinct() %>% dplyr::mutate_all(factor) 
> if('sample_id' != sample_id){
+   extra_metadata$sample_id = extra_metadata[[sample_id]]
+ }
> ei = S4Vectors::metadata(sce)$experiment_info
> ei = ei %>%  dplyr::inner_join(extra_metadata, by = "sample_id")
> design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id) ",sep="")))
> dimnames(design) = list(ei$sample_id, c(levels(ei$group_id)))
> # filter out insufficient expressed genes  
> celltype = SummarizedExperiment::assayNames(pb) %>% .[1]
> pb2 = pb[edgeR::filterByExpr(y = SummarizedExperiment::assay(pb, celltype), design = design, min.count = 7, min.total.count = 15, large.n = 4, min.prop = 0.7), ]
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'i' in selecting a method for function '[': library sizes should be finite and non-negative

Thanks again

Hi @carolinemorrell

Can you show me how your pb object looks like? pb SummarizedExperiment::assay(pb, celltype)

Can you still try to run contrast = eval(parse(text=paste("limma::makeContrasts(", contrasts, ",levels=design)",sep=""))) res = muscat::pbDS(pb, method = "edgeR" , design = design, contrast = contrast, min_cells = 10, verbose = TRUE, filter = "none") to see if that works or not?

Hi @browaeysrobin

So this is my pb object

> pb
class: SingleCellExperiment 
dim: 43450 50 
metadata(2): experiment_info agg_pars
assays(14): B.cells CD4..T.cells ... T.cells Vascular.Endothelium
rownames(43450): RP11.34P13.3 FAM138A ... MIR4691 SNORD115.39
rowData names(0):
colnames(50): HN01 HN02 ... OP8 OP9
colData names(4): dataset patient seq Lymph.node
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

> SummarizedExperiment::assay(pb, celltype)
                        pt1 pt2  pt3   pt4  pt5  pt6  pt7  pt8  pt9  pt10  pt11   pt12  pt13  pt14  pt15
RP11.34P13.3               0    0     0      0     0     0     0     0     0     0     0      0     0     0     0
FAM138A                    0    0     0      0     0     0     0     0     0     0     0      0     0     0     0
OR4F5                      0    0     0      0     0     0     0     0     0     0     0      0     0     0     0

Then this is the code you asked me to run and the subsequent error

> contrast = eval(parse(text=paste("limma::makeContrasts(", contrasts, ",levels=design)",sep="")))
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'expr' in selecting a method for function 'eval': cannot coerce type 'closure' to vector of type 'character'
> res = muscat::pbDS(pb, method = "edgeR" , design = design, contrast = contrast, min_cells = 10, verbose = TRUE, filter = "none")
Error in as.list.environment(environment()) : object 'contrast' not found

Thanks so much for all your patience and help with this Caroline

Hi @carolinemorrell

I see what's wrong with your pb object. Normally, the colnames after running SummarizedExperiment::assay(pb, celltype) should be the same as the colnames of your pb object. In your case, they differ (pt1, pt2, ...) versus (HN01, HN02, ...). I guess that the colnames of the pb object are what you want (HN01, HN02, ...) and accord to the sample_id identifier that you defined.

Can you tell where the colnames after running SummarizedExperiment::assay(pb, celltype) (pt1, pt2, ...) are coming from? Is this an other identifier you provided, metadata column, ... It will be crucial for you to find this out since I have never seen this before and don't know how this may be caused.

Hi @browaeysrobin I changed these by hand before posting the output I'm afraid as I don't want to give away too much of what I am doing. The names are HN01 etc when I run the {R]SummarizedExperiment::assay(pb, celltype) code, so I don't think this is the issue. sorry!

Hi @carolinemorrell

Can you run the following blocks of code in a new R session?

lr_network = readRDS(url("https://zenodo.org/record/7074291/files/lr_network_human_21122021.rds"))
lr_network = lr_network %>% dplyr::rename(ligand = from, receptor = to) %>% distinct(ligand, receptor) %>% mutate(ligand = make.names(ligand), receptor = make.names(receptor))

abundance_expression_info = get_abundance_expression_info(sce = sce, sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, min_cells = min_cells, senders_oi = senders_oi, receivers_oi = receivers_oi, lr_network = lr_network, batches = batches)
abundance_expression_info$abund_plot_sample

Please show me the plot that is returned (you can change the celltype/sample labels if you want)

After that, please run:

celltypes = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique()
assay_oi_pb = "counts"
fun_oi_pb = "sum"
de_method_oi = "edgeR"
min_cells = 10
filterByExpr.min.count = 7
filterByExpr.min.total.count = 15
filterByExpr.large.n = 4
filterByExpr.min.prop = 0.7

DE_list = celltypes %>% lapply(function(celltype_oi, sce){
  sce_oi = sce[, SummarizedExperiment::colData(sce)[,celltype_id] == celltype_oi]
  DE_result = tryCatch(
    {perform_muscat_de_analysis(sce = sce_oi, 
                                sample_id = sample_id, 
                                celltype_id = celltype_id, 
                                group_id = group_id, 
                                batches = batches, 
                                covariates = covariates, 
                                contrasts = contrasts_oi, 
                                assay_oi_pb = assay_oi_pb, 
                                fun_oi_pb = fun_oi_pb, 
                                de_method_oi = de_method_oi, 
                                min_cells = min_cells, 
                                filterByExpr.min.count = filterByExpr.min.count, 
                                filterByExpr.min.total.count = filterByExpr.min.total.count, 
                                filterByExpr.large.n = filterByExpr.large.n, 
                                filterByExpr.min.prop = filterByExpr.min.prop)
    }, 
    error = function(cond){
      message(paste0("perform_muscat_de_analysis errored for celltype: ", celltype_oi))
      message("Here's the original error message:")
      message(cond)
      message("")
      print(cond)
      message(paste0("perform_muscat_de_analysis errored for celltype: ", celltype_oi))
      message("")
      print("In case: Error in x[[1]]: subscript out of bounds: this likely means that there are not enough samples per group with sufficient cells of this cell type. This cell type will thus be ignored for further analyses, other cell types will still be considered.")
      return(NA) # occurs when not enough samples per group with sufficient cells in most cases, can also be due to other error messages
    })
}, sce)

This should normally give the same error as you started with (!is.null(pbs[["group_id"]]) is not TRUE)

Then finally, run this:

pb_list = celltypes %>% lapply(function(celltype_oi, sce){
  sce_trial = sce[, SummarizedExperiment::colData(sce)[,celltype_id] == celltype_oi]
  sce_trial$id = sce_trial[[sample_id]]
  sce_trial = muscat::prepSCE(sce_trial,
                              kid = celltype_id, # subpopulation assignments
                              gid = group_id,  # group IDs (ctrl/stim)
                              sid = "id",   # sample IDs (ctrl/stim.1234)
                              drop = FALSE)  # drop all other SummarizedExperiment::colData columns ----------------- change to false

  print(table(sce_trial$cluster_id, sce_trial$sample_id))
  print(table(sce_trial$cluster_id, sce_trial$group_id))
  print(table(sce_trial$sample_id, sce_trial$group_id))
  print((sce_trial$cluster_id) %>% sort() %>% head())
  print((sce_trial$sample_id) %>% sort() %>% head())
  print((sce_trial$group_id) %>% sort() %>% head())

  print(!is.null(sce_trial[["cluster_id"]]))
  print(!is.null(sce_trial[["sample_id"]]))
  print(!is.null(sce_trial[["group_id"]]))

  # perform pseudobulking
  pb = muscat::aggregateData(sce_trial,
                             assay = "counts", fun = "sum",
                             by = c("cluster_id", "sample_id"))
  print(SummarizedExperiment::colData(pb)[1:5, 1:2])
  print(!is.null(pb[["group_id"]]))
  print(colSums(SummarizedExperiment::assay(pb)))

  # prepare design matrix
  extra_metadata = SummarizedExperiment::colData(sce_trial) %>% tibble::as_tibble() %>% dplyr::select(all_of(sample_id)) %>% dplyr::distinct() %>% dplyr::mutate_all(factor) 
  if('sample_id' != sample_id){
    extra_metadata$sample_id = extra_metadata[[sample_id]]
  }
  ei = S4Vectors::metadata(sce_trial)$experiment_info
  ei = ei %>%  dplyr::inner_join(extra_metadata, by = "sample_id")
  design = eval(parse(text=paste("model.matrix(~ 0 + ei$group_id) ",sep="")))
  dimnames(design) = list(ei$sample_id, c(levels(ei$group_id)))

  # filter out insufficient expressed genes  
  celltype = SummarizedExperiment::assayNames(pb) %>% .[1]
  pb2 = pb[edgeR::filterByExpr(y = SummarizedExperiment::assay(pb, celltype), design = design, min.count = 7, min.total.count = 15, large.n = 4, min.prop = 0.7), ]

  print(SummarizedExperiment::colData(pb2)[1:5, 1:2])
  print(!is.null(pb2[["group_id"]]))
  print(colSums(SummarizedExperiment::assay(pb2)))

  contrast = eval(parse(text=paste("limma::makeContrasts(", contrasts_oi, ",levels=design)", sep="")))
  res = muscat::pbDS(pb, method = "edgeR" , design = design, contrast = contrast, min_cells = 10, verbose = TRUE, filter = "none")
  res2 = muscat::pbDS(pb2, method = "edgeR" , design = design, contrast = contrast, min_cells = 10, verbose = TRUE, filter = "none")
  print(res$table[[1]][[1]] %>% head())
  print(res2$table[[1]][[1]] %>% head())

}, sce)

If you can give me the output of all this, I could start having a look at what may be different between your object and mine.

In your sessionInfo(), I also see that your R version is a bit outdated. I don't think the error results from that, but if possible for you: could you update your R and multinichenetr and its dependencies to make sure whether the error still errors in the newest version? If it still errors in the new version: can you try making your sce object as follows: sce =Seurat::as.SingleCellExperiment(DietSeurat(seurat_obj)) and see whether you get the same error with that object?

Thanks, I've emailed you my output, I hope thats OK. Caroline

Hi @browaeysrobin , I've solved the problem so will close this. In case it happens to anyone else, I had a patient that had a sample in two of my groups, and had the same naming convention in both. Once I changed it to e.g. pt1.healthy and pt1.diseased, it worked. Thanks for all your help, and a great package. Caroline

Hi @carolinemorrell

I've found and reproduced your error in another dataset. In your data and analysis input settings, one of the samples belonged to more than one condition. This gives errors in the muscat-based pseudobulk generation and DE analysis. Each sample should be uniquely assigned to one condition/group of interest. See the vignettes about paired and multifactorial analysis to see how to define your analysis input when you have multiple samples and conditions per patient. Making a new sample_id column n your SCE where you paste your current sample_id and group_id together should solve this problem. For example, if you have a tumor and normal tissue sample of patient1, your sample_id should not be patient1 but patient1_tumor or patient1_normal.

I now added input checks to the software to check this and give an understandable error message. Therefore I will close this issue.

saeyslab / multinichenetr

Error !is.null(pbs[["group_id"]]) is not TRUE #16