satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

AggregateExpression keeps returning not enough cells #8051

Closed wfaalajr closed 11 months ago

wfaalajr commented 12 months ago

Hi! I am following the vignette about DE analysis by pseudobulking in the Seurat manual. However, when I run the FindMarkers() function, it returns the error:

Error in ValidateCellGroups(object = object, cells.1 = cells.1, cells.2 = cells.2, : Cell group 2 has fewer than 3 cells

I checked the number of cells in the ident.1 and ident.2 that I have set, and I can confirm that in the original object there are more than three cells in each. But when I run AggregateExpression(), add the metadata as instructed by the manual, and run:

table(pseudobulk_object$condition)

It returns a value of 1 per condition, which explains why FindMarkers() returns the error.

Has anyone else encountered this? Is it possible to explain this function more in the manual?

mhkowalski commented 12 months ago

Hi,

As we mention in the differential expression vignette, our pseudobulk analysis treats a celltype from the same sample as an observation. For example, we perform differential expression comparing the CD14 monocytes from the stimulated condition with those from the unstimulated condition, where each donor's CD14 monocytes in a particular condition represent a single observation. We have 8 donors in that example, so we have 8 observations for each condition.

I'm not sure what conditions you have in your data, but for example, if you only have one donor, then it will not be possible to use a pseudobulk approach.
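To make the counting concrete, here is a minimal base-R sketch (no Seurat required) of how grouping collapses cells into observations. The toy metadata, donor names, and column names are invented for illustration. With a single donor, each condition ends up with exactly one pseudobulk observation, which matches the table() output reported above.

```r
# Toy per-cell metadata: 2 conditions, cells from one or more donors.
# All names here are hypothetical and only illustrate the counting.
make_cells <- function(donors) {
  expand.grid(
    cell      = 1:5,                 # 5 cells per donor/condition combination
    donor     = donors,
    condition = c("CTRL", "STIM"),
    stringsAsFactors = FALSE
  )
}

# One pseudobulk observation = one unique donor x condition combination.
count_observations <- function(cells) {
  groups <- unique(cells[, c("donor", "condition")])
  table(groups$condition)
}

one_donor    <- count_observations(make_cells("D1"))
three_donors <- count_observations(make_cells(c("D1", "D2", "D3")))

print(one_donor)     # one observation per condition
print(three_donors)  # three observations per condition
```

The number of cells per donor does not matter here: only the number of distinct donor and condition combinations determines how many observations each group has after aggregation.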

mhkowalski commented 11 months ago

Closing this issue, please open a new issue if you have further questions.

antoine4ucsd commented 10 months ago

hello, if I may, I think my question is related. if I want to do a pseudobulk analysis on a Seurat object obtained from an integration of multiple participants/samples, then the following

pseudo_ifnb <- AggregateExpression(seurat, assays = "RNA", return.seurat = TRUE, group.by = c("celltype", "pid", "status"))
pseudo_ifnb$celltype.stim <- paste(pseudo_ifnb$celltype, pseudo_ifnb$status, sep = "_")
pseudobulk.MG.de <- FindMarkers(object = pseudo_ifnb,
                                assay = "RNA",
                                ident.1 = "Microglial cells_suppressed",
                                ident.2 = "Microglial cells_viremic",
                                test.use = "DESeq2")

is giving me an error:

Error in ValidateCellGroups(object = object, cells.1 = cells.1, cells.2 = cells.2,  : 
  Cell group 2 has fewer than 3 cells

I think it has to do with AggregateExpression, since data integration of merged samples leads to the cells being renamed with the pid as a prefix.

for example my data looks like that

                        orig.ident nCount_RNA nFeature_RNA  pid
PID1_AAACCCACACGGTGCT-1   PID1_Blood       6728         2848 PID1
PID1_AAACCCACATAACTCG-1   PID1_Blood      20547         5957 PID1
PID1_AAACCCACATCTTTCA-1   PID1_Blood       1867         1130 PID1
PID1_AAACCCAGTCGAGATG-1   PID1_Blood       5656         2318 PID1
PID1_AAACGAACATAAGCGG-1   PID1_Blood       8177         3685 PID1

so the aggregation is not working since each row id includes the sample id...

is there a workaround when working on combined/integrated objects?

thank you!

wfaalajr commented 10 months ago

@antoine4ucsd the most recent version, v.5.0.1 fixed it for me. also, as mhkowalski mentioned, ensure that you have at least two samples in one group.

antoine4ucsd commented 10 months ago

thank you for the input, really appreciated. I have 2 samples in both groups but I still have the same error

Error in ValidateCellGroups(object = object, cells.1 = cells.1, cells.2 = cells.2,  : 
  Cell group 1 has fewer than 3 cells

this does not happen without pseudobulk. I do have Seurat 5.0.1 installed. I was wondering if there would be a way to rename the rownames

thank you

wfaalajr commented 10 months ago

can you check the result of:

counts <- as.matrix(pseudobulk_obj[["RNA"]]$counts)
colnames(counts)

and

metadata <- as.data.frame(pseudobulk_obj@meta.data)

if the colnames in the counts are not in the format that you supplied in the AggregateExpression part, try supplying it in the order as specified in the vignette: stim, sample, celltype.

by rownames, you mean the barcode ids? i also suspected it, but that didn’t resolve it for me. i think scCustomize has a function for this, but i’m basing it from memory so i might be wrong.
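A minimal base-R sketch of that check: split the pseudobulk column names back into their grouping fields and count the replicates in each comparison group. The column names below are invented for illustration and assume the fields are joined with "_" in the order given to group.by. The exact naming may differ between Seurat versions, so compare against your own colnames(counts).

```r
# Hypothetical pseudobulk column names of the form celltype_pid_status.
# Replace with colnames(counts) from your own object.
cols <- c(
  "Microglial cells_PID1_suppressed",
  "Microglial cells_PID2_suppressed",
  "Microglial cells_PID3_viremic",
  "Microglial cells_PID4_viremic"
)

# Split each name on "_" into its three grouping fields.
parts <- do.call(rbind, strsplit(cols, "_", fixed = TRUE))
fields <- data.frame(
  celltype = parts[, 1],
  pid      = parts[, 2],
  status   = parts[, 3],
  stringsAsFactors = FALSE
)

# Number of pseudobulk samples (replicates) in each comparison group.
fields$celltype.stim <- paste(fields$celltype, fields$status, sep = "_")
replicates <- table(fields$celltype.stim)
print(replicates)
```

In this toy case each group has two replicates, which is below the three required by the check that raises the "fewer than 3 cells" error.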

mhkowalski commented 7 months ago

Apologies for the delayed response, but in case someone runs into this issue again, you can reduce the minimum requirement for the number of "cells" in a group by adding min.cells.group = 2 to your FindMarkers() call. This allows you to do pseudobulk analysis when you have 2 replicates per condition.
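As a rough illustration of why lowering the threshold helps, the base-R sketch below compares replicate counts against a minimum group size. The helper passes_min_group() is hypothetical and only mimics the idea of the check; it is not Seurat's implementation. The commented call shows where min.cells.group = 2 would go, reusing the object and identity names from the example earlier in this thread.

```r
# Replicates (pseudobulk samples) per group; the numbers are illustrative.
replicates <- c("Microglial cells_suppressed" = 2,
                "Microglial cells_viremic"    = 2)

# Hypothetical helper: TRUE when every group has at least `min_group` samples.
passes_min_group <- function(n, min_group) all(n >= min_group)

ok_default <- passes_min_group(replicates, min_group = 3)  # threshold from the error message
ok_lowered <- passes_min_group(replicates, min_group = 2)  # lowered threshold

print(c(default = ok_default, lowered = ok_lowered))

# With Seurat loaded, the corresponding call would look like:
# FindMarkers(object = pseudo_ifnb,
#             ident.1 = "Microglial cells_suppressed",
#             ident.2 = "Microglial cells_viremic",
#             test.use = "DESeq2",
#             min.cells.group = 2)
```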