neurorestore / Libra

MIT License

Error when no cells present in one condition cell type #43

Open mhvbyrne opened 1 year ago

mhvbyrne commented 1 year ago

Hi,

I am new to differential expression analysis. I am analysing a single-cell dataset with three conditions and ten cell types. I get the following error because one cell type has no cells in one of the conditions. The error seems to come from edgeR, because the analysis runs when I switch to DESeq2. However, DESeq2 then labels every cell type in the output as 'nonPT epithelia'.

Error in `tbl_at_vars()`:
! Can't subset columns that don't exist.
✖ Column `replicate` doesn't exist.

If I subset that cell type out, I get a different error instead, even though the cell type annotation is in my metadata:

Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `cell_type` is not found.

I have tried changing 'compartment' to 'cell_type', but I still get the error.

# Packages
library(Seurat)
library(Libra)

# Filter to remove low gene counts and doublets
## pre filter to remove doublets
sobj <- subset(sobj, cells = which(sobj@meta.data$doublet_scores < 0.6))
dim(sobj)

## There were no nonPT epithelial cells in one condition so I had to remove this
sobj <- subset(sobj, subset = compartment != "nonPT epithelia")

# Extract raw counts and metadata
counts <- sobj@assays$RNA@counts
metadata <- sobj@meta.data

#### LIBRA ####
# store pseudobulk
matrices <- to_pseudobulk(input = counts,
                          meta = metadata,
                          replicate_col = "cart_patient_id", #as we have multiple samples from the same patient
                          cell_type_col = "compartment", #the cell type
                          label_col = "tissue_loc",
                          min_features = 5
                          )

saveRDS(matrices, "./pseudobulk.rds")

# Perform differential expression
DE <- run_de(input = counts,
             meta = metadata,
             replicate_col = "cart_patient_id", #as we have multiple samples from the same patient
             cell_type_col = "compartment", #the cell type
             label_col = "tissue_loc", # what you want to compare
             min_features = 5, #minimum number of counts for inclusion of a feature
             de_family = "pseudobulk", # need to do this as we are using single cell data
             de_method = "edgeR", # can change this to whatever is needed
             de_type = "LRT" # stats method
             )

saveRDS(DE, "./differential_expression.rds")
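A minimal base-R sketch for checking, before calling `run_de()`, which cell type has no cells in some condition. It assumes the metadata uses the columns from the code above (`compartment`, `tissue_loc`). The helpers `count_cells_per_group()` and `empty_cell_types()` and the toy data are hypothetical and are not part of Libra or Seurat. The sketch also shows that removing rows from a data frame keeps unused factor levels unless `droplevels()` is called. Whether a leftover "nonPT epithelia" level is what triggers the errors above is an assumption, not something confirmed in this thread.

```r
# Hypothetical helper (not part of Libra or Seurat): count cells per
# cell type x condition so that empty combinations are easy to spot.
count_cells_per_group <- function(meta, cell_type_col, label_col) {
  # droplevels() removes factor levels with no remaining cells,
  # e.g. a cell type that was filtered out earlier
  meta <- droplevels(meta)
  table(meta[[cell_type_col]], meta[[label_col]])
}

# Hypothetical helper: cell types with zero cells in at least one condition
empty_cell_types <- function(meta, cell_type_col, label_col) {
  tab <- count_cells_per_group(meta, cell_type_col, label_col)
  rownames(tab)[apply(tab == 0, 1, any)]
}

# Toy metadata standing in for sobj@meta.data (made-up values)
toy_meta <- data.frame(
  compartment     = factor(c("PT", "PT", "immune", "immune", "nonPT epithelia")),
  tissue_loc      = c("A", "B", "A", "B", "A"),
  cart_patient_id = c("p1", "p2", "p1", "p2", "p1")
)

print(count_cells_per_group(toy_meta, "compartment", "tissue_loc"))
print(empty_cell_types(toy_meta, "compartment", "tissue_loc"))  # "nonPT epithelia"

# Removing the rows keeps the factor level unless droplevels() is used
filtered <- toy_meta[toy_meta$compartment != "nonPT epithelia", ]
print(levels(filtered$compartment))              # still lists "nonPT epithelia"
print(levels(droplevels(filtered)$compartment))  # unused level removed
```

On the real data, `empty_cell_types(metadata, "compartment", "tissue_loc")` would list the cell types that are missing from at least one condition. Passing `droplevels(metadata)` to `run_de()` after removing them may be worth trying, but that is an untested suggestion.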

yecotoo commented 2 months ago

Excuse me, did you manage to solve this error? If so, how? I am running into it too.