neurorestore / Libra

MIT License

Column `cell_type` is not found. #9

Closed: fredust closed this issue 2 years ago

fredust commented 2 years ago

Hello, thank you for developing this great package.

I get the following error when running Libra on my data:

Error in `group_by()`:
! Must group by variables found in `.data`.
x Column `cell_type` is not found.
Backtrace:
  1. Libra::run_de(...)
    1. dplyr:::group_by.data.frame(., cell_type)

The input is a Seurat object, and its metadata looks like this:

                   orig.ident    nCount_RNA nFeature_RNA Sample  tissue  cell_type
AAACCTGAGTACTTGC-1 SeuratProject       2954         1477 sample1 disease celltype_1
AAACCTGCACGAAATA-1 SeuratProject       4197         1568 sample1 disease celltype_2

Here is the script:

DE <- Libra::run_de(
  seuratobjec,
  replicate_col = "Sample",
  label_col = "tissue",
  cell_type_col = "cell_type",
  min_cells = 3,
  min_reps = 2,
  min_features = 0,
  de_family = "pseudobulk",
  de_method = "edgeR",
  de_type = "LRT",
  n_threads = 2
)

Could you help me solve this problem? I checked the metadata in the example data, and it looks similar to mine. Thank you very much!
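Before calling `run_de()`, a quick look at the metadata can rule out the simplest causes of this error: a column named in `replicate_col`, `label_col` or `cell_type_col` that does not exist, or a label column that does not hold exactly two conditions (the maintainer notes further down that Libra is set up for pairwise comparisons). This is a minimal base-R sketch; `check_libra_meta()` is a hypothetical helper, not part of Libra, and the toy metadata below is made up for illustration.

```r
# Hypothetical pre-flight check (NOT part of Libra): inspects a metadata
# data.frame before it is handed to Libra::run_de().
check_libra_meta <- function(meta,
                             replicate_col = "replicate",
                             label_col = "label",
                             cell_type_col = "cell_type") {
  problems <- character(0)
  cols <- c(replicate_col, label_col, cell_type_col)
  absent <- setdiff(cols, colnames(meta))
  if (length(absent) > 0) {
    problems <- c(problems,
                  paste("missing column(s):", paste(absent, collapse = ", ")))
  } else {
    # Assumption based on this thread: Libra compares two conditions at a time
    n_labels <- length(unique(meta[[label_col]]))
    if (n_labels != 2) {
      problems <- c(problems,
                    sprintf("'%s' has %d distinct value(s); expected 2",
                            label_col, n_labels))
    }
  }
  problems  # character(0) means nothing suspicious was found
}

# Hypothetical toy metadata shaped like the one in this issue
meta <- data.frame(
  Sample    = c("sample1", "sample1", "sample2", "sample2"),
  tissue    = c("disease", "disease", "control", "control"),
  cell_type = c("celltype_1", "celltype_2", "celltype_1", "celltype_2")
)
check_libra_meta(meta, replicate_col = "Sample", label_col = "tissue")
```

For a Seurat object, the same check could be run on `seuratobjec@meta.data` with the column names passed to `run_de()`.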

jordansquair commented 2 years ago

Do you get the same error if you remove the line: cell_type_col = "cell_type" ?

KammannT commented 2 years ago

I can confirm that the error persists after removal of this line of code.

deb0612 commented 2 years ago

I got the same error! I simply ran:

DE = run_de(D1)
[1] "mDA"
Error in `group_by()`:
! Must group by variables found in `.data`.
x Column `cell_type` is not found.

And here is the output of `head(D1@meta.data)`:

                          orig.ident nCount_RNA nFeature_RNA  percent.mt RNA_snn_res.0.5 seurat_clusters replicate label cell_type
wt_7pd_AAGATAGTCTTAGCTT-1       WT7p       8496         3450 0.046093570               2               2    mouse1  WT7p       mDA
wt_7pd_ACATCGAAGAGGTCGT-1       WT7p       2673         1506 0.036603221               2               2    mouse1  WT7p       mDA
wt_7pd_ACTCTCGAGGTGGGTT-1       WT7p       9379         3540 0.020659023               2               2    mouse1  WT7p       mDA
wt_7pd_ATACTTCCATCAGCTA-1       WT7p      10700         3648 0.018286550               2               2    mouse1  WT7p       mDA
wt_7pd_CACCGTTCAAGCGAGT-1       WT7p      10703         3729 0.009092562               2               2    mouse1  WT7p       mDA
wt_7pd_CATCCCAAGTTTCGAC-1       WT7p      14162         4461 0.013791201               2               2    mouse1  WT7p       mDA

jordansquair commented 2 years ago

I pushed a fix; hopefully this works. Otherwise, you could send me a small sample of your data so I can figure out what is going on. Let me know!

rersister commented 2 years ago

Sadly, I also ran into this problem. How can I solve it?

rersister commented 2 years ago

I tried two datasets with this code:

expr = countexp.Seurat@assays[["RNA"]]@data
meta = countexp.Seurat@meta.data
meta$replicate = meta$orig.ident
meta$cell_type = meta$ident
meta$label = meta$seurat_clusters

DE = run_de(expr, meta = meta)

Both gave an error.

DE = run_de(expr, meta = meta)
[1] "B"
[1] "CD14+ Mono"
[1] "CD8 T"
[1] "DC"
[1] "FCGR3A+ Mono"
[1] "Memory CD4 T"
[1] "Naive CD4 T"
[1] "NK"
[1] "Platelet"
Error: Must group by variables found in `.data`.

This is very strange, and I don't know how to solve it. Can you help me?

jordansquair commented 2 years ago

Which dataset is this?

rersister commented 2 years ago

The dataset contains metabolic pathway scores. I want to use your method to find differential metabolic pathways. I tried `DE = run_de(expr, meta = meta)` on my dataset and on a matrix extracted from a SeuratObject, and both gave an error.

My `expr`:

> expr[1:4, 1:4]
                                         T10_AGAGCGAAGTTGAGTA.1 T10_AAAGTAGGTTAGAACA.1 T10_ACGAGCCCAATCGGTT.1 T10_CACACTCGTAAACCTC.1
Glycolysis / Gluconeogenesis                         0.55253021             0.15007870              0.8381050             -0.1654858
Citrate cycle (TCA cycle)                           -0.14315762            -0.14315762              0.3982489             -0.0190853
Pentose phosphate pathway                            0.03335563            -0.07158204              0.3402192              0.2568689
Pentose and glucuronate interconversions             0.23573199             0.67372352              0.3347900             -0.0519955

My `meta`:

> meta
                       orig.ident nCount_RNA nFeature_RNA celltype replicate cell_type label
T10_AGAGCGAAGTTGAGTA.1        T10       1919          919    tumor        10     tumor   T10
T10_AAAGTAGGTTAGAACA.1        T10       5242         1953    tumor        10     tumor   T10
T10_ACGAGCCCAATCGGTT.1        T10       9517         2982    tumor        10     tumor   T10
T10_CACACTCGTAAACCTC.1        T10       3133         1333    tumor        10     tumor   T10

Is there anything wrong with my input?

LooLipin commented 2 years ago

Hi all, I'm getting the same error when running my own data, which is a Seurat object.

Running `rlang::last_error()` to see where the error occurred suggests that it comes from a `group_by()` call:

<error/rlang_error>
Error in `group_by()`:
! Must group by variables found in `.data`.
x Column `cell_type` is not found.
---
Backtrace:
  1. Libra::run_de(SC)
 10. dplyr:::group_by.data.frame(., cell_type)
Run `rlang::last_trace()` to see the full context.

It seems to fail at the `dplyr:::group_by` stage. I'm not sure what this means, but hopefully it helps with troubleshooting.

The `hagai_toy` example data ran fine. I noticed it has only one cell type, so I subsetted my data to a single cell type as well, but I still ran into the same issue.

jordansquair commented 2 years ago

rersister -> do you only have one unique label? If so, I don't think this is the appropriate analysis for what you are looking for.

jordansquair commented 2 years ago

LooLipin -> it's hard to say without seeing your data. Can you provide a sample? Just take your Seurat object, subset it to 50-100 random genes, and email it to me (my email is on my profile).

jordansquair commented 2 years ago

LooLipin -> your data has 3 conditions. Libra is set up for pairwise comparisons, so you need to run each comparison separately, for example:

> library(Seurat)
> library(magrittr)  # provides the %<>% assignment pipe
> library(Libra)
> sc = readRDS("~/Downloads/SmallSC.rds")
> Idents(sc) = sc$label
> sc %<>% subset(idents = c('Sham', 'SNI'))
> de = run_de(sc)
[1] "Gab8"
[1] "Oligo"
> str(de)
tibble [27 × 8] (S3: tbl_df/tbl/data.frame)
 $ cell_type: chr [1:27] "Oligo" "Oligo" "Oligo" "Oligo" ...
 $ gene     : chr [1:27] "Arfgef1" "Atp6v1h" "Cops5" "Cspp1" ...
 $ avg_logFC: num [1:27] 0.276 -2.068 0.016 0.838 2.082 ...
 $ p_val    : num [1:27] 0.759 0.245 0.989 0.559 0.238 ...
 $ p_val_adj: num [1:27] 0.969 0.514 0.99 0.788 0.514 ...
 $ de_family: chr [1:27] "pseudobulk" "pseudobulk" "pseudobulk" "pseudobulk" ...
 $ de_method: chr [1:27] "edgeR" "edgeR" "edgeR" "edgeR" ...
 $ de_type  : chr [1:27] "LRT" "LRT" "LRT" "LRT" ...

victorwang123 commented 2 years ago

I also ran into this problem. My input data comes from Seurat and contains a 'cell_type' column, but I get: Error: Must group by variables found in `.data`.

fredust commented 2 years ago

Hello, sorry for the late response. I subset my data with 2 conditions and it works well. Thank you very much.
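With more than two conditions, the two-condition subsetting from the maintainer's example has to be repeated for every pair. A minimal base-R sketch, assuming the conditions live in a metadata column called `label` and the cell names are the row names; `pairwise_cells()` is a hypothetical helper and the toy labels `A`, `B`, `C` are made up. The commented lines show one way the cell lists might be passed to Seurat's `subset()` and `Libra::run_de()`; that part is untested here.

```r
# Hypothetical helper: for every pair of conditions in `label_col`,
# return the cell names (row names of `meta`) that belong to that pair.
pairwise_cells <- function(meta, label_col = "label") {
  conditions <- unique(as.character(meta[[label_col]]))  # order of appearance
  pairs <- combn(conditions, 2, simplify = FALSE)
  out <- lapply(pairs, function(p) rownames(meta)[meta[[label_col]] %in% p])
  names(out) <- vapply(pairs, paste, character(1), collapse = "_vs_")
  out
}

# Hypothetical toy metadata with three conditions
meta <- data.frame(
  label = c("A", "B", "C", "A", "B", "C"),
  row.names = paste0("cell", 1:6)
)
cells_by_pair <- pairwise_cells(meta)
names(cells_by_pair)

# Possible use with a Seurat object `sc` (not run here):
# de_list <- lapply(cells_by_pair, function(cells) {
#   Libra::run_de(subset(sc, cells = cells))
# })
```

Keeping one result per pair in a named list makes it easy to tell the comparisons apart afterwards.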

GuiSeSanz commented 1 year ago

Hi! I previously ran the code with an extra sample for the Post label, but now I need this type of contrast. When I run the DE analysis, I get the same error again...


> table(metadata$replicate, metadata$label)

               Diagnosed Post
  FS-0634-post         0 6924
  SMD34459          3135    0
  SMD35109          3314    0
  SMD35303          1608    0
  SMD37209          9029    0

> table(metadata$cell_type, metadata$label)

                 Diagnosed Post
  Basophil             398   32
  CLP                 1278 1279
  DendriticCell        310  430
  EarlyErythroid      1299  879
  GMP                  988  210
  Granulocyte          947   94
  HSC                 2923  330
  LMPP                1465  411
  LateErythroid       1792  473
  MEP                  777  211
  MK_Prog              422   74
  Monocytes           1333  280
  T                    412    0
  pro-B               2742 2221

> DE = run_de(as.matrix(all_data), meta = metadata,
+ replicate_col = "replicate",
+ cell_type_col = "cell_type",
+ label_col = "label")
[1] "Basophil"
[1] "CLP"
[1] "DendriticCell"
[1] "EarlyErythroid"
[1] "GMP"
[1] "Granulocyte"
[1] "HSC"
[1] "LMPP"
[1] "LateErythroid"
[1] "MEP"
[1] "MK_Prog"
[1] "Monocytes"
[1] "T"
[1] "pro-B"
Error in `group_by()`:
! Must group by variables found in `.data`.
x Column `cell_type` is not found.
Run `rlang::last_error()` to see where the error occurred.

What could be happening here?

jordansquair commented 1 year ago

You do not have replicates in one of your conditions. Pseudobulk methods are therefore not an option.
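One way to catch this before running `run_de()` is to count the distinct replicates in each condition. A minimal base-R sketch, assuming `replicate` and `label` metadata columns as in the `table()` output above; `replicates_per_label()` is a hypothetical helper, and the toy data keeps one row per replicate from that table (real metadata has one row per cell, which does not change the count). Flagging conditions with fewer than two replicates mirrors the `min_reps = 2` setting used earlier in this thread and is an assumption, not a documented Libra rule.

```r
# Hypothetical helper: count distinct replicates within each condition.
replicates_per_label <- function(meta, replicate_col = "replicate",
                                 label_col = "label") {
  groups <- split(meta[[replicate_col]], meta[[label_col]])
  vapply(groups, function(x) length(unique(x)), integer(1))
}

# One row per replicate, taken from table(metadata$replicate, metadata$label)
meta <- data.frame(
  replicate = c("FS-0634-post", "SMD34459", "SMD35109", "SMD35303", "SMD37209"),
  label     = c("Post", "Diagnosed", "Diagnosed", "Diagnosed", "Diagnosed")
)

n_reps <- replicates_per_label(meta)
print(n_reps)

# Conditions with fewer than two replicates (threshold is an assumption)
too_few <- names(n_reps)[n_reps < 2]
if (length(too_few) > 0) {
  message("Not enough replicates for pseudobulk in: ",
          paste(too_few, collapse = ", "))
}
```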