stemangiola / tidyseurat

Seurat meets tidyverse. The best of both worlds.
https://stemangiola.github.io/tidyseurat/

Slice can't be called on Seurat object directly #84

Closed: RoganGrant closed this issue 6 months ago

RoganGrant commented 6 months ago

First of all, thank you for one of my favorite packages. This has saved me a great deal of effort in analysis.

I am currently trying to downsample a Seurat/tidyseurat object from >100k cells to ~50k. Ideally I would like to keep a significant representation of each cell state, so I intended to group by `cell_state` and downsample using `slice_sample()`.

However, I've noticed two behaviors which may or may not be intended. If I perform grouping explicitly using group_by(), a data frame is returned:

```r
set.seed(12345)
seurat_obj_subset = seurat_obj %>%
  group_by(cell_state) %>%
  slice_sample(n = 3e3) %>%
  ungroup()
```

tidyseurat says: A data frame is returned for independent data analysis.

This is in theory fine, but I would ideally like to keep this as a Seurat object instead.

If I slice using the .by argument, however, the function does not run:

```r
set.seed(12345)
seurat_obj_subset = seurat_obj %>%
  slice_sample(n = 3e3, .by = cell_state)
```

```
Error in switch(type, call = "prefix", control = , delim = , subset = "special", : EXPR must be a length 1 vector
```

I'm wondering whether this is intended behavior. It's possible that I am simply approaching this the wrong way.

Thank you!

stemangiola commented 6 months ago

@william-hutchison, since you wrote the slice_sample function, could you please have a look and discuss how to fix this part? 🙏

wvictor14 commented 6 months ago

Hi,

There are two issues here:

  1. grouping and then applying `slice_sample()` returns a tibble rather than a Seurat-tibble object, which I think is unexpected behaviour; and
  2. the `.by` argument returns an error.

For the second one, use the argument `by`, which is for the `slice_*()` variants, instead of `.by` (which is only for `slice()`). I cannot reproduce your specific error, though; perhaps there is some namespace clashing in your environment.

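```r
### reproducible example
library(dplyr)       # slice_sample(), mutate(), %>%
library(tidyseurat)  # tidyverse methods for Seurat objects

seurat_obj <- SeuratObject::pbmc_small
seurat_obj <- seurat_obj |>
  mutate(cell_state = paste0('C', as.character(RNA_snn_res.1)))

### using `by`
seurat_obj %>%
  slice_sample(n = 3e3, by = cell_state)
```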

```
# A Seurat-tibble abstraction: 80 × 16
# Features=230 | Cells=80 | Active assay=RNA | Assays=RNA
   .cell       orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8
   <chr>       <fct>           <dbl>        <int> <fct>          
 1 ATGCCAGAAC… SeuratPro…         70           47 0              
 2 CATGGCCTGT… SeuratPro…         85           52 0              
 3 GAACCTGATG… SeuratPro…         87           50 1              
 4 TGACTGGATT… SeuratPro…        127           56 0              
 5 AGTCAGACTG… SeuratPro…        173           53 0              
 6 TCTGATACAC… SeuratPro…         70           48 0              
 7 TGGTATCTAA… SeuratPro…         64           36 0              
 8 GCAGCTCTGT… SeuratPro…         72           45 0              
 9 GATATAACAC… SeuratPro…         52           36 0              
10 AATGTTGACA… SeuratPro…        100           41 0              
# ℹ 70 more rows
# ℹ 11 more variables: letter.idents <fct>, groups <chr>,
#   RNA_snn_res.1 <fct>, cell_state <chr>, PC_1 <dbl>,
#   PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>,
#   tSNE_1 <dbl>, tSNE_2 <dbl>
# ℹ Use `print(n = ...)` to see more rows
```
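In this example all 80 cells come back because pbmc_small has only 80 cells, so every `cell_state` group is smaller than `n = 3e3`. When sampling without replacement, `slice_sample()` silently truncates `n` to the group size, so the number of cells kept is the sum over groups of min(n, group size). A minimal base-R sketch for checking that before downsampling a large object; `expected_kept()` is a hypothetical helper, not part of dplyr or tidyseurat.

```r
# Hypothetical helper: number of cells slice_sample(n = n, by = <group>)
# keeps without replacement, i.e. the sum over groups of min(n, group size).
expected_kept <- function(groups, n) {
  sizes <- table(groups)              # cells per group
  sum(pmin(as.integer(sizes), n))     # cap each group at n, then add up
}

# Toy example: three groups of 5,000, 2,000 and 40 cells
groups <- rep(c("A", "B", "C"), times = c(5000, 2000, 40))
expected_kept(groups, n = 3e3)  # 3000 + 2000 + 40 = 5040

# With a real object (not run here), assuming a `cell_state` metadata column:
# expected_kept(seurat_obj$cell_state, n = 3e3)
```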

```r
### don't use `.by`:
seurat_obj %>%
  slice_sample(n = 3e3, .by = cell_state)
```

```
Error in `slice_sample()`:
! Can't specify an argument named `.by` in this verb.
ℹ Did you mean to use `by` instead?
Run `rlang::last_trace()` to see where the error occurred.
```

Let me look into the first issue.

wvictor14 commented 6 months ago

Actually, I see that `group_by()` is designed to return a grouped tibble rather than a "grouped Seurat object":

https://github.com/stemangiola/tidyseurat/blob/3b25110db466c686f52c1c58137ad147b57a20fe/R/dplyr_methods.R#L168-L185
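Since `group_by()` drops to a plain tibble, one possible workaround is to pick the cells per group from the metadata and then subset the Seurat object by cell name, so the result stays a Seurat object. This is a sketch under stated assumptions, not a tidyseurat feature: `sample_cells_by_group()` is a hypothetical base-R helper, and the commented lines assume a `cell_state` metadata column and SeuratObject's `subset(x, cells = ...)`. With dplyr >= 1.1.0, `slice_sample(by = ...)` as shown earlier in the thread already returns a Seurat-tibble, so this mainly helps when `by` is unavailable or custom per-group logic is needed.

```r
# Hypothetical helper: pick at most `n` cell names from each group,
# keeping every cell of groups that have fewer than `n` cells.
sample_cells_by_group <- function(cells, groups, n) {
  stopifnot(length(cells) == length(groups))
  per_group <- split(cells, groups)
  picked <- lapply(per_group, function(x) {
    # sample.int() avoids sample()'s special case for length-1 numeric input
    if (length(x) <= n) x else x[sample.int(length(x), n)]
  })
  unname(unlist(picked))
}

# Toy data standing in for colnames(seurat_obj) and seurat_obj$cell_state
cells  <- paste0("cell", 1:10)
states <- rep(c("A", "B", "C"), times = c(6, 3, 1))

set.seed(12345)
keep <- sample_cells_by_group(cells, states, n = 2)

# With a real object (not run here); subsetting by cell name keeps a Seurat object:
# keep <- sample_cells_by_group(colnames(seurat_obj), seurat_obj$cell_state, n = 3e3)
# seurat_obj_subset <- subset(seurat_obj, cells = keep)
```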

RoganGrant commented 6 months ago

Ah, I apologize; I found the issue on my end. It's an older environment that I've been using for a specific project, and I had dplyr pinned at 1.0.7. Updating to tidyverse 2.0.0 (dplyr 1.1.4) fixed issue 2. It would still be great to be able to group without converting to a tibble, but that's perhaps a different issue.

In any case, you may want to update the requirements to specify dplyr 1.1.4.
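Per-operation grouping (`.by` for `slice()`, `by` for the `slice_*()` variants) first appeared in dplyr 1.1.0, so scripts that rely on it can fail with an unclear error on older installs such as the pinned 1.0.7. A minimal sketch of an early, explicit check; `has_slice_by()` is a hypothetical helper. For the package itself, the equivalent would be a versioned entry such as `dplyr (>= 1.1.0)` in the `Imports` field of DESCRIPTION.

```r
# Hypothetical helper: TRUE if the given dplyr version supports `by`/`.by`
# (per-operation grouping was introduced in dplyr 1.1.0).
has_slice_by <- function(version) {
  numeric_version(version) >= numeric_version("1.1.0")
}

# Fail early with a clear message if the installed dplyr is too old
if (requireNamespace("dplyr", quietly = TRUE) &&
    !has_slice_by(as.character(utils::packageVersion("dplyr")))) {
  stop("slice_sample(by = ...) needs dplyr >= 1.1.0; please update dplyr.")
}
```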

Thanks for your help!