jackbibby1 / SCPA

R package for pathway analysis in scRNA-seq data
https://jackbibby1.github.io/SCPA/
GNU General Public License v3.0

Invalid size argument when comparing spots in Visium data #68

Closed r-melo closed 9 months ago

r-melo commented 9 months ago

Hello,

Thank you for publishing and maintaining this package. I think the way it clusters single-cell data to uncover important pathways is very clever.

I am trying to apply this method to Visium spatial transcriptomics data, to compare regions with few spots. The function compare_seurat() returns:

Error in sample.int(x, size, replace, prob) : invalid 'size' argument

I have no idea where to start debugging this. Is there a minimum number of cells / spots for the pipeline?

Thank you!

jackbibby1 commented 9 months ago

Hi,

Hope you're getting some use from it.

SCPA randomly selects up to 500 cells per population to run its analysis on when you give it your populations. It looks like your error is coming from that stage. There's no minimum number of cells, but when you reduce the cell numbers below ~50, you'll start to lose quite a bit of power in the testing.
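
The downsampling step described above can be sketched roughly as follows. This is a minimal illustration rather than SCPA's actual source: `downsample_cells` is a hypothetical helper, and it assumes each population is a genes-by-cells matrix.

```r
# Hypothetical helper (not part of SCPA): keep at most `n` randomly chosen
# cells (columns) from a genes-by-cells expression matrix.
downsample_cells <- function(expr, n = 500) {
  n_keep <- min(ncol(expr), n)            # small populations are kept whole
  keep <- sample.int(ncol(expr), n_keep)  # column indices, without replacement
  expr[, keep, drop = FALSE]
}

set.seed(1)
small <- matrix(rnorm(10 * 144), nrow = 10)  # 144 spots, fewer than 500
large <- matrix(rnorm(10 * 800), nrow = 10)  # 800 cells, more than 500

ncol(downsample_cells(small))  # 144
ncol(downsample_cells(large))  # 500
```

Populations smaller than the cap pass through untouched, which is why small populations still run but with less statistical power.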

It's difficult to tell from what you've sent. Can you send me the code you're using to get to this point, i.e. getting to your Seurat object, generating pathways, and the compare_seurat() call? Or something I can reproduce on my end?

Just off the top of my head -- it seems unlikely this is happening but the error could be caused by having a negative integer or non-integer in your downsample argument to compare_seurat()
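
As a quick illustration of that last point, base R's `sample()` raises this same message when `size` is negative. The check below is only a sketch: `is_valid_downsample` is a hypothetical helper, not an SCPA function.

```r
# Hypothetical check (not part of SCPA): is `x` a single positive whole number?
is_valid_downsample <- function(x) {
  is.numeric(x) && length(x) == 1 && !is.na(x) && x >= 1 && x == floor(x)
}

is_valid_downsample(500)  # TRUE
is_valid_downsample(-5)   # FALSE
is_valid_downsample(2.5)  # FALSE

# A negative size reproduces the error message from the report
msg <- tryCatch(sample(10, -5), error = function(e) conditionMessage(e))
msg
```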

Jack

r-melo commented 9 months ago

Hello,

Yes, I used the package successfully on my single-cell data, and I am very satisfied with the results.

Here is my code

library(SCPA)
library(msigdbr)
library(Seurat)
library(tidyverse)
library(rio)
library(ggplot2)
library(ggrepel)
library(stringr)

st_obj <- readRDS('ST_020524.RDS')

path_gobp <- msigdbr('Homo sapiens','C5','GO:BP') %>% format_pathways()
path_kegg <- msigdbr('Homo sapiens','C2','CP:KEGG') %>% format_pathways()

comparisons <- list(ec =  c(1,20,18),
                    pod = c(15,11,3,14,19))

path_list <- list()
for (comp in names(comparisons)){
  scpa <- compare_seurat(st_obj,
                         group1 = 'seurat_clusters',
                         group1_population = comparisons[[comp]],
                         pathways = path_gobp,
                         assay = 'SCT',
                         downsample = 500)
  path_list[[paste0(comp,'_gobp')]] <- scpa

  scpa <- compare_seurat(st_obj,
                         group1 = 'seurat_clusters',
                         group1_population = comparisons[[comp]],
                         pathways = path_kegg,
                         assay = 'SCT',
                         downsample = 500)
  path_list[[paste0(comp,'_kegg')]] <- scpa
}

I created a loop for the comparisons, but it fails on the first one. Here is the output:

Extracting cells where seurat_clusters == 1
Extracting data from the SCT assay
Extracting cells where seurat_clusters == 20
Extracting data from the SCT assay
Extracting cells where seurat_clusters == 18
Extracting data from the SCT assay
Using single core processing. Specify 'parallel = TRUE' and `cores = x` arguments for parallel processing

Cell numbers in population 1 = 144
Cell numbers in population 2 = NULL
Cell numbers in population 3 = NULL
Cell numbers in population 4 = NULL
Cell numbers in population 5 = NULL
Cell numbers in population 6 = NULL
Cell numbers in population 7 = NULL
Cell numbers in population 8 = NULL
Cell numbers in population 9 = NULL
Cell numbers in population 10 = NULL
Cell numbers in population 11 = NULL
Cell numbers in population 12 = NULL
Cell numbers in population 13 = NULL
Cell numbers in population 14 = NULL
Cell numbers in population 15 = NULL
Cell numbers in population 16 = NULL
Cell numbers in population 17 = NULL
Cell numbers in population 18 = 60
Cell numbers in population 19 = NULL
Cell numbers in population 20 = 26
- If greater than 500 cells, these populations will be downsampled

Error in sample.int(x, size, replace, prob) : invalid 'size' argument
In addition: There were 50 or more warnings (use warnings() to see the first 50)

I also checked

> traceback()
6: sample.int(x, size, replace, prob)
5: sample(ncol(df), n)
4: random_cells(samples[[i]], ifelse(cell_number[i] < downsample, 
       cell_number[i], downsample))
3: single_comparison(samples, pathways, downsample = downsample, 
       min_genes = min_genes, max_genes = max_genes)
2: compare_pathways(samples = samples, pathways = pathways, downsample = downsample, 
       min_genes = min_genes, max_genes = max_genes)
1: compare_seurat(st_neigh, group1 = "seurat_clusters", group1_population = comparisons[[comp]], 
       pathways = path_gobp, assay = "SCT", downsample = 500)

Do you have any insights?

Thanks!

jackbibby1 commented 9 months ago

Yeah, there's definitely an issue with the Cell numbers in population x = NULL part.

Hmm -- I wonder if this is an issue with using numeric vectors in the group1_population = comparisons[[comp]] part. Can you try defining your comparisons like this instead:

comparisons <- list(ec =  c("1", "20", "18"),
                    pod = c("15", "11" , "3", "14", "19"))

And letting me know how that comes out?

Jack
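
One plausible reading of the `NULL` entries in the earlier output, offered as an assumption rather than a confirmed description of SCPA's internals: if results are stored in a list indexed by the population value itself, numeric values act as positions while character values act as names. The base R sketch below shows the difference; `store_by_label` is a hypothetical helper.

```r
# Hypothetical helper (not SCPA code): store one entry per population,
# using the population value itself as the list index.
store_by_label <- function(populations) {
  out <- list()
  for (p in populations) {
    out[[p]] <- paste("cluster", p)  # numeric p = position, character p = name
  }
  out
}

numeric_idx   <- store_by_label(c(1, 20, 18))
character_idx <- store_by_label(c("1", "20", "18"))

length(numeric_idx)    # 20: positions 2-17 and 19 are padded with NULL
length(character_idx)  # 3: one named entry per population

# Coercing up front sidesteps the difference
as.character(c(1, 20, 18))
```

The numeric case yields entries only at positions 1, 18, and 20, which matches the pattern of populated and `NULL` rows in the failing run.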

r-melo commented 9 months ago

That did the trick! Thank you so much! Your prompt response is much appreciated!

Here is how the output changed:

Extracting cells where seurat_clusters == 1
Extracting data from the SCT assay
Extracting cells where seurat_clusters == 20
Extracting data from the SCT assay
Extracting cells where seurat_clusters == 18
Extracting data from the SCT assay
Using single core processing. Specify 'parallel = TRUE' and `cores = x` arguments for parallel processing

Cell numbers in population 1 = 144
Cell numbers in population 2 = 26
Cell numbers in population 3 = 60
- If greater than 500 cells, these populations will be downsampled

Excluding 3596 pathway(s) based on min/max genes parameter: GOBP_10_FORMYLTETRAHYDROFOLATE_METABOLIC_PROCESS, GOBP_2FE_2S_CLUSTER_ASSEMBLY, GOBP_3_PHOSPHOADENOSINE_5_PHOSPHOSULFATE_BIOSYNTHETIC_PROCESS, GOBP_5S_CLASS_RRNA_TRANSCRIPTION_BY_RNA_POLYMERASE_III, GOBP_5_PHOSPHORIBOSE_1_DIPHOSPHATE_METABOLIC_PROCESS...

jackbibby1 commented 9 months ago

Fab -- glad that's figured out.

Just FYI, I've released an updated SCPA version (v1.6.1) that should be compatible with using numeric vectors in compare_seurat(), so you don't have to supply these as a character vector.
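
For scripts that may run against either an older or a newer installation, a version guard along these lines is one option. `as_population_labels` is a hypothetical helper, and the version is passed in explicitly so the sketch runs without SCPA installed.

```r
# Hypothetical helper (not part of SCPA): coerce numeric population labels to
# character when the installed version predates numeric support (v1.6.1).
as_population_labels <- function(populations, scpa_version) {
  if (package_version(scpa_version) < package_version("1.6.1")) {
    return(as.character(populations))
  }
  populations
}

as_population_labels(c(1, 20, 18), "1.6.0")  # "1" "20" "18"
as_population_labels(c(1, 20, 18), "1.6.1")  # 1 20 18 (unchanged)
```

In a real script the second argument would typically be `as.character(utils::packageVersion("SCPA"))`.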