Closed by r-melo 9 months ago
Hi,
Hope you're getting some use from it.
SCPA randomly selects 500 cells per population to run its analysis on when you give it your populations. It looks like your error is coming from that stage. There's no minimum number of cells, but once you drop below ~50 cells per population you'll start to lose quite a bit of power in the testing.
It's difficult to tell from what you've sent. Can you send me the code you're using to get to this point, i.e. getting to your Seurat object, generating pathways, and the compare_seurat() function? Or something I can reproduce on my end?
Just off the top of my head: it seems unlikely this is happening, but the error could be caused by having a negative integer or non-integer in your downsample argument to compare_seurat().
Jack
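For readers following along, the downsampling step described above can be sketched in base R. This is a minimal illustration, not SCPA's internal code: `downsample_cells` is a hypothetical helper, the matrices are toy data with cells in columns, and the default of 500 is taken from the reply above.

```r
# Hypothetical helper (not part of SCPA): keep at most `n` randomly chosen
# columns (cells) from a genes x cells matrix.
downsample_cells <- function(mat, n = 500) {
  # Reject the kinds of values mentioned above: negative or non-integer sizes.
  stopifnot(is.numeric(n), length(n) == 1, n >= 1, n == round(n))
  keep <- min(ncol(mat), n)  # populations smaller than `n` are kept whole
  mat[, sample(ncol(mat), keep), drop = FALSE]
}

set.seed(1)
small <- matrix(rnorm(2 * 144), nrow = 2)  # 144 cells: below the cap
large <- matrix(rnorm(2 * 800), nrow = 2)  # 800 cells: above the cap

ncol(downsample_cells(small))  # 144
ncol(downsample_cells(large))  # 500
```

Validating `n` up front means a bad downsample value fails with a clear message rather than an error raised deep inside `sample.int()`.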
Hello,
Yes, I used the package successfully on my single-cell data, and am very satisfied with the results.
Here is my code:
library(SCPA)
library(msigdbr)
library(Seurat)
library(tidyverse)
library(rio)
library(ggplot2)
library(ggrepel)
library(stringr)
st_obj <- readRDS('ST_020524.RDS')
path_gobp <- msigdbr('Homo sapiens','C5','GO:BP') %>% format_pathways()
path_kegg <- msigdbr('Homo sapiens','C2','CP:KEGG') %>% format_pathways()
comparisons <- list(ec = c(1,20,18),
                    pod = c(15,11,3,14,19))
path_list <- list()
for (comp in names(comparisons)){
  scpa <- compare_seurat(st_obj,
                         group1 = 'seurat_clusters',
                         group1_population = comparisons[[comp]],
                         pathways = path_gobp,
                         assay = 'SCT',
                         downsample = 500)
  path_list[[paste0(comp,'_gobp')]] <- scpa
  scpa <- compare_seurat(st_obj,
                         group1 = 'seurat_clusters',
                         group1_population = comparisons[[comp]],
                         pathways = path_kegg,
                         assay = 'SCT',
                         downsample = 500)
  path_list[[paste0(comp,'_kegg')]] <- scpa
}
I created a loop for the comparisons, but it fails on the first one. Here is the output:
Extracting cells where seurat_clusters == 1
Extracting data from the SCT assay
Extracting cells where seurat_clusters == 20
Extracting data from the SCT assay
Extracting cells where seurat_clusters == 18
Extracting data from the SCT assay
Using single core processing. Specify 'parallel = TRUE' and `cores = x` arguments for parallel processing
Cell numbers in population 1 = 144
Cell numbers in population 2 = NULL
Cell numbers in population 3 = NULL
Cell numbers in population 4 = NULL
Cell numbers in population 5 = NULL
Cell numbers in population 6 = NULL
Cell numbers in population 7 = NULL
Cell numbers in population 8 = NULL
Cell numbers in population 9 = NULL
Cell numbers in population 10 = NULL
Cell numbers in population 11 = NULL
Cell numbers in population 12 = NULL
Cell numbers in population 13 = NULL
Cell numbers in population 14 = NULL
Cell numbers in population 15 = NULL
Cell numbers in population 16 = NULL
Cell numbers in population 17 = NULL
Cell numbers in population 18 = 60
Cell numbers in population 19 = NULL
Cell numbers in population 20 = 26
- If greater than 500 cells, these populations will be downsampled
Error in sample.int(x, size, replace, prob) : invalid 'size' argument
In addition: There were 50 or more warnings (use warnings() to see the first 50)
I also checked
> traceback()
6: sample.int(x, size, replace, prob)
5: sample(ncol(df), n)
4: random_cells(samples[[i]], ifelse(cell_number[i] < downsample,
cell_number[i], downsample))
3: single_comparison(samples, pathways, downsample = downsample,
min_genes = min_genes, max_genes = max_genes)
2: compare_pathways(samples = samples, pathways = pathways, downsample = downsample,
min_genes = min_genes, max_genes = max_genes)
1: compare_seurat(st_neigh, group1 = "seurat_clusters", group1_population = comparisons[[comp]],
pathways = path_gobp, assay = "SCT", downsample = 500)
Do you have any insights?
Thanks!
Yeah, there's definitely an issue with the Cell numbers in population x = NULL part.
Hmm -- I wonder if this is an issue with using numeric vectors in the group1_population = comparisons[[comp]] part. Can you try defining your comparisons like this instead:
comparisons <- list(ec = c("1", "20", "18"),
pod = c("15", "11" , "3", "14", "19"))
And let me know how that comes out?
Jack
That did the trick! Thank you so much! Your prompt response is much appreciated!
Here is how the output changed:
Extracting cells where seurat_clusters == 1
Extracting data from the SCT assay
Extracting cells where seurat_clusters == 20
Extracting data from the SCT assay
Extracting cells where seurat_clusters == 18
Extracting data from the SCT assay
Using single core processing. Specify 'parallel = TRUE' and `cores = x` arguments for parallel processing
Cell numbers in population 1 = 144
Cell numbers in population 2 = 26
Cell numbers in population 3 = 60
- If greater than 500 cells, these populations will be downsampled
Excluding 3596 pathway(s) based on min/max genes parameter: GOBP_10_FORMYLTETRAHYDROFOLATE_METABOLIC_PROCESS, GOBP_2FE_2S_CLUSTER_ASSEMBLY, GOBP_3_PHOSPHOADENOSINE_5_PHOSPHOSULFATE_BIOSYNTHETIC_PROCESS, GOBP_5S_CLASS_RRNA_TRANSCRIPTION_BY_RNA_POLYMERASE_III, GOBP_5_PHOSPHORIBOSE_1_DIPHOSPHATE_METABOLIC_PROCESS...
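The difference between the two runs above is consistent with how base R lists behave when filled by numeric position versus by name. The sketch below is only a plausible illustration of that behaviour, not a confirmed account of SCPA's internals: `make_cluster` and `collect_samples` are hypothetical helpers, and the toy matrices simply reuse the cell counts printed in the thread.

```r
# Toy stand-ins for per-cluster expression matrices (genes x cells).
make_cluster <- function(n_cells) matrix(0, nrow = 2, ncol = n_cells)
clusters <- list("1" = make_cluster(144),
                 "20" = make_cluster(26),
                 "18" = make_cluster(60))

# Hypothetical helper: store each population's matrix under its label.
# Assumption: the label itself is used as the list index.
collect_samples <- function(populations) {
  samples <- list()
  for (p in populations) {
    samples[[p]] <- clusters[[as.character(p)]]
  }
  samples
}

by_number <- collect_samples(c(1, 20, 18))        # numeric labels
by_name   <- collect_samples(c("1", "20", "18"))  # character labels

length(by_number)  # 20: positions 2-17 and 19 are padded with NULL
length(by_name)    # 3: one named entry per population

# Count how many slots are empty in the numeric case.
sum(vapply(by_number, is.null, logical(1)))  # 17

# Cell counts in the character case, in the order supplied.
unname(vapply(by_name, ncol, integer(1)))  # 144 26 60
```

With numeric labels the list is extended to position 20 and the gaps hold NULL, which has no columns to sample from; with character labels only the three requested populations exist.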
Fab -- glad that's figured out.
Just FYI, I've released an updated SCPA version (v1.6.1) that should be compatible with using numeric vectors in compare_seurat(), so you don't have to supply these as a character vector.
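For anyone on a version older than v1.6.1, a small pre-flight step in base R can convert the population labels to character and flag small populations before calling compare_seurat(). This is a sketch under stated assumptions: `cluster_labels` is a toy factor standing in for the seurat_clusters metadata column, and the threshold of 50 is the rough figure mentioned earlier in the thread, not a hard limit.

```r
# Comparisons as originally written, with numeric cluster labels.
comparisons <- list(ec = c(1, 20, 18),
                    pod = c(15, 11, 3, 14, 19))

# Convert every comparison to a character vector.
comparisons_chr <- lapply(comparisons, as.character)

# Toy stand-in for the cluster labels of each cell or spot (assumption:
# in a real analysis this would come from the object's metadata).
cluster_labels <- factor(rep(c("1", "20", "18"), times = c(144, 26, 60)))

# Count cells per requested population, indexing the table by name.
cell_counts <- table(cluster_labels)[comparisons_chr$ec]

# Populations below ~50 cells are likely to be underpowered.
low_power <- names(cell_counts)[cell_counts < 50]

as.vector(cell_counts)  # 144 26 60
low_power               # "20"
```

Indexing the table by character label returns the counts in the order requested, so they line up with the order in which the populations are passed on.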
Hello,
Thank you for publishing and maintaining this package. I think the way it clusters single-cell data to uncover important pathways is very clever.
I am trying to apply this method to Visium spatial transcriptomics data, between regions with few spots. The function compare_seurat returns
I have no idea where to start debugging this. Is there a minimum number of cells / spots for the pipeline?
Thank you!