satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Invalid first argument error in CellCycleScoring #1831

Closed massonix closed 5 years ago

massonix commented 5 years ago

When I run the CellCycleScoring function I encounter the following error:

Error in sample.int(length(x), size, replace, prob) : invalid first argument

I have tried to only use the genes of the signatures that are in the rownames of the scale.data slot like so:

s_genes <- cc.genes$s.genes[cc.genes$s.genes %in% rownames(cll_seu@assays$RNA@scale.data)]
g2m_genes <- cc.genes$g2m.genes[cc.genes$g2m.genes %in% rownames(cll_seu@assays$RNA@scale.data)]
cll_seu <- CellCycleScoring(object = cll_seu, s.features = s_genes, g2m.features = g2m_genes)

However, I still get the same error. The problem seems to come from AddModuleScore, because calling that function directly produces the same error. Do you know of any solutions?

Thanks a lot in advance!

andrewwbutler commented 5 years ago

Hi,

Can you verify that, after restricting the genes to those in your object, there are still some genes left (i.e. s_genes and g2m_genes are not empty)? If that's not the issue, could you provide a dataset that reproduces it?
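One way to run this check is to intersect each signature with the feature names and count what remains. The sketch below is base R only and uses toy vectors in place of rownames(cll_seu) and cc.genes; the helper keep_present and the example gene lists are assumptions made for illustration.

```r
# Toy stand-ins (assumptions): in the thread these would be
# rownames(cll_seu) and cc.genes$s.genes / cc.genes$g2m.genes.
feature_names <- c("MCM5", "PCNA", "TYMS", "ACTB", "GAPDH")
s_signature   <- c("MCM5", "PCNA", "TYMS", "FEN1")
g2m_signature <- c("HMGB2", "CDK1", "NUSAP1")

# Hypothetical helper: keep only the signature genes present in the object.
keep_present <- function(signature, features) {
  signature[signature %in% features]
}

s_genes   <- keep_present(s_signature, feature_names)
g2m_genes <- keep_present(g2m_signature, feature_names)

# Print how many genes survive the filter for each signature.
length(s_genes)
length(g2m_genes)

# An empty vector means there is nothing left to score.
if (length(s_genes) == 0 || length(g2m_genes) == 0) {
  message("At least one signature has no genes in the object.")
}
```

If either length is zero, the scoring call has no genes to work with, so it is worth ruling this out before looking further.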

massonix commented 5 years ago

Dear Andrew

Thanks for your help. I am encountering the same error when using AddModuleScore, this time with another data set. Here is the Seurat object:

https://drive.google.com/open?id=1tRmFEG2DI1t15AwYlmnCW_k_s1yjPM2Q

And here is the code:

all(cold_shock_signature_sub %in% rownames(cll_integrated[["integrated"]]@scale.data))
[1] TRUE
cll_integrated <- AddModuleScore(
  object = cll_integrated,
  features = cold_shock_signature_sub,
  name = "cold_shock_score"
)
Error in sample.int(length(x), size, replace, prob) : invalid first argument

Thanks a lot,

Ramon

andrewwbutler commented 5 years ago

Hi Ramon,

I think the issue is that you're working with an integrated assay that contains only 2000 features. The default parameters for AddModuleScore (nbin=24 and ctrl=100) won't work for such a reduced feature set. I would recommend either switching to the RNA assay and computing the scores there, or rerunning the integration workflow with the features.to.integrate parameter set to include all features (rather than just the 2000 that were used to find the anchors).
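To illustrate why a small feature pool can break control sampling, here is a simplified base R sketch of the general idea: features are ranked by average expression, split into nbin bins, and ctrl control features are drawn without replacement from a single bin. This is not Seurat's actual implementation; the helper draw_controls and the simulated averages are assumptions made for illustration only.

```r
set.seed(1)

# Hypothetical helper: split features into `nbin` roughly equal bins by
# average expression, then draw `ctrl` controls from the bin that holds
# the first feature. Returns NULL instead of stopping if sampling fails.
draw_controls <- function(n_features, nbin = 24, ctrl = 100) {
  avg <- rexp(n_features)  # simulated average expression (assumption)
  bins <- ceiling(rank(avg, ties.method = "first") * nbin / n_features)
  same_bin <- which(bins == bins[1])
  tryCatch(
    sample(same_bin, size = ctrl, replace = FALSE),
    error = function(e) {
      message("Sampling failed: ", conditionMessage(e))
      NULL
    }
  )
}

2000 / 24                      # roughly 83 features per bin, fewer than ctrl = 100
small <- draw_controls(2000)   # bin is smaller than ctrl, so sampling fails
large <- draw_controls(20000)  # roughly 833 features per bin, so sampling works
```

With 2000 features and 24 bins, each bin in this sketch holds fewer features than the 100 controls requested, which is consistent with the advice to score on an assay that contains all features.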

El-Castor commented 3 years ago

Hi Andrew,

I hope you don't mind me reopening this thread. I get a similar error when using the CellCycleScoring() function on my scRNA-seq dataset.

> data.fem.ccTest <- CellCycleScoring(data.fem.ccTest, s.features = as.vector(s.genes$GM2_geneID), g2m.features = as.vector(g2m.genes$GM2_geneID), set.ident = TRUE)
Error in sample.int(x, size, replace, prob) : 
  cannot take a sample larger than the population when 'replace = FALSE'

My Seurat object is not integrated at the moment. It is a single dataset (I wanted to test the impact of cell cycle heterogeneity on it). The only particularity is that it has been reduced to the cell cycle markers only. Should I run this on the whole dataset instead?

> data.fem.ccTest
An object of class Seurat 
44 features across 5373 samples within 1 assay 
Active assay: RNA (44 features, 44 variable features)
 1 dimensional reduction calculated: pca

I look forward to your suggestions. Thanks in advance.

Clément

andrewwbutler commented 3 years ago

Yes, I would advise running this on the whole dataset, not a subset containing only cell cycle markers.

kbrulois commented 3 years ago

I encountered this error when there were NA values in the gene expression data I wanted to compute module scores on. Converting the NAs to zeros fixed the issue in my case.
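A minimal base R sketch of that kind of fix, using a toy expression matrix rather than a real Seurat object (the matrix, gene names, and cell names are assumptions for illustration). Note that replacing NA with zero treats a missing value as "no expression", which may or may not be appropriate for a given dataset.

```r
# Toy expression matrix (assumption): genes in rows, cells in columns.
expr <- matrix(
  c(1.2, NA,  0.0,
    0.5, 2.1, NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

n_missing <- sum(is.na(expr))  # count missing values before the fix
expr[is.na(expr)] <- 0         # replace every NA with zero
anyNA(expr)                    # check that no NA values remain
```

Counting the NA values first makes it easier to judge whether zero-filling is a small cleanup or a sign of a larger problem upstream.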