satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.26k stars 907 forks source link

error running RunPCA #5395

Closed jjacob12 closed 2 years ago

jjacob12 commented 2 years ago

Hi there, Looks like this issue has been encountered before (RunPCA error #1788), but doesn't look like there was a definitive solution. Also, the datasets associated with the issue were not provided. I am new to Seurat (and writing code/R!). Running Seurat v4 on RStudio-Server/1.4.1717-foss-2021a-Java-11-R-4.1.0, using R-bundle-Bioconductor/3.13-foss-2021a-R-4.1.0.

I'm analysing a public snRNA-seq dataset ((https://cbl-dev.cells.ucsc.edu) of around 70,000 cells/nuclei. I used code from, https://cellbrowser.readthedocs.io/en/master/load.html to create the Seurat object, running the first 9 lines of code under the subheading, 'Seurat'. The Seurat object is designated 'so'.

The libraries/options I loaded were: library(tidyverse) library(Seurat) library(data.table) library(patchwork) options(bitmapType='cairo-png') # this line needed to generate plots

I have used the code for downstream analysis from the vignette, https://satijalab.org/seurat/articles/pbmc3k_tutorial.html All worked fine until I ran the 'RunPCA' command. Up to that point I ran: so <- NormalizeData(so, normalization.method = "LogNormalize", scale.factor = 10000) so <- ScaleData(so, features = rownames(so)) so <- FindVariableFeatures(so, selection.method = "vst", nfeatures = 2000) so <- RunPCA(so, features = VariableFeatures(object = so))

With the last line I got the error message that others have had before: "Error in irlba(A = t(x = object), nv = npcs, ...) : max(nu, nv) must be positive In addition: Warning message: In PrepDR(object = object, features = features, verbose = verbose) : The following 2000 features requested have zero variance (running reduction without them):..."

To see if I could progress the analysis nevertheless, I ran: DimPlot(pbmc, reduction = "pca")

So I am stuck. Can you please look into what's causing this issue and recommend a fix? Many thanks.

liu-xingliang commented 2 years ago

I think what RunPCA is complaining is about the rank of the input matrix is not positive, which is the nature of this specific input. I am not sure why the expression matrix seems truncated for some reason with only 2000 genes but over 69k cells.

samuel-marsh commented 2 years ago

Hi Jacob,

Not member of dev team but hopefully can be helpful. So it appears that are couple of issues with this dataset and I would suggest contacting the authors to let them know of the issues.

Yes as @liuxl18-hku suggests I think thats the issue. First, the exprMatrix.tsv.gz file for this dataset contains 69175 cells but only 2000 genes. So inherently there is gonna be issue with calculating 2000 variable genes when there are only 2000 genes present.

Second, the values appear to already be normalized values (they are decimal values not integers). Therefore you don't want to run NormalizeData again.

I would suggest downloading the seurat object on that page and you can always remove the analyzed assays present and start over if you like. This appears to have all the features.

Best, Sam

samuel-marsh commented 2 years ago

You will likely want to update the object as well since it was created with 3.0.2

test <- read_rds("~/Downloads/seurat.rds")
test@version
[1] ‘3.0.2’
samuel-marsh commented 2 years ago

On final inspection there is something weird with the Seurat object too.

If you simply try to update the object you get:

test <- read_rds("~/Downloads/seurat.rds")
test2 <- UpdateSeuratObject(test)
Validating object structure
Updating object slots
Error: All cells in the object being added must match the cells in this object

Not sure why that error is occurring but I think there is probably something weird in one of the processed assays. I say this because it is possible to update if you remove SCT and integrated assays it works to update.

So if you want to start with just the raw data and analyze yourself you can run:

test <- read_rds("~/Downloads/seurat.rds")
DefaultAssay(test) <- "RNA"
test <- DietSeurat(test, assays = "RNA")
test2 <- UpdateSeuratObject(test)

Best, Sam

jjacob12 commented 2 years ago

Many thanks Xingliang and Sam. So if I understand correctly I should:

At what point do I use the above code you provided? I'm not familiar with the read_rds command so do I simply copy and paste this code into RStudio server? I'm a little confused about the PATH to 'seurat.rds'. Do I use the PATH "~/Downloads/seurat.rds" exactly as you have written? Apologies in advance if this is a basic question.

samuel-marsh commented 2 years ago

@jjacob12 No don't create object using the code and exprMatrix.tsv.gz. The data in that expression matrix is incomplete and only contains data for 2000 genes (instead of 46,000).

You should only use that second part of the code above to import the object correctly, then clean it, and update it, so that you can start over fresh (if that is your intention).

read_rds is part of readr package but you can substitute readRDS which is base R function.

Yes the path should just be your local path to the file. That in the above code that is just example using my path for testing this.

Best, Sam

jjacob12 commented 2 years ago

Hi Sam,

Using 'seurat.rds' as the file has led to some new problems. Before getting in touch with you, this is the code I ran (same as the PBMC-3K vignette)

mat <- fread("data/exprMatrix.tsv.gz", stringsAsFactors = FALSE) meta <- read.table("data/meta.tsv", header = TRUE, sep = "\t", as.is=T, row.names=1) genes = mat[,1][[1]] genes = gsub(".+[|]", "", genes)

mat = data.frame(mat[,-1], row.names=genes) so <- CreateSeuratObject(counts = mat, project = "aldinger2020", meta.data=meta) so[["percent.mt"]]<- PercentageFeatureSet(so, pattern = "^MT-")

if pt.size is not set to zero, there is massive overplotting because

of the very large

number of cells, and the violin plot itself cannot be seen

VlnPlot(so, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0)

Using seurat.rds, the code is as shown below: so <- read_rds("data/seurat.rds") DefaultAssay(so) <- "RNA" so <- DietSeurat(so, assays = "RNA") so2 <- UpdateSeuratObject(so) so2[["percent.mt"]] <- PercentageFeatureSet(so2, pattern = "^MT-") VlnPlot(so2, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) This results in the weird plot that is attached. I then added the 'pt.size=0' option to the VlnPlot command and then got a plot for the 'features' on a per cell type basis. I wonder if this results from not attaching the meta.data ('meta' object)?? This is just a wild guess though.

Best, John

------ Original Message ------ From: "Samuel Marsh" @.> To: "satijalab/seurat" @.> Cc: "jjacob12" @.>; "Mention" @.> Sent: Thursday, 16 Dec, 2021 At 19:58 Subject: Re: [satijalab/seurat] error running RunPCA (Issue #5395)

@jjacob12 https://github.com/jjacob12 No don't create object using the code and exprMatrix.tsv.gz. The data in that expression matrix is incomplete and only contains data for 2000 genes (instead of 46,000). You should only use that second part of the code above to import the object correctly, then clean it, and update it, so that you can start over fresh (if that is your intention). read_rds is part of readr package but you can substitute readRDS which is base R function. Yes the path should just be your local path to the file. That in the above code that is just example using my path for testing this. Best, Sam — Reply to this email directly, view it on GitHub https://github.com/satijalab/seurat/issues/5395#issuecomment-996150968 , or unsubscribe https://github.com/notifications/unsubscribe-auth/APEFENHQPFWFR7LWT7TM3VTURJAFFANCNFSM5KGKN4JA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub . You are receiving this because you were mentioned.Message ID: @.***>

samuel-marsh commented 2 years ago

Hi @jjacob12 I don't see any plot but I think I know what you are referring to.

In the object the current active.ident is maintained from the original object even though we've removed the data. All of the meta data is still contained in the object as you can see with:

View(so2@meta.data)

To change things back to plot by sample id you simply need to run Idents() as you would to anytime to change the active ident:

Idents(so2) <- "sample_id"

Best, Sam

jjacob12 commented 2 years ago

Hi Sam, So, what I get with the PBMCK vignette is this: Screenshot 2021-12-17 at 15 12 31

However, when I run: Indents(so2) <- "sample_id" so2[["percent.mt"]] <- PercentageFeatureSet(so2, pattern = "^MT-")

jjacob12 commented 2 years ago

Apologies, pressed the comment button too soon. So when I run: Indents(so2) <- "sample_id" so2[["percent.mt"]] <- PercentageFeatureSet(so2, pattern = "^MT-") VlnPlot(so2, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), pt.size = 0, ncol = 3) I get the below plot: Screenshot 2021-12-17 at 15 25 33

samuel-marsh commented 2 years ago

Hi @jjacob12 Yes that is correct. pbmc3k only has one sample in the dataset where as this public dataset has 15. So those are the expected plots.

Best, Sam

jjacob12 commented 2 years ago

Ah, should have realised that. Sorry for the dumb question. Best, John

samuel-marsh commented 2 years ago

Hey John,

No worries!! :)

jjacob12 commented 2 years ago

Thanks Sam!

yuhanH commented 2 years ago

Thanks @samuel-marsh ! It seems that this issue is resolved.

jjacob12 commented 2 years ago

Agree, many thanks once again. Best wishes John

Dr John Jacob

Sent from my iPhone

On 28 Jan 2022, at 17:24, Yuhan Hao @.***> wrote:

 Thanks @samuel-marsh ! It seems that this issue is resolved.

— Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android. You are receiving this because you were mentioned.

rizwanahmad95 commented 2 years ago

Hi @jjacob12 I faced the same problem as you did but I got out of trouble with the help of a bioinformatician. Aapparently the PCA is absent in your seurat object. You might have missed to run ScaleData, RunPCA and RunUMAP on the integrated data. Following commands may help after you create your integrated object:

seu_int <- Seurat::ScaleData(seu_int) seu_int <- Seurat::RunPCA(seu_int, npcs = 30) seu_int <- Seurat::RunUMAP(seu_int, reduction = "pca", dims = 1:25)

samuel-marsh commented 2 years ago

Hi @rizwanahmad95 thanks for suggesting solution. However, in this case issue was with RunPCA giving an error and not allowing it to run. We did solve the issue which was due to the nature of the public data through the series of posts above.

Best, Sam