Closed jjacob12 closed 2 years ago
I think what RunPCA is complaining is about the rank of the input matrix is not positive, which is the nature of this specific input. I am not sure why the expression matrix seems truncated for some reason with only 2000 genes but over 69k cells.
Hi Jacob,
Not member of dev team but hopefully can be helpful. So it appears that are couple of issues with this dataset and I would suggest contacting the authors to let them know of the issues.
Yes as @liuxl18-hku suggests I think thats the issue. First, the exprMatrix.tsv.gz file for this dataset contains 69175 cells but only 2000 genes. So inherently there is gonna be issue with calculating 2000 variable genes when there are only 2000 genes present.
Second, the values appear to already be normalized values (they are decimal values not integers). Therefore you don't want to run NormalizeData again.
I would suggest downloading the seurat object on that page and you can always remove the analyzed assays present and start over if you like. This appears to have all the features.
Best, Sam
You will likely want to update the object as well since it was created with 3.0.2
test <- read_rds("~/Downloads/seurat.rds")
test@version
[1] ‘3.0.2’
On final inspection there is something weird with the Seurat object too.
If you simply try to update the object you get:
test <- read_rds("~/Downloads/seurat.rds")
test2 <- UpdateSeuratObject(test)
Validating object structure
Updating object slots
Error: All cells in the object being added must match the cells in this object
Not sure why that error is occurring but I think there is probably something weird in one of the processed assays. I say this because it is possible to update if you remove SCT and integrated assays it works to update.
So if you want to start with just the raw data and analyze yourself you can run:
test <- read_rds("~/Downloads/seurat.rds")
DefaultAssay(test) <- "RNA"
test <- DietSeurat(test, assays = "RNA")
test2 <- UpdateSeuratObject(test)
Best, Sam
Many thanks Xingliang and Sam. So if I understand correctly I should:
At what point do I use the above code you provided? I'm not familiar with the read_rds command so do I simply copy and paste this code into RStudio server? I'm a little confused about the PATH to 'seurat.rds'. Do I use the PATH "~/Downloads/seurat.rds" exactly as you have written? Apologies in advance if this is a basic question.
@jjacob12 No don't create object using the code and exprMatrix.tsv.gz. The data in that expression matrix is incomplete and only contains data for 2000 genes (instead of 46,000).
You should only use that second part of the code above to import the object correctly, then clean it, and update it, so that you can start over fresh (if that is your intention).
read_rds
is part of readr package but you can substitute readRDS
which is base R function.
Yes the path should just be your local path to the file. That in the above code that is just example using my path for testing this.
Best, Sam
Hi Sam,
Using 'seurat.rds' as the file has led to some new problems. Before getting in touch with you, this is the code I ran (same as the PBMC-3K vignette)
mat <- fread("data/exprMatrix.tsv.gz", stringsAsFactors = FALSE) meta <- read.table("data/meta.tsv", header = TRUE, sep = "\t", as.is=T, row.names=1) genes = mat[,1][[1]] genes = gsub(".+[|]", "", genes)
mat = data.frame(mat[,-1], row.names=genes) so <- CreateSeuratObject(counts = mat, project = "aldinger2020", meta.data=meta) so[["percent.mt"]]<- PercentageFeatureSet(so, pattern = "^MT-")
of the very large
VlnPlot(so, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0)
Using seurat.rds, the code is as shown below: so <- read_rds("data/seurat.rds") DefaultAssay(so) <- "RNA" so <- DietSeurat(so, assays = "RNA") so2 <- UpdateSeuratObject(so) so2[["percent.mt"]] <- PercentageFeatureSet(so2, pattern = "^MT-") VlnPlot(so2, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) This results in the weird plot that is attached. I then added the 'pt.size=0' option to the VlnPlot command and then got a plot for the 'features' on a per cell type basis. I wonder if this results from not attaching the meta.data ('meta' object)?? This is just a wild guess though.
Best, John
------ Original Message ------ From: "Samuel Marsh" @.> To: "satijalab/seurat" @.> Cc: "jjacob12" @.>; "Mention" @.> Sent: Thursday, 16 Dec, 2021 At 19:58 Subject: Re: [satijalab/seurat] error running RunPCA (Issue #5395)
@jjacob12 https://github.com/jjacob12 No don't create object using the code and exprMatrix.tsv.gz. The data in that expression matrix is incomplete and only contains data for 2000 genes (instead of 46,000). You should only use that second part of the code above to import the object correctly, then clean it, and update it, so that you can start over fresh (if that is your intention). read_rds is part of readr package but you can substitute readRDS which is base R function. Yes the path should just be your local path to the file. That in the above code that is just example using my path for testing this. Best, Sam — Reply to this email directly, view it on GitHub https://github.com/satijalab/seurat/issues/5395#issuecomment-996150968 , or unsubscribe https://github.com/notifications/unsubscribe-auth/APEFENHQPFWFR7LWT7TM3VTURJAFFANCNFSM5KGKN4JA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub . You are receiving this because you were mentioned.Message ID: @.***>
Hi @jjacob12 I don't see any plot but I think I know what you are referring to.
In the object the current active.ident is maintained from the original object even though we've removed the data. All of the meta data is still contained in the object as you can see with:
View(so2@meta.data)
To change things back to plot by sample id you simply need to run Idents()
as you would to anytime to change the active ident:
Idents(so2) <- "sample_id"
Best, Sam
Hi Sam, So, what I get with the PBMCK vignette is this:
However, when I run:
Indents(so2) <- "sample_id" so2[["percent.mt"]] <- PercentageFeatureSet(so2, pattern = "^MT-")
Apologies, pressed the comment button too soon.
So when I run:
Indents(so2) <- "sample_id"
so2[["percent.mt"]] <- PercentageFeatureSet(so2, pattern = "^MT-")
VlnPlot(so2, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), pt.size = 0, ncol = 3)
I get the below plot:
Hi @jjacob12 Yes that is correct. pbmc3k only has one sample in the dataset where as this public dataset has 15. So those are the expected plots.
Best, Sam
Ah, should have realised that. Sorry for the dumb question. Best, John
Hey John,
No worries!! :)
Thanks Sam!
Thanks @samuel-marsh ! It seems that this issue is resolved.
Agree, many thanks once again. Best wishes John
Dr John Jacob
Sent from my iPhone
On 28 Jan 2022, at 17:24, Yuhan Hao @.***> wrote:
Thanks @samuel-marsh ! It seems that this issue is resolved.
— Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android. You are receiving this because you were mentioned.
Hi @jjacob12 I faced the same problem as you did but I got out of trouble with the help of a bioinformatician. Aapparently the PCA is absent in your seurat object. You might have missed to run ScaleData, RunPCA and RunUMAP on the integrated data. Following commands may help after you create your integrated object:
seu_int <- Seurat::ScaleData(seu_int) seu_int <- Seurat::RunPCA(seu_int, npcs = 30) seu_int <- Seurat::RunUMAP(seu_int, reduction = "pca", dims = 1:25)
Hi @rizwanahmad95 thanks for suggesting solution. However, in this case issue was with RunPCA giving an error and not allowing it to run. We did solve the issue which was due to the nature of the public data through the series of posts above.
Best, Sam
Hi there, Looks like this issue has been encountered before (RunPCA error #1788), but doesn't look like there was a definitive solution. Also, the datasets associated with the issue were not provided. I am new to Seurat (and writing code/R!). Running Seurat v4 on RStudio-Server/1.4.1717-foss-2021a-Java-11-R-4.1.0, using R-bundle-Bioconductor/3.13-foss-2021a-R-4.1.0.
I'm analysing a public snRNA-seq dataset ((https://cbl-dev.cells.ucsc.edu) of around 70,000 cells/nuclei. I used code from, https://cellbrowser.readthedocs.io/en/master/load.html to create the Seurat object, running the first 9 lines of code under the subheading, 'Seurat'. The Seurat object is designated 'so'.
The libraries/options I loaded were: library(tidyverse) library(Seurat) library(data.table) library(patchwork) options(bitmapType='cairo-png') # this line needed to generate plots
I have used the code for downstream analysis from the vignette, https://satijalab.org/seurat/articles/pbmc3k_tutorial.html All worked fine until I ran the 'RunPCA' command. Up to that point I ran: so <- NormalizeData(so, normalization.method = "LogNormalize", scale.factor = 10000) so <- ScaleData(so, features = rownames(so)) so <- FindVariableFeatures(so, selection.method = "vst", nfeatures = 2000) so <- RunPCA(so, features = VariableFeatures(object = so))
With the last line I got the error message that others have had before: "Error in irlba(A = t(x = object), nv = npcs, ...) : max(nu, nv) must be positive In addition: Warning message: In PrepDR(object = object, features = features, verbose = verbose) : The following 2000 features requested have zero variance (running reduction without them):..."
To see if I could progress the analysis nevertheless, I ran: DimPlot(pbmc, reduction = "pca")
So I am stuck. Can you please look into what's causing this issue and recommend a fix? Many thanks.