satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.3k stars 917 forks source link

Cannot RUN PCA on CellPlex data with Seurat #6001

Closed SDJ7007 closed 2 years ago

SDJ7007 commented 2 years ago

Hello, Finally 10X Genomics has given a code for analysis of CELL Plex (CMO) in Seurat. The code is written in https://www.10xgenomics.com/cn/resources/analysis-guides/tag-assignment-of-10x-Genomics-cellplex-data-using-seurats-htodemux-function and they have linked the vignette of https://satijalab.org/seurat/articles/hashing_vignette.html

I cannot run PCA on my data. I am getting this error: seurat.singlet <- RunPCA(seurat.singlet, features = VariableFeatures(object = seurat.singlet)) Error in irlba(A = t(x = object), nv = npcs, ...) : max(nu, nv) must be positive

How can I overcome this please? I read the other erros but none of it worked for me.

The codes prior is as follows: library(Seurat) Attaching SeuratObject Attaching sp

setwd("E:/Cell_Ranger_10x_Seurat") data_dir <- "E:/Cell_Ranger_10x_Seurat" data <- Read10X(data.dir = data_dir) 10X data contains more than one type and is being returned as a list containing matrices of each type. seurat_object= CreateSeuratObject(counts = data$Gene Expression, project = "10X_Seurat") seurat_object[["CMO"]] = CreateAssayObject(counts = data$Multiplexing Capture) head(seurat_object@meta.data) orig.ident nCount_RNA nFeature_RNA nCount_CMO nFeature_CMO AAACCCAAGAAACACT-1 10X_Seurat 1 1 0 0 AAACCCAAGAAACCAT-1 10X_Seurat 1 1 0 0 AAACCCAAGAAACCCA-1 10X_Seurat 7 7 0 0 AAACCCAAGAAACCCG-1 10X_Seurat 0 0 1 1 AAACCCAAGAAACCTG-1 10X_Seurat 1 1 0 0 AAACCCAAGAAACTAC-1 10X_Seurat 3 3 0 0 library(data.table) data.table 1.14.2 using 8 threads (see ?getDTthreads). Latest news: r-datatable.com head(seurat_object@assays) $RNA Assay data with 32285 features for 2893103 cells First 10 features: Xkr4, Gm1992, Gm19938, Gm37381, Rp1, Sox17, Gm37587, Gm37323, Mrpl15, Lypla1

$CMO Assay data with 12 features for 2893103 cells First 10 features: CMO301, CMO302, CMO303, CMO304, CMO305, CMO306, CMO307, CMO308, CMO309, CMO310

cells <- fread("assignment_confidence_table.csv",select = c("Barcodes"))

cells[1:5] Barcodes 1: AAACCCAAGACTCAAA-1 2: AAACCCAAGATACATG-1 3: AAACCCAAGATAGCAT-1 4: AAACCCAAGCCTGTCG-1 5: AAACCCAAGTGGCCTC-1 seurat_object_use <- subset(seurat_object, cells = cells$Barcodes) DefaultAssay(seurat_object_use)<-"CMO" seurat_object_use <- subset(seurat_object_use, features = c("CMO309","CMO310", "CMO311", "CMO312")) seurat_object_use = subset(x = seurat_object_use, subset = nCount_CMO > 0) seurat_object_norm <- NormalizeData(seurat_object_use, assay = "CMO", normalization.method = "CLR") Normalizing across features |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
seurat_object_demux <- HTODemux(seurat_object_norm, assay = "CMO", positive.quantile = 0.99) Cutoff for CMO309 : 1285 reads Cutoff for CMO310 : 966 reads Cutoff for CMO311 : 1028 reads Cutoff for CMO312 : 1056 reads table(seurat_object_demux$CMO_classification.global)

Doublet Negative Singlet 880 1599 9862

Idents(seurat_object_demux) <- "CMO_maxID"

RidgePlot(seurat_object_demux, assay = "CMO", features = rownames(seurat_object_demux[["CMO"]])

  • RidgePlot(seurat_object_demux, assay = "CMO", features = rownames(seurat_object_demux[["CMO"]])[1:4], ncol = 2) Picking joint bandwidth of 0.061 Picking joint bandwidth of 0.0577 Picking joint bandwidth of 0.0592 Picking joint bandwidth of 0.064 RidgePlot(seurat_object_demux, assay = "CMO", features = rownames(seurat_object_demux[["CMO"]])[1:2], ncol = 2) Picking joint bandwidth of 0.061 Picking joint bandwidth of 0.0577 FeatureScatter(seurat_object_demux, feature1 = "CMO309", feature2 = "CMO310") FeatureScatter(seurat_object_demux, feature1 = "CMO309", feature2 = "CMO310", label = TRUE) Error in FeatureScatter(seurat_object_demux, feature1 = "CMO309", feature2 = "CMO310", : unused argument (label = TRUE) FeatureScatter(seurat_object_demux, feature1 = "CMO309", feature2 = "CMO310") Idents(seurat_object_demux) <- "CMO_classification.global" VlnPlot(seurat_object_demux, features = "nCount_RNA", pt.size = 0.1, log = TRUE) seurat_object_demux_subset<- subset(seurat_object_demux, idents = "Negative", invert = TRUE)

cmo.dist.mtx <- as.matrix(dist(t(GetAssayData(object = seurat_object_demux_subset, assay = "CMO")))) seurat_object_demux_subset <- RunTSNE(seurat_object_demux_subset, distance.matrix = cmo.dist.mtx, perplexity = 100) Warning: Adding a command log without an assay associated with it DimPlot(seurat_object_demux_subset) HTOHeatmap(seurat_object_demux, assay = "CMO", ncells = 7000) Warning message: guides(<scale> = FALSE) is deprecated. Please use guides(<scale> = "none") instead. seurat.singlet <- subset(seurat_object_demux, idents = "Singlet") seurat.singlet <- FindVariableFeatures(seurat.singlet, selection.method = "mean.var.plot") Calculating gene means 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **| Calculating gene variance to mean ratios 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **| seurat.singlet <- ScaleData(seurat.singlet, features = VariableFeatures(seurat.singlet)) Centering and scaling data matrix |===========================================================================================================| 100% seurat.singlet <- RunPCA(seurat.singlet, features = VariableFeatures(seurat.singlet)) Error in irlba(A = t(x = object), nv = npcs, ...) : max(nu, nv) must be positive ElbowPlot(seurat.singlet) Error: Cannot find 'pca' in this Seurat object ElbowPlot(seurat_object_demux) Error: Cannot find 'pca' in this Seurat object View(seurat.singlet) seurat.singlet <- RunPCA(seurat.singlet, features = VariableFeatures(object = seurat.singlet)) Error in irlba(A = t(x = object), nv = npcs, ...) : max(nu, nv) must be positive print(seurat.singlet[["pca"]], dims = 1:5, nfeatures = 5) Error: Cannot find 'pca' in this Seurat object

Can anyone help please?

Thank you.

Kind Regards, Shweta Johari

SDJ7007 commented 2 years ago

sessionInfo() R version 4.2.0 (2022-04-22 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] data.table_1.14.2 sp_1.4-7 SeuratObject_4.1.0 Seurat_4.1.1

loaded via a namespace (and not attached): [1] nlme_3.1-157 matrixStats_0.62.0 spatstat.sparse_2.1-1 RcppAnnoy_0.0.19 RColorBrewer_1.1-3
[6] httr_1.4.3 sctransform_0.3.3 tools_4.2.0 utf8_1.2.2 R6_2.5.1
[11] irlba_2.3.5 rpart_4.1.16 KernSmooth_2.23-20 uwot_0.1.11 mgcv_1.8-40
[16] rgeos_0.5-9 DBI_1.1.2 lazyeval_0.2.2 colorspace_2.0-3 withr_2.5.0
[21] tidyselect_1.1.2 gridExtra_2.3 compiler_4.2.0 progressr_0.10.0 cli_3.3.0
[26] plotly_4.10.0 labeling_0.4.2 scales_1.2.0 lmtest_0.9-40 spatstat.data_2.2-0
[31] ggridges_0.5.3 pbapply_1.5-0 goftest_1.2-3 stringr_1.4.0 digest_0.6.29
[36] spatstat.utils_2.3-1 pkgconfig_2.0.3 htmltools_0.5.2 parallelly_1.31.1 fastmap_1.1.0
[41] htmlwidgets_1.5.4 rlang_1.0.2 rstudioapi_0.13 shiny_1.7.1 farver_2.1.0
[46] generics_0.1.2 zoo_1.8-10 jsonlite_1.8.0 spatstat.random_2.2-0 ica_1.0-2
[51] dplyr_1.0.9 magrittr_2.0.3 patchwork_1.1.1 Matrix_1.4-1 Rcpp_1.0.8.3
[56] munsell_0.5.0 fansi_1.0.3 abind_1.4-5 reticulate_1.25 lifecycle_1.0.1
[61] stringi_1.7.6 MASS_7.3-56 Rtsne_0.16 plyr_1.8.7 grid_4.2.0
[66] parallel_4.2.0 listenv_0.8.0 promises_1.2.0.1 ggrepel_0.9.1 crayon_1.5.1
[71] deldir_1.0-6 miniUI_0.1.1.1 lattice_0.20-45 cowplot_1.1.1 splines_4.2.0
[76] tensor_1.5 pillar_1.7.0 igraph_1.3.1 spatstat.geom_2.4-0 future.apply_1.9.0
[81] reshape2_1.4.4 codetools_0.2-18 leiden_0.4.2 glue_1.6.2 png_0.1-7
[86] vctrs_0.4.1 httpuv_1.6.5 polyclip_1.10-0 gtable_0.3.0 RANN_2.6.1
[91] purrr_0.3.4 spatstat.core_2.4-4 tidyr_1.2.0 scattermore_0.8 future_1.25.0
[96] assertthat_0.2.1 ggplot2_3.3.6 mime_0.12 xtable_1.8-4 later_1.3.0
[101] survival_3.3-1 viridisLite_0.4.0 tibble_3.1.7 cluster_2.1.3 globals_0.15.0
[106] fitdistrplus_1.1-8 ellipsis_0.3.2 ROCR_1.0-11

samuel-marsh commented 2 years ago

Hi,

Not member of dev team but hopefully can be helpful. Did you switch default assay back to RNA before running RunPCA?

Best, Sam

SDJ7007 commented 2 years ago

Hey Thank you so much for your response. So that is something I thought about too, So I figured this out: In the 10X genomics code written, there is a command, https://www.10xgenomics.com/cn/resources/analysis-guides/tag-assignment-of-10x-Genomics-cellplex-data-using-seurats-htodemux-function : DefaultAssay(seurat_object_use)<-"CMO" Then further on, all the work is done by this object and using this assay. Then even if I try to get back to RNA as my assay. It doesn't work. It says no RNA assay.

I also tried going back to the object with both assays and then it doesn't work because the normalization and scaling is done with the other objects and assay as CMO.

Thank you for your response though.

Kind Regards, Shweta Johari

samuel-marsh commented 2 years ago

Hi,

So you don't want to run the analysis with CMO as the assay. That page from 10X is just describing setting up the object not the downstream analysis which needs to be run on gene expression data. That page/process is just for demultiplexing the samples and if you try and run analysis on it's not going to work. The reason the RunPCA step failed is because it by default it runs 50 PCs but your number of features can't be less than PCs and if CMO is still default assay then it will be (hence it failed). The reason that the RNA assay isn't present anymore is due to running subset on the CMO assay (see here for reasoning and proposed solution: https://github.com/satijalab/seurat/issues/5978#issuecomment-1133342063).

Best, Sam

SDJ7007 commented 2 years ago

Hello Dr.Samuel,

Thank you so much for your time and Thank you so much for guiding me to the solution. I was perplexed by the code as it added the vignette of integration too.

To clarify, the solution of the https://github.com/satijalab/seurat/issues/5978#issuecomment-1133342063, means: After normalization of the CMO/ before make the CMO as the default assay: To use: object_filtered = subset(object, features = c(rownames(object[["RNA"]]), c("CMO310","CMO311", "CMO312"))

or: S22_23_24_cells_filtered_CMO_norm_tags <- subset(S22_23_24_cells_filtered_CMO_norm, features = c("CMO310","CMO311", "CMO312")) Instead of this, To use: object_filtered = subset(object, features = c(rownames(object[["RNA"]]), c("CMO310","CMO311", "CMO312"))

So to understand it better according to the script given in the question:

  1. Load the dataset

  2. Create the Seurat Object

  3. Create the two assays

  4. Normalize RNA data with log normalization

  5. Find and scale variable features

  6. Then add the cells document

  7. Remove the mt

  8. VlnPlot: nFeatures_nCountRNA, nCountRNA_mt plot1 + plot2

  9. Then filter the data: S22_23_24_cells_filtered <- subset(S22_23_24_cells, subset = nFeature_RNA > 200 & percent.mt < 10)

  10. Scale S22_23_24_cells_filtered <- ScaleData(S22_23_24_cells_filtered, features = VariableFeatures(S22_23_24_cells)) 12 Normalize CMO data, here we use centered log-ratio (CLR) transformation 13 DefaultAssay(S22_23_24_cells_filtered_CMO_norm)<-"CMO" a) S22_23_24_cells_filtered_CMO_norm_tags <- subset(S22_23_24_cells_filtered_CMO_norm, features = c("CMO310","CMO311", "CMO312")) b) S22_23_24_cells_filtered_CMO_norm_tags_nozeros = subset(x = S22_23_24_cells_filtered_CMO_norm_tags, subset = nCount_CMO > 0)

  11. HTODemux

  12. table

  13. Idents

  14. RidgePlot

  15. FeatureScatter

  16. Idents

  17. First, subset singlets 21.FindVariableFeatures 22.Scale Data

  18. Calculate a distance matrix using HTO

  19. Heatmap

  20. Calculate tSNE embeddings with a distance matrix

  21. DimPlot 27.HTOHeatMap