satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Issues with subset function and layers in RNA assay #9120

Closed msaliutina closed 1 month ago

msaliutina commented 1 month ago

Hi everyone!

I am struggling with a couple of issues in Seurat v5.1.0.

1) Subsetting fails on Seurat objects that I create from matrix, feature, and barcode files:

library(Seurat)
library(Matrix)
pmbc_m <- readMM('SCP2695/expression/raw/raw.mtx')
rownames(pmbc_m) <- read.table('SCP2695/expression/raw/features_-_raw.tsv')[,1]
rownames(pmbc_m) <- make.unique(rownames(pmbc_m))
colnames(pmbc_m) <- readLines('SCP2695/expression/raw/barcodes_-_raw.tsv')
pbmc <- CreateSeuratObject(counts=pmbc_m, min.cells=0, min.features=0, names.field=1)

pmbc_n_m <- readMM('SCP2695/expression/norm/processed.mtx')
rownames(pmbc_n_m) <- read.table('SCP2695/expression/norm/features.tsv')[, 1]
colnames(pmbc_n_m) <- readLines('SCP2695/expression/norm/barcodes.tsv')

pbmc@assays$RNA@layers$data <- pmbc_n_m

metadata <- read.table('SCP2695/metadata/SCP_metadata_run2182.txt', 
                       header = TRUE, row.names = T,
                       sep = "\t")

metadata <- metadata[-1, ]
rownames(metadata) <- metadata$NAME
metadata[1] <- NULL
pbmc <- AddMetaData(pbmc, metadata)

Idents(pbmc) <- pbmc$cell_type_named

pbmc_sub <- subset(pbmc, idents = 'proliferating Pou1f1')
Error in match.arg(arg = i, choices = colnames(x = x)) : 
  'arg' should be “counts”

The data was obtained from here: https://singlecell.broadinstitute.org/single_cell/study/SCP2695/single-cell-rna-sequencing-of-p4-female-wild-type-and-prop1-df-df-mutant-pituitary-cells. I am using it as a test case, because I get the same error with my original dataset.

2) When I inspect the layers of the 'RNA' assay in my Seurat object, I do not see any cell barcodes or gene names, so I tried something like this:

rownames(pbmc@assays$RNA@layers$counts) <- dimnames(pbmc)[[1]]
colnames(pbmc@assays$RNA@layers$counts) <- dimnames(pbmc)[[2]]

But I am afraid it also looks suspicious.

Any feedback will be highly appreciated!

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4 Matrix_1.6-5

loaded via a namespace (and not attached):
[1] deldir_2.0-4 pbapply_1.7-2 gridExtra_2.3 rlang_1.1.4
[5] magrittr_2.0.3 RcppAnnoy_0.0.22 matrixStats_1.3.0 ggridges_0.5.6
[9] compiler_4.3.3 spatstat.geom_3.2-9 png_0.1-8 vctrs_0.6.5
[13] reshape2_1.4.4 stringr_1.5.1 pkgconfig_2.0.3 fastmap_1.2.0
[17] utf8_1.2.4 promises_1.3.0 purrr_1.0.2 jsonlite_1.8.8
[21] goftest_1.2-3 later_1.3.2 spatstat.utils_3.0-5 irlba_2.3.5.1
[25] parallel_4.3.3 cluster_2.1.6 R6_2.5.1 ica_1.0-3
[29] stringi_1.8.4 RColorBrewer_1.1-3 spatstat.data_3.1-2 reticulate_1.38.0
[33] parallelly_1.37.1 lmtest_0.9-40 scattermore_1.2 Rcpp_1.0.12
[37] tensor_1.5 future.apply_1.11.2 zoo_1.8-12 sctransform_0.4.1
[41] httpuv_1.6.15 splines_4.3.3 igraph_2.0.3 tidyselect_1.2.1
[45] rstudioapi_0.16.0 abind_1.4-5 spatstat.random_3.2-3 codetools_0.2-20
[49] miniUI_0.1.1.1 spatstat.explore_3.2-7 listenv_0.9.1 lattice_0.22-5
[53] tibble_3.2.1 plyr_1.8.9 shiny_1.8.1.1 ROCR_1.0-11
[57] Rtsne_0.17 future_1.33.2 fastDummies_1.7.3 survival_3.7-0
[61] polyclip_1.10-6 fitdistrplus_1.2-1 pillar_1.9.0 KernSmooth_2.23-24
[65] plotly_4.10.4 generics_0.1.3 RcppHNSW_0.6.0 ggplot2_3.5.1
[69] munsell_0.5.1 scales_1.3.0 globals_0.16.3 xtable_1.8-4
[73] glue_1.7.0 lazyeval_0.2.2 tools_4.3.3 data.table_1.15.4
[77] RSpectra_0.16-1 RANN_2.6.1 leiden_0.4.3.1 dotCall64_1.1-1
[81] cowplot_1.1.3 grid_4.3.3 tidyr_1.3.1 colorspace_2.1-0
[85] nlme_3.1-165 patchwork_1.2.0 cli_3.6.3 spatstat.sparse_3.1-0
[89] spam_2.10-0 fansi_1.0.6 viridisLite_0.4.2 dplyr_1.1.4
[93] uwot_0.2.2 gtable_0.3.5 digest_0.6.36 progressr_0.14.0
[97] ggrepel_0.9.5 htmlwidgets_1.6.4 htmltools_0.5.8.1 lifecycle_1.0.4
[101] httr_1.4.7 mime_0.12 MASS_7.3-60.0.1

rsatija commented 1 month ago

You do not need to set the row names and column names of the individual layers (this is handled internally in Seurat). To set the normalized data, you can run:

LayerData(pbmc, assay = 'RNA', layer = 'data') <- pmbc_n_m

To check that it was set correctly, you can run:

test <- LayerData(pbmc, assay = 'RNA', layer = 'data')
rownames(test)
colnames(test)

You can also run NormalizeData after creating the Seurat object to get normalized values, which should also solve your problem if there are any discrepancies between the raw and normalized matrices you downloaded from the SCP portal.
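
For anyone who wants to try the suggestion without downloading the SCP files, here is a minimal sketch on toy data. The matrices `raw` and `norm`, the object `obj`, and the `group` metadata column are all made up for illustration and are not part of the original dataset. It assumes Seurat v5 with SeuratObject v5. It first checks that the two matrices share the same feature and cell names, then sets the `data` layer through `LayerData<-` instead of writing to the `@layers` slot, and finally subsets by identity.

```r
library(Seurat)
library(Matrix)

set.seed(1)

# Hypothetical toy data standing in for the raw and normalized matrices.
genes <- paste0("gene", 1:20)
cells <- paste0("cell", 1:30)
raw_dense <- matrix(as.numeric(rpois(20 * 30, lambda = 2)), nrow = 20,
                    dimnames = list(genes, cells))
# Simple log-normalization, only to have a second matrix to store.
norm_dense <- log1p(sweep(raw_dense, 2, colSums(raw_dense), "/") * 1e4)

raw  <- Matrix(raw_dense, sparse = TRUE)
norm <- Matrix(norm_dense, sparse = TRUE)

# Check for discrepancies between the two matrices before combining them.
stopifnot(setequal(rownames(raw), rownames(norm)),
          setequal(colnames(raw), colnames(norm)))

obj <- CreateSeuratObject(counts = raw)

# Set the layer through the accessor so Seurat can do its own bookkeeping.
LayerData(obj, assay = "RNA", layer = "data") <- norm
print(Layers(obj))

# Hypothetical grouping used only to demonstrate subsetting by identity.
obj$group <- rep(c("A", "B"), each = 15)
Idents(obj) <- "group"
sub <- subset(obj, idents = "A")

# Dimnames are recovered through the accessor, not from the raw slot.
sub_data <- LayerData(sub, assay = "RNA", layer = "data")
print(dim(sub_data))
print(head(colnames(sub_data)))
```

If the `setequal` checks fail on real data, that points to the kind of raw versus normalized discrepancy mentioned above, and running `NormalizeData` on the object built from the raw counts may be the simpler route.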