Closed nikita-telkar closed 3 months ago
hi @nikita-telkar Could you show us your sessionInfo()? Also, Do you meet this situation for pbmc3k? For example,
> library(Seurat)
> library(SeuratData)
> obj <- LoadData(ds = 'pbmc3k')
> head(obj$nFeature_RNA)
AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG AAACCGTGTATGCG AAACGCACTGGTAC
779 1352 1129 960 521 781
>
> obj.filter <- subset(obj, subset = nFeature_RNA > 500)
> head(obj.filter$nFeature_RNA)
AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG AAACCGTGTATGCG AAACGCACTGGTAC
779 1352 1129 960 521 781
@yuhanH
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.3.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pbmc3k.SeuratData_3.1.4 SeuratData_0.2.2 lubridate_1.9.2
[4] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2
[7] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[10] tibble_3.2.1 tidyverse_2.0.0 ggplot2_3.4.2
[13] ggthemes_4.2.4 kableExtra_1.3.4 plomics_0.2.0
[16] BiocManager_1.30.20 circlize_0.4.15 RColorBrewer_1.1-3
[19] scales_1.2.1 gridExtra_2.3 data.table_1.14.8
[22] rmarkdown_2.21 pheatmap_1.0.12 arsenal_3.6.3
[25] openxlsx_4.2.5.2 readxl_1.4.2 here_1.0.1
[28] SeuratObject_4.1.3 Seurat_4.3.0 knitr_1.42
[31] formatR_1.14
loaded via a namespace (and not attached):
[1] systemfonts_1.0.4 plyr_1.8.8 igraph_1.4.2
[4] lazyeval_0.2.2 sp_1.6-0 splines_4.2.2
[7] listenv_0.9.0 scattermore_0.8 usethis_2.1.6
[10] digest_0.6.31 htmltools_0.5.5 fansi_1.0.4
[13] memoise_2.0.1 magrittr_2.0.3 tensor_1.5
[16] cluster_2.1.4 ROCR_1.0-11 remotes_2.4.2
[19] tzdb_0.3.0 globals_0.16.2 matrixStats_0.63.0
[22] svglite_2.1.1 timechange_0.2.0 spatstat.sparse_3.0-1
[25] prettyunits_1.1.1 colorspace_2.1-0 rappdirs_0.3.3
[28] rvest_1.0.3 ggrepel_0.9.3 xfun_0.39
[31] crayon_1.5.2 callr_3.7.3 jsonlite_1.8.4
[34] progressr_0.13.0 spatstat.data_3.0-1 survival_3.5-5
[37] zoo_1.8-12 glue_1.6.2 polyclip_1.10-4
[40] gtable_0.3.3 webshot_0.5.4 leiden_0.4.3
[43] pkgbuild_1.4.0 future.apply_1.10.0 shape_1.4.6
[46] abind_1.4-5 DBI_1.1.3 spatstat.random_3.1-4
[49] miniUI_0.1.1.1 Rcpp_1.0.10 viridisLite_0.4.1
[52] xtable_1.8-4 reticulate_1.28 profvis_0.3.8
[55] htmlwidgets_1.6.2 httr_1.4.5 ellipsis_0.3.2
[58] ica_1.0-3 urlchecker_1.0.1 pkgconfig_2.0.3
[61] uwot_0.1.14 deldir_1.0-6 utf8_1.2.3
[64] tidyselect_1.2.0 rlang_1.1.1 reshape2_1.4.4
[67] later_1.3.0 cachem_1.0.8 munsell_0.5.0
[70] cellranger_1.1.0 tools_4.2.2 cli_3.6.1
[73] generics_0.1.3 devtools_2.4.5 ggridges_0.5.4
[76] evaluate_0.20 fastmap_1.1.1 yaml_2.3.7
[79] goftest_1.2-3 processx_3.8.1 fs_1.6.2
[82] fitdistrplus_1.1-11 zip_2.3.0 RANN_2.6.1
[85] egg_0.4.5 pbapply_1.7-0 future_1.32.0
[88] nlme_3.1-162 mime_0.12 xml2_1.3.4
[91] compiler_4.2.2 rstudioapi_0.14 curl_5.0.0
[94] plotly_4.10.1 png_0.1-8 spatstat.utils_3.0-2
[97] stringi_1.7.12 ps_1.7.5 desc_1.4.2
[100] lattice_0.21-8 Matrix_1.5-4 vctrs_0.6.2
[103] pillar_1.9.0 lifecycle_1.0.3 spatstat.geom_3.1-0
[106] lmtest_0.9-40 GlobalOptions_0.1.2 RcppAnnoy_0.0.20
[109] cowplot_1.1.1 irlba_2.3.5.1 httpuv_1.6.9
[112] patchwork_1.1.2 R6_2.5.1 promises_1.2.0.1
[115] KernSmooth_2.23-20 parallelly_1.35.0 sessioninfo_1.2.2
[118] codetools_0.2-19 pkgload_1.3.2 MASS_7.3-59
[121] rprojroot_2.0.3 withr_2.5.0 sctransform_0.3.5
[124] parallel_4.2.2 hms_1.1.3 grid_4.2.2
[127] Rtsne_0.16 spatstat.explore_3.1-0 shiny_1.7.4
install.packages("devtools")
devtools::install_github("satijalab/seurat-data")
library(SeuratData)
SeuratData::InstallData("pbmc3k")
obj <- SeuratData::LoadData(ds = "pbmc3k")
head(obj$nFeature_RNA)
AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG AAACCGTGTATGCG AAACGCACTGGTAC
779 1352 1129 960 521 781
> obj@meta.data %>%
+ dplyr::arrange(nFeature_RNA) %>%
+ head()
orig.ident nCount_RNA nFeature_RNA seurat_annotations
GGCATATGGGGAGT pbmc3k 575 212 Platelet
CTAATAGAGCTATG pbmc3k 600 239 Naive CD4 T
ATCTCAACCTCGAA pbmc3k 609 246 Naive CD4 T
GACGCTCTCTCTCG pbmc3k 682 266 Platelet
ACTTTGTGCGATAC pbmc3k 600 267 <NA>
TAGTCTTGGCTGTA pbmc3k 652 270 Naive CD4 T
# filtered
obj.filter <- subset(obj, subset = nFeature_RNA > 500)
head(obj.filter$nFeature_RNA)
AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG AAACCGTGTATGCG AAACGCACTGGTAC
779 1352 1129 960 521 781
> obj.filter@meta.data %>%
+ dplyr::arrange(nFeature_RNA) %>%
+ head()
orig.ident nCount_RNA nFeature_RNA seurat_annotations
GTGATGACCTGAGT pbmc3k 1095 502 B
AGTAATACCGAACT pbmc3k 1236 503 B
GGGATGGACGACAT pbmc3k 1307 503 B
TGGTAGTGCACTGA pbmc3k 1110 503 CD14+ Mono
ACCACCTGTGTGCA pbmc3k 1176 505 Naive CD4 T
CGTCAAGAAAGGTA pbmc3k 1111 505 CD14+ Mono
Not sure what's happening
all_counts <- merged_seurat@assays$RNA@counts
head(all_counts)
hs_counts <- CollapseSpeciesExpressionMatrix(
all_counts,
prefix = "GRCh38-",
controls = "mm10---",
ncontrols = 0
)
hs_genes <- hs_counts@Dimnames[[1]]
hs_genes <- as.data.frame(hs_genes)
colnames(hs_genes) <- "human_genes"
hs <- CreateSeuratObject(hs_counts, meta.data = merged_seurat@meta.data)
hs.filter <- subset(hs, subset = nFeature_RNA > 500)
hs.filter@meta.data %>%
dplyr::arrange(nFeature_RNA) %>%
head()
orig.ident nCount_RNA nFeature_RNA
H3_AAAGAACGTCCCACGA-1 H3-1207 0 0
H3_ACCAACAAGTTCCATG-1 H3-1207 0 0
H3_ACGATCACAGGCTTGC-1 H3-1207 0 0
H3_ACGTCCTCACTCTGCT-1 H3-1207 0 0
H3_ACTGCAACAACCGATT-1 H3-1207 0 0
H3_AGAAGCGTCGGAGTAG-1 H3-1207 0 0
Thank you!
Hi @yuhanH , just wanted to check if there might have been any updates since I last posted in May? :)
Thanks, Nikita
hi @nikita-telkar Sorry for the delay. Could you send me a small reproducible example? That will be much easier for me to figure out this issue. email: yhao@nygenome.org
Hello Seurat team,
I am facing the same problem using Seurat5, but it looks like it stems from the function "split", not "subset". I have only realized it now because all the code worked fine, but then I discovered the RNA dimnames slots are empty, whereas the matrix is populated, strange indeed. On the contrary, the SCT matrix has both dimnames and matrix values, as expected. Therefore, either there is a bug, or there is a sort of function in Seurat5 that relates the two, but still missing is the dimnames. Or, probably, there is a mistake on my end.
I have tested the function subset, but it does not seem to originate from it. On the contrary, after running the function split, the "dimnames" slot results empty. One consideration is that I am starting from a seurat object v3. Even if I update it, running UpdateSeuratObject() shows a warning; here is the part where slots are wrongly converted from v3 to v5, I guess.
One solution would be to export both matrix and metadata and generate a new object. I guess.
My code: DefaultAssay(SIM1) <- "RNA" SIM1[["RNA"]] <- split(SIM1[["RNA"]], f = SIM1$sample) covs <- c("mt.percent") SIM1 <- SCTransform(SIM1, vst.flavor = "v2", verbose=T, ncells =length(rownames(SIM1@meta.data)) , vars.to.regress = covs) SIM1 <- RunPCA(SIM1, npcs = 50, verbose = T) SIM1 <- IntegrateLayers(object = SIM1, method = HarmonyIntegration, orig.reduction = "pca", new.reduction = "harmony", verbose = T) SIM1 <- FindNeighbors(SIM1, dims = 1:50, reduction = "harmony") SIM1 <- FindClusters(SIM1, resolution = 0.6) SIM1 <- RunUMAP(SIM1, dims = 1:50, reduction = "harmony", reduction.name = "umap") SIM1[["RNA"]] <- JoinLayers(SIM1[["RNA"]], f = SIM1$sample) UMAPPlot(SIM1, label=T, pt.size=0)
sessionInfo() R version 4.3.1 (2023-06-16) Platform: x86_64-apple-darwin20 (64-bit) Running under: macOS Ventura 13.5
Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Rome tzcode source: internal
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmGamPoi_1.12.2 patchwork_1.1.3 ggplot2_3.4.3 Seurat_4.9.9.9067
[5] SeuratObject_4.9.9.9091 sp_2.1-0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.15.0 jsonlite_1.8.7
[4] magrittr_2.0.3 spatstat.utils_3.0-3 zlibbioc_1.46.0
[7] vctrs_0.6.3 ROCR_1.0-11 spatstat.explore_3.2-3
[10] RCurl_1.98-1.12 S4Arrays_1.0.6 htmltools_0.5.6
[13] sctransform_0.4.0 parallelly_1.36.0 KernSmooth_2.23-22
[16] htmlwidgets_1.6.2 ica_1.0-3 plyr_1.8.9
[19] plotly_4.10.2 zoo_1.8-12 igraph_1.5.1
[22] mime_0.12 lifecycle_1.0.3 pkgconfig_2.0.3
[25] Matrix_1.6-1.1 R6_2.5.1 fastmap_1.1.1
[28] GenomeInfoDbData_1.2.10 MatrixGenerics_1.12.3 fitdistrplus_1.1-11
[31] future_1.33.0 shiny_1.7.5 digest_0.6.33
[34] colorspace_2.1-0 S4Vectors_0.38.2 tensor_1.5
[37] RSpectra_0.16-1 irlba_2.3.5.1 GenomicRanges_1.52.0
[40] progressr_0.14.0 fansi_1.0.4 spatstat.sparse_3.0-2
[43] httr_1.4.7 polyclip_1.10-6 abind_1.4-5
[46] compiler_4.3.1 remotes_2.4.2.1 withr_2.5.1
[49] fastDummies_1.7.3 MASS_7.3-60 DelayedArray_0.26.7
[52] tools_4.3.1 lmtest_0.9-40 httpuv_1.6.11
[55] future.apply_1.11.0 goftest_1.2-3 glue_1.6.2
[58] nlme_3.1-163 promises_1.2.1 grid_4.3.1
[61] Rtsne_0.16 cluster_2.1.4 reshape2_1.4.4
[64] generics_0.1.3 gtable_0.3.4 spatstat.data_3.0-1
[67] tidyr_1.3.0 data.table_1.14.8 utf8_1.2.3
[70] XVector_0.40.0 BiocGenerics_0.46.0 BPCells_0.1.0
[73] spatstat.geom_3.2-5 RcppAnnoy_0.0.21 ggrepel_0.9.3
[76] RANN_2.6.1 pillar_1.9.0 stringr_1.5.0
[79] spam_2.9-1 RcppHNSW_0.5.0 later_1.3.1
[82] splines_4.3.1 dplyr_1.1.3 lattice_0.21-9
[85] survival_3.5-7 deldir_1.0-9 tidyselect_1.2.0
[88] miniUI_0.1.1.1 pbapply_1.7-2 gridExtra_2.3
[91] IRanges_2.34.1 SummarizedExperiment_1.30.2 scattermore_1.2
[94] stats4_4.3.1 Biobase_2.60.0 matrixStats_1.0.0
[97] stringi_1.7.12 lazyeval_0.2.2 codetools_0.2-19
[100] tibble_3.2.1 cli_3.6.1 uwot_0.1.16
[103] xtable_1.8-4 reticulate_1.32.0 munsell_0.5.0
[106] Rcpp_1.0.11 GenomeInfoDb_1.36.3 globals_0.16.2
[109] spatstat.random_3.1-6 png_0.1-8 parallel_4.3.1
[112] ellipsis_0.3.2 dotCall64_1.0-2 bitops_1.0-7
[115] listenv_0.9.0 viridisLite_0.4.2 scales_1.2.1
[118] ggridges_0.5.4 crayon_1.5.2 leiden_0.4.3
[121] purrr_1.0.2 rlang_1.1.1 cowplot_1.1.1
Hi,
Thank you all for your thoughts.
We haven't heard back about the initial issue and I wasn't able to reproduce this either, so I am closing now. Please open a new issue if you are continuing to have problems with the subset
function.
@stefanonard85 this seems like a different problem, but it looks like the expected behavior of the new v5 assay. I'd recommend opening a new issue explaining the problem you're running into.
Hello,
I've been re-analysing my data and trying to subset my object to keep
nFeature
>= 500. But after subsetting, the meta.data shows changes in the values in the nFeature_RNA and nCount_RNA. My previous iteration of the analysis was in the earlier version of Seurat, so I'm updating the script using the newer 4.3.0 version.meta.data
of the original object (named hs) looks like - note that I've sorted the column bynFeature_RNA
, and that the lowest count within the column is 100. Total cells = 54,450filtered_qc <- subset(hs, nFeature_RNA >= 500)
filtered_qc <- subset(hs, subset = nFeature_RNA >= 500)
This is what I get - note that sorting the column based on nFeature_RNA, the lowest count is zero. Total cells = 47,269
Original dataframe: hs
Filtered dataframe: filtered_qc
Am I missing something within the manual that states that the values change?
Thank you very much in advance!