satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

MapQuery: data.frame(predicted.id = prediction.ids, prediction.score = as.matrix(prediction.scores) #8660

Closed vivekruhela closed 3 months ago

vivekruhela commented 6 months ago

I am trying to use MapQuery for reference-based mapping using the following code:

seurat.reference <- NormalizeData(seurat.reference)
seurat.reference <- FindVariableFeatures(seurat.reference)
seurat.reference <- ScaleData(seurat.reference)
seurat.reference <- RunPCA(seurat.reference)
seurat.reference <- FindNeighbors(seurat.reference, dims = 1:30)
seurat.reference <- FindClusters(seurat.reference)
seurat.reference <- RunUMAP(seurat.reference, dims = 1:30, return.model = TRUE)

# change default assay from integrated to RNA
DefaultAssay(seurat.query) <- "RNA"

# Finding transfer anchors
immune.anchors <- FindTransferAnchors(reference = seurat.reference, query = seurat.query, dims = 1:30,
                                  reference.reduction = "pca",k.anchor = 60)

seurat.query <- MapQuery(anchorset = immune.anchors, reference = seurat.reference, query = seurat.query,
                            refdata = list(celltype = "cell.id"), reference.reduction = "pca", reduction.model = "umap")

At MapQuery, I am getting the following error:

Finding integration vectors
Finding integration vector weights
Predicting cell labels
Error in data.frame(predicted.id = prediction.ids, prediction.score = as.matrix(prediction.scores), :
  arguments imply differing number of rows: 0, 99403
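For reference, a quick sanity check on the inputs would look like the sketch below. It reuses the object names above and assumes cell.id is a metadata column of the reference and that immune.anchors has already been computed:

# Does the refdata column exist in the reference metadata?
"cell.id" %in% colnames(seurat.reference[[]])

# Are there empty or NA labels in the column being transferred?
table(seurat.reference$cell.id, useNA = "ifany")

# How many anchors were actually found (very small counts are a red flag)?
nrow(immune.anchors@anchors)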

Please suggest what might be going wrong. The session info is shown below:

sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.8 (Ootpa)

Matrix products: default
BLAS:   /usr/lib64/libblas.so.3.8.0
LAPACK: /usr/lib64/liblapack.so.3.8.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C

time zone: America/Chicago
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] cluster_2.1.4 HGNChelper_0.8.1 openxlsx_4.2.5.2 dplyr_1.1.4 DropletUtils_1.22.0
[6] patchwork_1.2.0 cowplot_1.1.3 gtools_3.9.5 celldex_1.12.0 robustbase_0.99-2
[11] dynamicTreeCut_1.63-1 Seurat_5.0.1 SeuratObject_5.0.1 sp_2.1-3 readxl_1.4.3
[16] limma_3.58.1 scran_1.30.2 scater_1.30.1 ggplot2_3.5.0 scuttle_1.12.0
[21] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0 Biobase_2.62.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.6
[26] IRanges_2.36.0 S4Vectors_0.40.2 MatrixGenerics_1.14.0 matrixStats_1.2.0 AnnotationHub_3.10.0
[31] BiocFileCache_2.10.1 dbplyr_2.4.0 BiocGenerics_0.48.1

loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.3.2 later_1.3.2 bitops_1.0-7 filelock_1.0.3
[6] R.oo_1.26.0 tibble_3.2.1 cellranger_1.1.0 polyclip_1.10-6 fastDummies_1.7.3
[11] lifecycle_1.0.4 edgeR_4.0.16 globals_0.16.2 lattice_0.21-9 MASS_7.3-60
[16] magrittr_2.0.3 plotly_4.10.4 yaml_2.3.8 metapod_1.10.1 httpuv_1.6.14
[21] glmGamPoi_1.14.2 sctransform_0.4.1 zip_2.3.1 spam_2.10-0 spatstat.sparse_3.0-3
[26] reticulate_1.35.0 pbapply_1.7-2 DBI_1.2.1 RColorBrewer_1.1-3 abind_1.4-5
[31] zlibbioc_1.48.0 Rtsne_0.17 R.utils_2.12.3 purrr_1.0.2 RCurl_1.98-1.14
[36] rappdirs_0.3.3 GenomeInfoDbData_1.2.11 ggrepel_0.9.5 irlba_2.3.5.1 spatstat.utils_3.0-4
[41] listenv_0.9.1 goftest_1.2-3 RSpectra_0.16-1 spatstat.random_3.2-2 dqrng_0.3.2
[46] fitdistrplus_1.1-11 parallelly_1.37.0 DelayedMatrixStats_1.24.0 leiden_0.4.3.1 codetools_0.2-19
[51] DelayedArray_0.28.0 tidyselect_1.2.0 farver_2.1.1 ScaledMatrix_1.10.0 viridis_0.6.5
[56] spatstat.explore_3.2-6 jsonlite_1.8.8 BiocNeighbors_1.20.2 ellipsis_0.3.2 progressr_0.14.0
[61] ggridges_0.5.6 survival_3.5-8 tools_4.3.2 ica_1.0-3 Rcpp_1.0.12
[66] glue_1.7.0 gridExtra_2.3 SparseArray_1.2.4 HDF5Array_1.30.0 withr_3.0.0
[71] BiocManager_1.30.22 fastmap_1.1.1 rhdf5filters_1.14.1 bluster_1.12.0 fansi_1.0.6
[76] digest_0.6.35 rsvd_1.0.5 R6_2.5.1 mime_0.12 colorspace_2.1-0
[81] scattermore_1.2 tensor_1.5 spatstat.data_3.0-4 RSQLite_2.3.5 R.methodsS3_1.8.2
[86] tidyr_1.3.1 utf8_1.2.4 generics_0.1.3 data.table_1.15.0 httr_1.4.7
[91] htmlwidgets_1.6.4 S4Arrays_1.2.1 uwot_0.1.16 pkgconfig_2.0.3 gtable_0.3.4
[96] blob_1.2.4 lmtest_0.9-40 XVector_0.42.0 htmltools_0.5.7 dotCall64_1.1-1
[101] scales_1.3.0 png_0.1-8 rstudioapi_0.15.0 reshape2_1.4.4 nlme_3.1-164
[106] curl_5.2.0 rhdf5_2.46.1 cachem_1.0.8 zoo_1.8-12 stringr_1.5.1
[111] BiocVersion_3.18.1 KernSmooth_2.23-22 parallel_4.3.2 miniUI_0.1.1.1 vipor_0.4.7
[116] AnnotationDbi_1.64.1 pillar_1.9.0 grid_4.3.2 vctrs_0.6.5 RANN_2.6.1
[121] promises_1.2.1 BiocSingular_1.18.0 beachmat_2.18.1 xtable_1.8-4 beeswarm_0.4.0
[126] cli_3.6.2 locfit_1.5-9.9 compiler_4.3.2 rlang_1.1.3 crayon_1.5.2
[131] future.apply_1.11.1 labeling_0.4.3 plyr_1.8.9 ggbeeswarm_0.7.2 stringi_1.8.3
[136] deldir_2.0-2 viridisLite_0.4.2 BiocParallel_1.36.0 munsell_0.5.0 Biostrings_2.70.2
[141] lazyeval_0.2.2 spatstat.geom_3.2-8 Matrix_1.6-5 ExperimentHub_2.10.0 RcppHNSW_0.6.0
[146] sparseMatrixStats_1.14.0 bit64_4.0.5 future_1.33.1 Rhdf5lib_1.24.2 KEGGREST_1.42.0
[151] statmod_1.5.0 shiny_1.8.0 interactiveDisplayBase_1.40.0 ROCR_1.0-11 igraph_2.0.3
[156] memoise_2.0.1 DEoptimR_1.1-3 bit_4.0.5

igrabski commented 5 months ago

Hi Vivek, can you show the details leading to the creation of your objects seurat.reference and seurat.query?

rbutleriii commented 5 months ago

Hi @igrabski, I am getting the same error on a different dataset. I am still working on a repro, but here is my code in case it provides a clue:

# loading reference and query sets ---------------------
# prep CosMx dataset 
print(sprintf("Opening and prepping %s...", so))
sc <- readRDS(so)
# make sure query has enough pcs
if (length(sc@reductions$pca) < pcs) {sc = RunPCA(sc, npcs=pcs)}

# setup the reference for CosMx features
print(sprintf("Opening and prepping %s...", ro))
annot <- readRDS(ro)
DefaultAssay(annot) <- "RNA"
annot %>%
  NormalizeData() %>%
  ScaleData(features = rownames(sc), vars.to.regress = c("batch", "orig.ident")) %>%
  RunPCA(features = rownames(sc), npcs = pcs) %>%
  RunUMAP(dims = 1:pcs) -> annot

# Transfer annotations from reference ------------------
print("Finding anchors and transfering...")
cols <- unlist(strsplit(annot_col, ",")) # in this case the string is "subclass,cluster"
names(cols) <- cols
cols <- lapply(cols, function(i) assign(i, annot@meta.data[[i]]))
sc.anchors <- FindTransferAnchors(
  reference = annot, 
  reference.assay = "RNA",
  query = sc, 
  query.assay = "Nanostring",
  dims = 1:pcs, 
  reference.reduction = "pca"
)

# correct for too few anchors
kwt = 50
if (nrow(sc.anchors@anchors) < 50) {
  print("the k.weight is set to 5 owing to fewer than 50 anchors")
  kwt = 5
}
sc <- MapQuery(
  anchorset = sc.anchors, 
  reference = annot, 
  query = sc, 
  refdata = cols, 
  reference.reduction = "pca", 
  reduction.model = "umap",
  transferdata.args = list(k.weight = kwt)
)

The log is at the bottom since it is long. This seems like a v5 vs. v4 thing for me, as it used to run, but maybe it is dataset specific. Initially I was passing my transfer refdata as a named list of column names, similar to @vivekruhela:

cols <- unlist(strsplit(annot_col, ","))
names(cols) <- cols
sc <- MapQuery(
  anchorset = sc.anchors, 
  reference = annot, 
  query = sc, 
  refdata = list(cols), 
  reference.reduction = "pca", 
  reduction.model = "umap",
  transferdata.args = list(k.weight = kwt)
)

That also worked in v4, but here it gave me an error, so I am now providing them as a named list of column vectors instead, which resolved the warning I was getting (see the sketch of the two refdata forms after the warning below):

Warning: Please provide a vector that is the same length as the number of reference cells used in anchor finding.
Length of vector provided: 2
Length of vector required: 97930
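For clarity, the two refdata forms side by side, as a minimal sketch: subclass and cluster are assumed to be the metadata columns named in annot_col, and both forms should be accepted as long as each list element is named.

# refdata as a named list of reference metadata column names
refdata_by_name <- list(subclass = "subclass", cluster = "cluster")

# refdata as a named list of per-cell label vectors taken from the reference
refdata_by_vector <- list(
  subclass = annot@meta.data[["subclass"]],
  cluster  = annot@meta.data[["cluster"]]
)

# either can then be passed to MapQuery, e.g. refdata = refdata_by_name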

Also, I obviously lowered the k.weight, but that is for another issue.

Logs

[1] "Opening and prepping /labs/flongo/Tau-PS19_C31_cortex_snRNAseq/CosMx/6 Analysis/Seurat object/seurat_object.Rds..."
[1] "Opening and prepping /labs/flongo/Tau-PS19_C31_cortex_snRNAseq/seurat_v3/R7.PS19_C31.overall.rds..."
Regressing out batch, orig.ident
Centering and scaling data matrix
PC_ 1 
Positive:  Apoe, Plp1, Cst3, Ptgds, Zbtb20, Mbp, Daam2, Mobp, Fth1, Phlpp1 
       Aspa, Slc1a3, Mog, Mag, Atp1a2, Glul, Ttr, Mt1, Hspa8, Tmsb4x 
       Gab1, Myo6, Apod, Mertk, Dock1, Plpp3, Cox4i1, Mt3, Rab31, Pde8a 
Negative:  Meg3, Dlgap2, Syt1, Lrrc7, Grin2a, Kalrn, Grin2b, Ube3a, Arpp21, Rims1 
       Ryr2, Gabrb3, Grm5, Tenm4, Kcnq5, Cacna1e, Nrg1, Cacna1a, Kcnd2, Nav3 
       Rbfox3, Rims2, Cnksr2, Camk2a, Hecw1, Grin1, Gria3, Gpr158, Tenm2, Cacnb4 
PC_ 2 
Positive:  Mertk, Ptprz1, Slc1a2, Slc1a3, Atp1a2, Ntm, Gabrb1, Plpp3, Rora, Maml2 
       Bcan, Sox6, Ntsr2, Ctnnd2, Slc4a4, Cpe, Tspan7, Egfr, Sparcl1, Tnik 
       Ndrg2, Zeb1, Pbx1, Cspg5, Dgkb, Gja1, Tcf7l2, Vegfa, Grk3, Slc6a11 
Negative:  Mbp, Plp1, Mobp, Mog, Aspa, Mag, Elmo1, Dnm3, Ugt8a, Slc8a1 
       Fa2h, Bin1, Vmp1, Prickle2, Pde8a, Mapt, Slc12a2, Bcas1, Dscaml1, Shtn1 
       Il1rap, Pak1, Rtn4, Slc44a1, Nfasc, Unc5c, Grm7, Cntn2, Ssh2, Sgk1 
PC_ 3 
Positive:  Gnal, Gng7, Ryr3, Plppr1, Cacnb2, Gad2, Dgkb, Drd2, Htr2c, Reln 
       Pard3, Slc4a4, Dcc, Alcam, Robo1, Grik2, Cacna2d3, Gad1, Adora2a, Kirrel3 
       Cntn5, Pde10a, Gabrg3, Ppp1r1b, Penk, Grid2, Epha6, Grm1, Grm5, Ror1 
Negative:  Tcf4, Pde4d, Slc8a1, Dscam, Adcy2, Rgs6, Sox5, Gabbr2, Nav3, Pde1a 
       Chrm3, Slc17a7, Slc1a2, Sv2b, Astn2, Kcnj3, Prickle1, Oprm1, Mef2c, Pcdh15 
       Efna5, Sh3gl2, Plcb4, Adcy8, Hecw1, Prickle2, Dmd, Ptk2, Nrcam, Camk2g 
PC_ 4 
Positive:  Flt1, Cfh, Tgfbr1, Igfbp7, Csf1r, Hexb, Selplg, Pltp, St3gal6, Cldn5 
       Ptprc, Bsg, Rgs5, Cx3cr1, Fn1, Csf3r, Prkg1, Id1, Emcn, Ctss 
       Slc2a1, Tmsb4x, Vtn, Pde3b, Rhobtb1, Pecam1, Slc7a5, Mef2c, Serinc3, Pdgfd 
Negative:  Ncam1, Phlpp1, Mdga2, Grm3, Gnao1, Grid2, Daam2, Gpm6b, Prickle2, Mapt 
       Mbp, Mobp, Myo6, Clasp2, Kcnq3, Ctnna2, Mog, Aspa, Ugt8a, Unc5c 
       Rab31, Dnm3, Dscaml1, Mag, Bcas1, Fgfr2, Nfasc, Prkn, Gphn, Plp1 
PC_ 5 
Positive:  Malat1, Cacna2d3, Nrg1, Gpm6b, Grm3, Slc1a2, Pde10a, Camk4, Homer1, Epha4 
       Itpr1, Lrrk2, Rora, Efna5, Kcnq3, Nrgn, Arpp21, Cnksr2, Ntm, Spock1 
       Vipr1, Sv2b, Grm7, Rims1, Ppp3ca, Adgrl2, Prkcb, Hecw1, Atp2b2, Camk2a 
Negative:  Ptpro, Grip1, Lgr5, Synpr, Gad1, Gab2, Dner, Vipr2, Camk2d, Tiam1 
       Tgfbr1, Mef2a, Dgkg, Epha6, Grik1, Dscaml1, Apc, Csf1r, Hdac9, Tacr1 
       Prkca, Hexb, Pde3b, Selplg, Robo1, Ssh2, Cfh, Gad2, Sall1, Tcf4 
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
10:44:33 UMAP embedding parameters a = 0.9922 b = 1.112
10:44:33 Read 97930 rows and found 30 numeric columns
10:44:33 Using Annoy for neighbor search, n_neighbors = 30
10:44:33 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
10:45:18 Writing NN index file to temp file /local/scratch/rrbutler/slrmtmp.42912682/RtmpMN3W0T/file17c077b3c338
10:45:18 Searching Annoy index using 32 threads, search_k = 3000
10:45:40 Annoy recall = 100%
10:45:41 Commencing smooth kNN distance calibration using 32 threads with target n_neighbors = 30
10:45:46 Initializing from normalized Laplacian + noise (using RSpectra)
10:55:34 Commencing optimization for 200 epochs, with 4714268 positive edges
Using method 'umap'
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
10:59:36 Optimization finished
Warning message:
In PrepDR(object = object, features = features, verbose = verbose) :
  The following 22 features requested have not been scaled (running reduction without them): Aph1b/c, Bex1/2, C1s1/2, Clec7a, Ifng, Lyz1/2, Avpr1b, Nlgn4l, Lilrb4a/b, Iapp, Tuba1a/b/c, Cxcl2, NegPrb1, NegPrb2, NegPrb3, NegPrb4, NegPrb5, NegPrb6, NegPrb7, NegPrb8, NegPrb9, NegPrb10
[1] "Finding anchors and transfering..."
Warning: No layers found matching search pattern provided
Projecting cell embeddings
Finding neighborhoods
Finding anchors
    Found 25 anchors
[1] "the k.weight is set to 5 owing to fewer than 50 anchors"
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Predicting cell labels
Error in data.frame(predicted.id = prediction.ids, prediction.score = as.matrix(prediction.scores),  : 
  arguments imply differing number of rows: 0, 140286
Calls: MapQuery -> exec -> TransferData -> data.frame
Execution halted
rsatija commented 3 months ago

The underlying issue here is that you are finding a very small number of anchors. It's hard to know exactly why this is (low overlap, feature set, etc.), but the datasets may be challenging to integrate. One option is to perform FindTransferAnchors with a CCA reduction, which you can do by setting reduction = "cca". This is often how we integrate spatial and single-cell data, and it may be helpful in your case.
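To make that concrete, a minimal sketch of the CCA route using the object names from the first post; it assumes a current Seurat v5 install, where CCA is selected through the reduction argument of FindTransferAnchors and the anchor weights are then computed in CCA space via weight.reduction in TransferData:

# Find anchors with CCA instead of the default PCA projection
immune.anchors <- FindTransferAnchors(
  reference = seurat.reference,
  query     = seurat.query,
  reduction = "cca",
  dims      = 1:30
)

# Check how many anchors were found; very small counts can produce
# the data.frame error reported above
nrow(immune.anchors@anchors)

# Transfer the labels using the CCA anchors
predictions <- TransferData(
  anchorset        = immune.anchors,
  refdata          = seurat.reference$cell.id,
  weight.reduction = "cca",
  dims             = 1:30
)
seurat.query <- AddMetaData(seurat.query, metadata = predictions)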