prabhakarlab / Banksy

BANKSY: spatial clustering
https://prabhakarlab.github.io/Banksy
Other
67 stars 12 forks source link

Error: No images present in this Seurat object #15

Closed JLiLab closed 7 months ago

JLiLab commented 7 months ago

Thank you for creating and sharing this method.

While implementing it, I encountered an error stating "No images present in the Seurat object." It seems to me that the algorithm does not require an image for its operation. Is there a way to bypass or suppress this error to continue the analysis?

Furthermore, I am interested in your recommendations for processing a Seurat dataset that includes serial sections. It appears the current algorithm does not incorporate Z-axis information, suggesting a need to isolate and analyze each section independently.

Thank you for your assistance and insights.

James

jleechung commented 7 months ago

Hi @JLiLab, thanks for your interest in BANKSY and for raising this issue.

Regarding the error: I've not seen this before - would you be able to provide a minimal reproducible example that throws this error?

Regarding z-axis information: if you are using BANKSY with SingleCellExperiment objects, you can provide the column names corresponding to the x, y and z coordinates to the coord_names argument in the computeBanksy function.

If you are using BANKSY with SeuratObject, could you reinstall our fork of SeuratWrappers with:

remotes::install_github('jleechung/seurat-wrappers@feat-aft')

I've included a simple example illustrating the BANKSY-Seurat workflow on 3D mouse visual cortex data - you can try adapting this to your own data.

library(Banksy)
library(Seurat)
library(SeuratWrappers)
library(Matrix)

# links to files: change as required
loc_path = '~/Downloads/d6cf80ab84578ce02d7bfb4c131f2126/cell_info.csv.gz'
gcm_path = '~/Downloads/d6cf80ab84578ce02d7bfb4c131f2126/cell_exp.mtx.gz'

# read data
loc = read.csv(loc_path)
gcm = t(readMM(gcm_path))
cell_names = paste0('cell_', seq_len(ncol(gcm)))
colnames(gcm) = cell_names
rownames(loc) = cell_names

> head(loc)
       cell_x cell_y cell_z
cell_1     11    609     15
cell_2     12    937      8
cell_3     14   1144      6
cell_4     15    535      7
cell_5     19    954      6
cell_6     21    324      8

> head(gcm[,1:5])
6 x 5 sparse Matrix of class "dgTMatrix"
       cell_1   cell_2   cell_3   cell_4   cell_5
[1,] 2.058762 2.428399 2.416706 2.196665 2.447973
[2,] 1.925406 1.994583 2.204485 1.902174 2.019398
[3,] 2.254643 1.990715 2.467762 1.929102 1.840283
[4,] 2.165417 2.236728 2.124380 2.330242 2.242907
[5,] 2.061738 2.284070 2.139607 2.108556 2.197077
[6,] 2.215741 2.149931 2.079938 2.226619 2.188833

# preprocess the data
seu = CreateSeuratObject(counts = gcm, meta.data = data.frame(loc))
seu = NormalizeData(seu, scale.factor = median(colSums(gcm)))
seu = ScaleData(seu)

# run BANKSY - provide the col. names of the coordinate dimensions to dimx, dimy, dimz
seu = RunBanksy(seu, lambda = 0.2,
                dimx = 'cell_x', dimy = 'cell_y', dimz = 'cell_z',
                assay = 'RNA', slot = 'data', features = 'all', k_geom = 15)

# PCA
seu = RunPCA(seu, assay = 'BANKSY', features = rownames(seu), npcs = 10)

# cluster
seu = FindNeighbors(seu, dims = 1:10)
seu = FindClusters(seu, resolution = 1)

# visualise x and y coordinates, with z coordinate represent by point size
ggplot(seu@meta.data,
       aes(x=cell_x, y=cell_y, col=BANKSY_snn_res.1, size=cell_z)) +
  geom_point() +
  scale_size(range = c(0.1,0.5)) 
```r R version 4.3.2 (2023-10-31) Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS Sonoma 14.2.1 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: Europe/London tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ggplot2_3.4.4 Matrix_1.6-5 SeuratWrappers_0.3.4 Seurat_5.0.1 [5] SeuratObject_5.0.1 sp_2.1-3 Banksy_0.99.7 loaded via a namespace (and not attached): [1] RcppHungarian_0.3 RcppAnnoy_0.0.22 splines_4.3.2 [4] later_1.3.2 bitops_1.0-7 tibble_3.2.1 [7] R.oo_1.26.0 polyclip_1.10-6 fastDummies_1.7.3 [10] lifecycle_1.0.4 aricode_1.0.3 globals_0.16.2 [13] lattice_0.22-5 MASS_7.3-60.0.1 magrittr_2.0.3 [16] plotly_4.10.4 rmarkdown_2.25 yaml_2.3.8 [19] remotes_2.4.2.1 httpuv_1.6.14 sctransform_0.4.1 [22] spam_2.10-0 spatstat.sparse_3.0-3 reticulate_1.35.0 [25] cowplot_1.1.3 pbapply_1.7-2 RColorBrewer_1.1-3 [28] abind_1.4-5 zlibbioc_1.48.0 Rtsne_0.17 [31] GenomicRanges_1.54.1 purrr_1.0.2 R.utils_2.12.3 [34] BiocGenerics_0.48.1 RCurl_1.98-1.14 GenomeInfoDbData_1.2.11 [37] IRanges_2.36.0 S4Vectors_0.40.2 ggrepel_0.9.5 [40] irlba_2.3.5.1 listenv_0.9.1 spatstat.utils_3.0-4 [43] goftest_1.2-3 RSpectra_0.16-1 spatstat.random_3.2-2 [46] fitdistrplus_1.1-11 parallelly_1.37.0 leiden_0.4.3.1 [49] codetools_0.2-19 DelayedArray_0.28.0 tidyselect_1.2.0 [52] farver_2.1.1 matrixStats_1.2.0 stats4_4.3.2 [55] spatstat.explore_3.2-6 jsonlite_1.8.8 ellipsis_0.3.2 [58] progressr_0.14.0 ggridges_0.5.6 survival_3.5-7 [61] dbscan_1.1-12 tools_4.3.2 ica_1.0-3 [64] Rcpp_1.0.12 glue_1.7.0 gridExtra_2.3 [67] SparseArray_1.2.4 xfun_0.42 MatrixGenerics_1.14.0 [70] GenomeInfoDb_1.38.6 dplyr_1.1.4 withr_3.0.0 [73] BiocManager_1.30.22 fastmap_1.1.1 fansi_1.0.6 [76] digest_0.6.34 rsvd_1.0.5 R6_2.5.1 [79] mime_0.12 colorspace_2.1-0 scattermore_1.2 [82] sccore_1.0.4 tensor_1.5 spatstat.data_3.0-4 [85] R.methodsS3_1.8.2 utf8_1.2.4 tidyr_1.3.1 [88] generics_0.1.3 data.table_1.15.0 httr_1.4.7 [91] htmlwidgets_1.6.4 S4Arrays_1.2.0 uwot_0.1.16 [94] pkgconfig_2.0.3 gtable_0.3.4 lmtest_0.9-40 [97] SingleCellExperiment_1.24.0 XVector_0.42.0 htmltools_0.5.7 [100] dotCall64_1.1-1 scales_1.3.0 Biobase_2.62.0 [103] png_0.1-8 SpatialExperiment_1.12.0 knitr_1.45 [106] rstudioapi_0.15.0 reshape2_1.4.4 rjson_0.2.21 [109] nlme_3.1-164 zoo_1.8-12 stringr_1.5.1 [112] KernSmooth_2.23-22 parallel_4.3.2 miniUI_0.1.1.1 [115] pillar_1.9.0 grid_4.3.2 vctrs_0.6.5 [118] RANN_2.6.1 promises_1.2.1 xtable_1.8-4 [121] cluster_2.1.6 evaluate_0.23 magick_2.8.2 [124] cli_3.6.2 compiler_4.3.2 rlang_1.1.3 [127] crayon_1.5.2 future.apply_1.11.1 labeling_0.4.3 [130] mclust_6.0.1 plyr_1.8.9 stringi_1.8.3 [133] viridisLite_0.4.2 deldir_2.0-2 munsell_0.5.0 [136] lazyeval_0.2.2 spatstat.geom_3.2-8 RcppHNSW_0.6.0 [139] patchwork_1.2.0 future_1.33.1 shiny_1.8.0 [142] SummarizedExperiment_1.32.0 ROCR_1.0-11 leidenAlg_1.1.2 [145] igraph_2.0.1.1 ```
JLiLab commented 7 months ago

Thank you for your prompt response. The instructions you provided worked well. I have a few additional questions for further clarification:

  1. Please correct me if my understanding is incorrect. I believe the algorithm processes each section independently, without considering cell clusters in adjacent sections. Consequently, the accuracy of slice registration along the x and y axes should not influence the outcome. Is this correct?
  2. Would it be feasible to incorporate spatial information from cell clusters in neighboring slices into the analysis?
  3. Could you advise on the best approach to integrate data from two different samples?

I appreciate your insights and look forward to your advice.

James

jleechung commented 7 months ago

Hi James, when analysing multiple slices / sections, BANKSY indeed computes the neighbourhood-augmented features for each slice independently. Importantly, you'll have to make sure that the spatial coordinates of the different sections do not overlap.

Once that's done, the BANKSY matrices (own expression + neighbourhood features) from different slices are then concatenated downstream for joint dimensionality reduction and clustering. We've written a vignette here detailing how that can be done.

When there are batch effects present between samples, BANKSY can be used with integration methods such as Harmony for spatially-informed batch effect correction. Refer to this vignette here for a possible workflow.

Another consideration for multi-sample or integrative analysis is the feature set used. You may want to experiment with how you select features - for instance, computing highly variable features for each sample individually, and then taking the union / intersection of those genes.

I've included a Seurat-compatible workflow below, demonstrating BANKSY multi-sample analysis on human dorsolateral prefrontal cortex with batch effects between subjects, in case that's more applicable for you.

Multi-sample analysis / Integration with Harmony ```r library(spatialLIBD) library(ExperimentHub) library(Banksy) library(Seurat) library(SeuratWrappers) library(harmony) # download data ehub = ExperimentHub::ExperimentHub() spe = spatialLIBD::fetch_data(type = "spe", eh = ehub) # subset to three samples from three different subjects spe = spe[, spe$sample_id %in% c('151507', '151669', '151673')] gc() # extract feature matrix, coordinates, metadata gcm = assays(spe)$counts colnames(gcm) = make.names(colnames(gcm), unique = TRUE) loc = spatialCoords(spe) mdata = data.frame(colData(spe)[, c('sample_id', 'subject', 'spatialLIBD')]) rm(spe); rm(ehub) # important: stagger the spatial coords. and ensure they do not overlap maxx = max(loc[,1]) * 1.1 n_samples = length(unique(mdata$sample_id)) shift = seq(from = 0, length.out = n_samples, by = maxx) loc[,1] = loc[,1] + rep(shift, table(mdata$sample_id)) colnames(loc) = c('x_coord', 'y_coord') # plot the staggered coords. plot(loc, cex=0.2, col = factor(mdata$spatialLIBD), main = 'cortical layer') plot(loc, cex=0.2, col = factor(mdata$subject), main = 'subject') # initialise seu = CreateSeuratObject(counts = gcm, meta.data = cbind(mdata, loc)) seu = NormalizeData(seu, scale.factor = median(colSums(gcm))) seu = FindVariableFeatures(seu) # run BANKSY seu = RunBanksy(seu, lambda = 0.2, verbose=TRUE, dimx = 'x_coord', dimy = 'y_coord', assay = 'RNA', slot = 'data', features = 'variable', k_geom = 18) # PCA seu = RunPCA(seu, assay = 'BANKSY', features = rownames(seu), npcs = 30, reduction.name = 'banksy_pca') # integration with harmony across subjects # skip this if negligible batch effects seu = RunHarmony(seu, group.by.vars = 'subject', reduction = 'banksy_pca', reduction.save = 'banksy_harmony') # compute UMAPs and visualise seu = RunUMAP(seu, dims = 1:30, reduction = 'banksy_pca', reduction.name = 'umap_pca') seu = RunUMAP(seu, dims = 1:30, reduction = 'banksy_harmony', reduction.name = 'umap_harmony') # skip this if harmony not run # visualise UMAPs DimPlot(seu, group.by='subject', reduction = 'umap_pca') # separated by subject DimPlot(seu, group.by='subject', reduction = 'umap_harmony') # cluster the spatially-informed integrated BANKSY-Harmony embedding seu = FindNeighbors(seu, dims = 1:30, reduction = 'banksy_harmony') seu = FindClusters(seu, resolution = 0.2) # visualise the clusters across slices DimPlot(seu, group.by=c('subject', 'seurat_clusters'), reduction = 'umap_harmony') ```
Session info ```r R version 4.3.2 (2023-10-31) Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS Sonoma 14.2.1 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: Europe/London tzcode source: internal attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] harmony_1.2.0 Rcpp_1.0.12 [3] SeuratWrappers_0.3.4 Seurat_5.0.1 [5] SeuratObject_5.0.1 sp_2.1-3 [7] Banksy_0.99.7 ExperimentHub_2.10.0 [9] AnnotationHub_3.10.0 BiocFileCache_2.10.1 [11] dbplyr_2.4.0 spatialLIBD_1.14.1 [13] SpatialExperiment_1.12.0 SingleCellExperiment_1.24.0 [15] SummarizedExperiment_1.32.0 Biobase_2.62.0 [17] GenomicRanges_1.54.1 GenomeInfoDb_1.38.6 [19] IRanges_2.36.0 S4Vectors_0.40.2 [21] BiocGenerics_0.48.1 MatrixGenerics_1.14.0 [23] matrixStats_1.2.0 loaded via a namespace (and not attached): [1] spatstat.sparse_3.0-3 bitops_1.0-7 [3] httr_1.4.7 RColorBrewer_1.1-3 [5] doParallel_1.0.17 tools_4.3.2 [7] sctransform_0.4.1 utf8_1.2.4 [9] R6_2.5.1 DT_0.31 [11] lazyeval_0.2.2 uwot_0.1.16 [13] withr_3.0.0 gridExtra_2.3 [15] progressr_0.14.0 cli_3.6.2 [17] spatstat.explore_3.2-6 fastDummies_1.7.3 [19] labeling_0.4.3 sass_0.4.8 [21] spatstat.data_3.0-4 ggridges_0.5.6 [23] pbapply_1.7-2 Rsamtools_2.18.0 [25] dbscan_1.1-12 R.utils_2.12.3 [27] aricode_1.0.3 scater_1.30.1 [29] parallelly_1.37.0 sessioninfo_1.2.2 [31] attempt_0.3.1 maps_3.4.2 [33] limma_3.58.1 rstudioapi_0.15.0 [35] RSQLite_2.3.5 generics_0.1.3 [37] BiocIO_1.12.0 spatstat.random_3.2-2 [39] ica_1.0-3 dplyr_1.1.4 [41] Matrix_1.6-5 ggbeeswarm_0.7.2 [43] fansi_1.0.6 abind_1.4-5 [45] R.methodsS3_1.8.2 lifecycle_1.0.4 [47] yaml_2.3.8 edgeR_4.0.15 [49] SparseArray_1.2.4 Rtsne_0.17 [51] paletteer_1.6.0 grid_4.3.2 [53] blob_1.2.4 promises_1.2.1 [55] crayon_1.5.2 miniUI_0.1.1.1 [57] lattice_0.22-5 beachmat_2.18.0 [59] cowplot_1.1.3 KEGGREST_1.42.0 [61] magick_2.8.2 pillar_1.9.0 [63] rjson_0.2.21 future.apply_1.11.1 [65] codetools_0.2-19 leiden_0.4.3.1 [67] glue_1.7.0 data.table_1.15.0 [69] remotes_2.4.2.1 vctrs_0.6.5 [71] png_0.1-8 spam_2.10-0 [73] gtable_0.3.4 rematch2_2.1.2 [75] cachem_1.0.8 S4Arrays_1.2.0 [77] mime_0.12 survival_3.5-7 [79] RcppHungarian_0.3 iterators_1.0.14 [81] fields_15.2 statmod_1.5.0 [83] interactiveDisplayBase_1.40.0 ellipsis_0.3.2 [85] fitdistrplus_1.1-11 ROCR_1.0-11 [87] nlme_3.1-164 bit64_4.0.5 [89] filelock_1.0.3 RcppAnnoy_0.0.22 [91] bslib_0.6.1 irlba_2.3.5.1 [93] vipor_0.4.7 KernSmooth_2.23-22 [95] colorspace_2.1-0 DBI_1.2.1 [97] tidyselect_1.2.0 bit_4.0.5 [99] compiler_4.3.2 curl_5.2.0 [101] BiocNeighbors_1.20.2 DelayedArray_0.28.0 [103] plotly_4.10.4 rtracklayer_1.62.0 [105] scales_1.3.0 lmtest_0.9-40 [107] rappdirs_0.3.3 goftest_1.2-3 [109] stringr_1.5.1 digest_0.6.34 [111] spatstat.utils_3.0-4 benchmarkmeData_1.0.4 [113] RhpcBLASctl_0.23-42 XVector_0.42.0 [115] htmltools_0.5.7 pkgconfig_2.0.3 [117] sparseMatrixStats_1.14.0 fastmap_1.1.1 [119] rlang_1.1.3 htmlwidgets_1.6.4 [121] shiny_1.8.0 DelayedMatrixStats_1.24.0 [123] farver_2.1.1 jquerylib_0.1.4 [125] zoo_1.8-12 jsonlite_1.8.8 [127] BiocParallel_1.36.0 mclust_6.0.1 [129] R.oo_1.26.0 config_0.3.2 [131] BiocSingular_1.18.0 RCurl_1.98-1.14 [133] magrittr_2.0.3 scuttle_1.12.0 [135] GenomeInfoDbData_1.2.11 dotCall64_1.1-1 [137] patchwork_1.2.0 munsell_0.5.0 [139] viridis_0.6.5 reticulate_1.35.0 [141] leidenAlg_1.1.2 stringi_1.8.3 [143] zlibbioc_1.48.0 MASS_7.3-60.0.1 [145] plyr_1.8.9 parallel_4.3.2 [147] listenv_0.9.1 ggrepel_0.9.5 [149] deldir_2.0-2 Biostrings_2.70.2 [151] sccore_1.0.4 splines_4.3.2 [153] tensor_1.5 locfit_1.5-9.8 [155] igraph_2.0.1.1 spatstat.geom_3.2-8 [157] RcppHNSW_0.6.0 reshape2_1.4.4 [159] ScaledMatrix_1.10.0 BiocVersion_3.18.1 [161] XML_3.99-0.16.1 golem_0.4.1 [163] BiocManager_1.30.22 foreach_1.5.2 [165] httpuv_1.6.14 polyclip_1.10-6 [167] RANN_2.6.1 tidyr_1.3.1 [169] purrr_1.0.2 future_1.33.1 [171] benchmarkme_1.0.8 scattermore_1.2 [173] ggplot2_3.4.4 rsvd_1.0.5 [175] xtable_1.8-4 restfulr_0.0.15 [177] RSpectra_0.16-1 later_1.3.2 [179] viridisLite_0.4.2 tibble_3.2.1 [181] memoise_2.0.1 beeswarm_0.4.0 [183] AnnotationDbi_1.64.1 GenomicAlignments_1.38.2 [185] cluster_2.1.6 shinyWidgets_0.8.1 [187] globals_0.16.2 ```
JLiLab commented 7 months ago

We have aligned the serial sections so that identical cell types/clusters across these sections share similar x and y coordinates. However, the spatial coordinates for these sections vary in the Z dimension. Are you saying that we should adjust the x and y coordinates to distinguish between different sections?

One potential problem we have encountered using BANKSY is that it identifies the same cell type differently on the left and right sides of the brain with a lamda at 0.2. Do you have any suggestion?

jleechung commented 7 months ago

If you're able to accurately register the x and y coordinates of the sections, then clustering with all three dimensions should probably be fine. However, if you expect some technical variability between the sections, then staggering the x and y coordinates and analysing the sections as different samples might work better.

That is unusual - we've had success clustering mouse hypothalamus with left-right symmetry, and across three dimensions as well (see Fig. 3a of the paper for instance). Does conventional clustering assign the same cell types on the left and right sides of the brain to the same cluster? Is there any technical variability along the left-right axis, such as a gradient in the number of detected genes or total transcript count? Are there any noticeable differences in the neighbourhood of those cells? You could also check if those cells co-locate in lower-dimensional PC or UMAP space.

vipulsinghal02 commented 7 months ago

Hi James, just to add to Joseph's comments:

When the z coordinate is available, the k_geom nearest neighbours are computed using all 3 coordinates. The mean expression (middle submatrix in the 3-block neighbour augmented matrix) is populated with means computed using neighbours found using all three coordinates, so if you have 3D data, you can set use_agf = FALSE to set up a neighbour augmented matrix with only own and neighbourhood mean expressions, and it will fully utilize the 3D data without any conceptual distortions. We find that this still gives very good results on both cell typing and domain segmentation tasks.

What happens with the AGF in vertically aligned 3D data is as you guys already hinted at above: the nearest neighbours are computed in 3D, and then only the (x,y) coords of the nearest neighbours are used in computing the AGF values. This is equivalent to projecting the cells from nearby z-planes to the index cell's z plane, and then performing a 2D AGF. This is not ideal (though not terrible either), and we are looking into how to generalize the AGF to 3 dimensions using spherical harmonics, so stay tuned for that potential extension of the method.

As Joseph has mentioned above, one alternative way to use the AGF with 3D data is so treat each slice as a separate dataset, and then concatenate the matrices, as you would do with multi-dataset workflows. Yet another way is to place offsets in the x-y coords in each slice, so that none of the slices overlap in x-y coordinates, so that the nearest neighbours are always forced to come from the same slice. As you guys noted, both of these options are less than ideal, because some of the K nearest neighbours can certainly come from adjacent slices.

I would say, if you have an aligned stack of z planes, try using use_agf = FALSE, and see how it does.

JLiLab commented 7 months ago

Thank you for the information. I will give them a try. I can see how use_agf = FASLE is used in computeBanksy - runBanksyPCA - clusterBanksy using SummarizedExperiment. However, it is unclear how use_agf = F is incorporated into the Seurat workflow.

jleechung commented 7 months ago

Hi James, by default, the Seurat workflow runs BANKSY without the AGF. This was controlled by the M argument denoting the highest harmonic to use (setting M=0 corresponds to use_agf=FALSE, while setting M=1 corresponds to use_agf=TRUE). For clarity, I've now introduced the use_agf argument to the Seurat workflow too. Thanks for your feedback.

JLiLab commented 7 months ago

Excellent! I will give it a try.

I noticed that using use_agf=FALSE works better for symmetric patterns (n =1 though), in which algorithms that incorporate spatial information to cluster cells often fail.

Using the Seurat pipeline, can we use FindSubCluster() function?

jleechung commented 7 months ago

Yes, just provide the graph name to the function. Check the names of the graphs with :

Seurat::Graphs(seu)

If you're following the vignette this should show BANKSY_nn and BANKSY_snn. Run sub-clustering on the cluster of your choice (below, on cluster 1) with the shared nearest neighbour graph:

FindSubCluster(seu, cluster = 1, graph.name = 'BANKSY_snn')