Closed JLiLab closed 8 months ago
Hi @JLiLab, thanks for your interest in BANKSY and for raising this issue.
Regarding the error: I've not seen this before - would you be able to provide a minimal reproducible example that throws this error?
Regarding z-axis information: if you are using BANKSY with SingleCellExperiment objects, you can provide the column names corresponding to the x, y and z coordinates to the coord_names argument in the computeBanksy function.
If you are using BANKSY with SeuratObject, could you reinstall our fork of SeuratWrappers with:
remotes::install_github('jleechung/seurat-wrappers@feat-aft')
I've included a simple example illustrating the BANKSY-Seurat workflow on 3D mouse visual cortex data - you can try adapting this to your own data.
library(Banksy)
library(Seurat)
library(SeuratWrappers)
library(Matrix)
library(ggplot2) # needed for the ggplot() call at the end
# links to files: change as required
loc_path = '~/Downloads/d6cf80ab84578ce02d7bfb4c131f2126/cell_info.csv.gz'
gcm_path = '~/Downloads/d6cf80ab84578ce02d7bfb4c131f2126/cell_exp.mtx.gz'
# read data
loc = read.csv(loc_path)
gcm = t(readMM(gcm_path))
cell_names = paste0('cell_', seq_len(ncol(gcm)))
colnames(gcm) = cell_names
rownames(loc) = cell_names
> head(loc)
cell_x cell_y cell_z
cell_1 11 609 15
cell_2 12 937 8
cell_3 14 1144 6
cell_4 15 535 7
cell_5 19 954 6
cell_6 21 324 8
> head(gcm[,1:5])
6 x 5 sparse Matrix of class "dgTMatrix"
cell_1 cell_2 cell_3 cell_4 cell_5
[1,] 2.058762 2.428399 2.416706 2.196665 2.447973
[2,] 1.925406 1.994583 2.204485 1.902174 2.019398
[3,] 2.254643 1.990715 2.467762 1.929102 1.840283
[4,] 2.165417 2.236728 2.124380 2.330242 2.242907
[5,] 2.061738 2.284070 2.139607 2.108556 2.197077
[6,] 2.215741 2.149931 2.079938 2.226619 2.188833
# preprocess the data
seu = CreateSeuratObject(counts = gcm, meta.data = data.frame(loc))
seu = NormalizeData(seu, scale.factor = median(colSums(gcm)))
seu = ScaleData(seu)
# run BANKSY - provide the col. names of the coordinate dimensions to dimx, dimy, dimz
seu = RunBanksy(seu, lambda = 0.2,
dimx = 'cell_x', dimy = 'cell_y', dimz = 'cell_z',
assay = 'RNA', slot = 'data', features = 'all', k_geom = 15)
# PCA
seu = RunPCA(seu, assay = 'BANKSY', features = rownames(seu), npcs = 10)
# cluster
seu = FindNeighbors(seu, dims = 1:10)
seu = FindClusters(seu, resolution = 1)
# visualise x and y coordinates, with the z coordinate represented by point size
ggplot(seu@meta.data,
aes(x=cell_x, y=cell_y, col=BANKSY_snn_res.1, size=cell_z)) +
geom_point() +
scale_size(range = c(0.1,0.5))
Thank you for your prompt response. The instructions you provided worked well. I have a few additional questions for further clarification:
I appreciate your insights and look forward to your advice.
James
Hi James, when analysing multiple slices / sections, BANKSY indeed computes the neighbourhood-augmented features for each slice independently. Importantly, you'll have to make sure that the spatial coordinates of the different sections do not overlap.
Once that's done, the BANKSY matrices (own expression + neighbourhood features) from different slices are then concatenated downstream for joint dimensionality reduction and clustering. We've written a vignette here detailing how that can be done.
When there are batch effects present between samples, BANKSY can be used with integration methods such as Harmony for spatially-informed batch effect correction. Refer to this vignette here for a possible workflow.
Another consideration for multi-sample or integrative analysis is the feature set used. You may want to experiment with how you select features - for instance, computing highly variable features for each sample individually, and then taking the union / intersection of those genes.
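As a rough illustration of the union / intersection idea, here is a minimal base R sketch on simulated data. The helper top_variable_genes is hypothetical and ranks genes by plain variance; it is a stand-in for whichever variable-feature method you actually use, not part of BANKSY or Seurat.

```r
# Hypothetical helper: rank genes by variance within one sample and keep the top n.
# This is a plain variance ranking, not Seurat's FindVariableFeatures.
top_variable_genes <- function(expr, n = 3) {
  gene_var <- apply(expr, 1, var)
  names(sort(gene_var, decreasing = TRUE))[seq_len(min(n, length(gene_var)))]
}

set.seed(1)
genes <- paste0("gene_", 1:10)
# Two simulated samples (genes x cells); values are made up for illustration
sample_a <- matrix(rnorm(10 * 50), nrow = 10, dimnames = list(genes, NULL))
sample_b <- matrix(rnorm(10 * 40), nrow = 10, dimnames = list(genes, NULL))

# Select features per sample, then combine
hvg_a <- top_variable_genes(sample_a)
hvg_b <- top_variable_genes(sample_b)

features_union <- union(hvg_a, hvg_b)          # variable in at least one sample
features_intersect <- intersect(hvg_a, hvg_b)  # variable in every sample
```

The union keeps more genes and may retain sample-specific signal, while the intersection is stricter; which works better likely depends on the data.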
I've included a Seurat-compatible workflow below, demonstrating BANKSY multi-sample analysis on human dorsolateral prefrontal cortex with batch effects between subjects, in case that's more applicable for you.
We have aligned the serial sections so that identical cell types/clusters across these sections share similar x and y coordinates. However, the spatial coordinates for these sections vary in the Z dimension. Are you saying that we should adjust the x and y coordinates to distinguish between different sections?
One potential problem we have encountered using BANKSY is that it assigns the same cell type to different clusters on the left and right sides of the brain with lambda set to 0.2. Do you have any suggestions?
If you're able to accurately register the x and y coordinates of the sections, then clustering with all three dimensions should probably be fine. However, if you expect some technical variability between the sections, then staggering the x and y coordinates and analysing the sections as different samples might work better.
That is unusual - we've had success clustering mouse hypothalamus with left-right symmetry, and across three dimensions as well (see Fig. 3a of the paper for instance). Does conventional clustering assign the same cell types on the left and right sides of the brain to the same cluster? Is there any technical variability along the left-right axis, such as a gradient in the number of detected genes or total transcript count? Are there any noticeable differences in the neighbourhood of those cells? You could also check if those cells co-locate in lower-dimensional PC or UMAP space.
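One way to check for a technical gradient along the left-right axis is sketched below in base R. The data are simulated, the column names cell_x and total_counts are assumptions, and placing the midline at the median x coordinate is a simplification that may not suit your tissue.

```r
set.seed(42)
n_cells <- 200
meta <- data.frame(cell_x = runif(n_cells, 0, 1000))
# Simulated total counts with a mild gradient along x (made up for illustration)
meta$total_counts <- rpois(n_cells, lambda = 500 + 0.2 * meta$cell_x)

# Assumption: the midline sits at the median x coordinate
midline <- median(meta$cell_x)
meta$side <- ifelse(meta$cell_x < midline, "left", "right")

# Compare total counts between the two sides
side_medians <- tapply(meta$total_counts, meta$side, median)

# Rank correlation between position along x and total counts
gradient_cor <- cor(meta$cell_x, meta$total_counts, method = "spearman")
```

A clear difference between the two medians, or a correlation far from zero, would suggest technical variability along that axis that is worth ruling out before interpreting the clusters.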
Hi James, just to add to Joseph's comments:
When the z coordinate is available, the k_geom nearest neighbours are computed using all 3 coordinates. The mean expression (middle submatrix in the 3-block neighbour augmented matrix) is populated with means computed using neighbours found using all three coordinates, so if you have 3D data, you can set use_agf = FALSE to set up a neighbour augmented matrix with only own and neighbourhood mean expressions, and it will fully utilize the 3D data without any conceptual distortions. We find that this still gives very good results on both cell typing and domain segmentation tasks.
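To make the idea concrete, here is a minimal base R sketch of a neighbourhood mean computed from nearest neighbours in 3D. It is an illustration only: the helper knn_mean_expression is hypothetical, the mean is unweighted, the lambda weighting is omitted, and BANKSY's own implementation may differ in these details.

```r
# Illustration only: unweighted k-nearest-neighbour mean using x, y and z.
knn_mean_expression <- function(expr, coords, k = 2) {
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances between cells
  diag(d) <- Inf                # exclude the index cell itself
  nbr_mean <- vapply(seq_len(nrow(coords)), function(i) {
    nbrs <- order(d[i, ])[seq_len(k)]     # indices of the k closest cells
    rowMeans(expr[, nbrs, drop = FALSE])  # mean expression over those cells
  }, numeric(nrow(expr)))
  matrix(nbr_mean, nrow = nrow(expr), dimnames = dimnames(expr))
}

# Toy data: cell_3 sits directly above cell_1 on a neighbouring z-plane
coords <- data.frame(
  cell_x = c(0, 1, 0, 10),
  cell_y = c(0, 0, 0, 10),
  cell_z = c(0, 0, 1, 0)
)
expr <- matrix(c(1, 2, 3, 4,
                 10, 20, 30, 40), nrow = 2, byrow = TRUE,
               dimnames = list(c("gene_1", "gene_2"), paste0("cell_", 1:4)))

nbr <- knn_mean_expression(expr, coords, k = 2)
rownames(nbr) <- paste0(rownames(nbr), ".nbr")

# Own expression stacked on neighbourhood means (no lambda weighting here)
own_plus_nbr <- rbind(expr, nbr)
```

In this toy example the two nearest neighbours of cell_1 are cell_2 (same plane) and cell_3 (adjacent plane), so the neighbourhood mean draws on both z-planes.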
What happens with the AGF in vertically aligned 3D data is as you guys already hinted at above: the nearest neighbours are computed in 3D, and then only the (x,y) coords of the nearest neighbours are used in computing the AGF values. This is equivalent to projecting the cells from nearby z-planes to the index cell's z plane, and then performing a 2D AGF. This is not ideal (though not terrible either), and we are looking into how to generalize the AGF to 3 dimensions using spherical harmonics, so stay tuned for that potential extension of the method.
As Joseph has mentioned above, one alternative way to use the AGF with 3D data is to treat each slice as a separate dataset, and then concatenate the matrices, as you would do with multi-dataset workflows. Yet another way is to place offsets in the x-y coords in each slice, so that none of the slices overlap in x-y coordinates and the nearest neighbours are always forced to come from the same slice. As you guys noted, both of these options are less than ideal, because some of the K nearest neighbours can certainly come from adjacent slices.
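The offset approach could look something like the base R sketch below. The helper stagger_sections, the section column and the gap value are all hypothetical; the gap should be large relative to the extent of a section so that nearest neighbours cannot cross between sections.

```r
# Hypothetical helper: shift each section along x so sections never overlap in x-y.
stagger_sections <- function(loc, section_col = "section", gap = 100) {
  sections <- sort(unique(loc[[section_col]]))
  x_shifted <- loc$cell_x
  offset <- 0
  for (s in sections) {
    idx <- loc[[section_col]] == s
    # Move this section so that it starts at the current offset
    x_shifted[idx] <- loc$cell_x[idx] - min(loc$cell_x[idx]) + offset
    # The next section starts `gap` units past the end of this one
    offset <- max(x_shifted[idx]) + gap
  }
  loc$cell_x_staggered <- x_shifted
  loc
}

# Toy coordinates with an assumed section label per cell
loc <- data.frame(
  cell_x = c(11, 12, 14, 15, 19, 21),
  cell_y = c(609, 937, 1144, 535, 954, 324),
  section = c(1, 1, 2, 2, 3, 3)
)
loc_staggered <- stagger_sections(loc)
```

The staggered column could then be supplied as the x coordinate (for example as dimx in RunBanksy) in place of the original one, with no z coordinate given.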
I would say, if you have an aligned stack of z planes, try using use_agf = FALSE, and see how it does.
Thank you for the information. I will give them a try. I can see how use_agf = FALSE is used in computeBanksy - runBanksyPCA - clusterBanksy using SummarizedExperiment. However, it is unclear how use_agf = FALSE is incorporated into the Seurat workflow.
Hi James, by default, the Seurat workflow runs BANKSY without the AGF. This was controlled by the M argument denoting the highest harmonic to use (setting M=0 corresponds to use_agf=FALSE, while setting M=1 corresponds to use_agf=TRUE). For clarity, I've now introduced the use_agf argument to the Seurat workflow too. Thanks for your feedback.
Excellent! I will give it a try.
I noticed that using use_agf=FALSE works better for symmetric patterns (n = 1, though), in which algorithms that incorporate spatial information to cluster cells often fail.
Using the Seurat pipeline, can we use the FindSubCluster() function?
Yes, just provide the graph name to the function. Check the names of the graphs with:
Seurat::Graphs(seu)
If you're following the vignette this should show BANKSY_nn and BANKSY_snn. Run sub-clustering on the cluster of your choice (below, on cluster 1) with the shared nearest neighbour graph:
FindSubCluster(seu, cluster = 1, graph.name = 'BANKSY_snn')
Thank you for creating and sharing this method.
While implementing it, I encountered an error stating "No images present in the Seurat object." It seems to me that the algorithm does not require an image for its operation. Is there a way to bypass or suppress this error to continue the analysis?
Furthermore, I am interested in your recommendations for processing a Seurat dataset that includes serial sections. It appears the current algorithm does not incorporate Z-axis information, suggesting a need to isolate and analyze each section independently.
Thank you for your assistance and insights.
James