Closed yiwang12 closed 8 months ago
Hi Yi, the answer to this question relies on the fact that BANKSY constructs an embedding that is the (direct) product of cells' own and neighbourhood expression spaces, and not the average of a set of cells in a neighbourhood, as is often assumed in other methods. In what follows, I will expound on this.
For each cell, we are concatenating a vector of its own expression features to one comprising the features corresponding to the mean (and AGF) of the cells in its neighbourhood. Now suppose you have two cell types that are perfectly intermingled in a region in space. This means that they will have the same neighbourhood vector. But their own expression vectors will still be completely their own, and therefore distinct. So in the neighbour-augmented embedding space, these cells will sit in very different places, and the clustering algorithm will continue to label them as distinct.
The following figure (Ext. Data Fig 1b) will help illustrate this. Here, the blue and green cells in either neighbourhood share the same location on the vertical (neighbourhood) axis, but are distinct on the own expression axis. In the schematic, we are using one axis apiece to represent the entire own expression and neighbourhood expression spaces, respectively, though in reality, each of these spaces is spanned by as many coordinate axes as there are genes.
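As a toy illustration of the argument above, here is a minimal pure-Python sketch. It is not BANKSY's actual code: the two-gene profiles, the helper names, and the `sqrt(1 - lambda)` / `sqrt(lambda)` weighting of the two halves are assumptions made for illustration, and the AGF term is left out.

```python
import math

def augmented_embedding(own, neighbour_mean, lam):
    """Concatenate own and neighbourhood features.

    Assumption: the own half is scaled by sqrt(1 - lam) and the
    neighbourhood half by sqrt(lam); the AGF term is omitted.
    """
    w_own = math.sqrt(1.0 - lam)
    w_nbr = math.sqrt(lam)
    return [w_own * x for x in own] + [w_nbr * x for x in neighbour_mean]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical two-gene expression profiles for two cell types.
type_a = [10.0, 0.0]
type_b = [0.0, 10.0]

# Perfectly intermingled cells see the same neighbourhood mean.
shared_neighbourhood = [5.0, 5.0]

lam = 0.2
emb_a = augmented_embedding(type_a, shared_neighbourhood, lam)
emb_b = augmented_embedding(type_b, shared_neighbourhood, lam)

own_gap = distance(type_a, type_b)   # separation using own expression only
aug_gap = distance(emb_a, emb_b)     # separation in the augmented space

print("neighbourhood halves identical:", emb_a[2:] == emb_b[2:])
print("own-expression gap:", own_gap)
print("augmented gap at lambda = 0.2:", aug_gap)
```

Under these assumptions the neighbourhood halves cancel exactly, so the augmented gap is just the own-expression gap scaled by `sqrt(1 - lambda)`. It shrinks a little but stays well away from zero, so the two types remain separable.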
You can also read Supp. Section 3 in the SI file, and look at Supp. Fig 17 (partly reproduced here). In this figure, we have a couple of examples of cell types that are spatially intermingled, but perfectly separable at lambda = 0.2, for the reason explained above.
BANKSY actually goes a lot further than what I have explained above. At lambda = 0.2, it is able to leverage this separation of the own and neighbourhood axes to de-noise noisy data (Fig. 1c, Ext Data Fig 1a for conceptual explanation, and Fig 2 for demonstration) and to separate cells of the same type that are in different neighbourhoods and have subtly different own expressions (Ext Data Fig. 1b for concept, and Fig. 3b-e and Fig. 4a-q for demonstration). Feel free to have a look and let me know if you need to discuss these too.
PS: we have found that lambda = 0.2 works well for most datasets, but ultimately it's just like any other parameter in single cell analysis: you have to try a couple of values (lambda = 0.15 and 0.2 are best to try) to see what makes sense biologically / in terms of marker genes / spatial distributions of cells. Similarly, for domain segmentation, anything between 0.75 and 0.9 should work; see Supp. Fig. 26, for example.
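The second case (same type, different neighbourhoods, subtly different own expression) can be sketched with the same kind of toy model. Again, this is only an illustration under assumptions: the numbers are made up, the helper names are hypothetical, the `sqrt(1 - lambda)` / `sqrt(lambda)` weighting is assumed, and the AGF term is omitted.

```python
import math

def embed(own, neighbour_mean, lam):
    """Weighted concatenation of own and neighbourhood features.

    Assumption: weights are sqrt(1 - lam) and sqrt(lam); AGF is omitted.
    """
    return ([math.sqrt(1.0 - lam) * x for x in own]
            + [math.sqrt(lam) * x for x in neighbour_mean])

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical cells of the same type with subtly different expression.
cell_1_own = [10.0, 0.0]
cell_2_own = [9.0, 1.0]

# They sit in different neighbourhoods, so their neighbourhood means differ.
cell_1_nbr = [8.0, 2.0]
cell_2_nbr = [2.0, 8.0]

subtle_gap = euclidean(cell_1_own, cell_2_own)          # lambda = 0 view
widened_gap = euclidean(embed(cell_1_own, cell_1_nbr, 0.2),
                        embed(cell_2_own, cell_2_nbr, 0.2))

print("gap from own expression alone:", subtle_gap)
print("gap in augmented space at lambda = 0.2:", widened_gap)
```

In this toy setting the neighbourhood half contributes most of the distance, so two cells that are hard to tell apart on own expression alone become clearly separated once the neighbourhood axis is added.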
Hi Vipul, I see. Thanks a lot for your explanations! It's very helpful.
Hi Authors of Banksy,
I'm using Banksy to analyze single-cell spatial data. The method has been very useful for our data.
I saw that the tutorial says "lambda=0.2 corresponding to BANKSY for cell-typing".
My own thought is that cell typing shouldn't use spatial information at all, as there could be different cell types that are spatially co-localized. So my understanding is that lambda=0 should be the right parameter for cell type annotation.
I'm wondering why should lambda=0.2 be used for cell typing in BANKSY?
Thank you, Yi