gustaveroussy / sopa

Technology-invariant pipeline for spatial omics analysis (Xenium / Visium HD / MERSCOPE / CosMx / PhenoCycler / MACSima / ...) that scales to millions of cells
https://gustaveroussy.github.io/sopa/
BSD 3-Clause "New" or "Revised" License

Questions about Spatial Statistics Analysis #48

wikk-chy closed this issue 5 months ago

wikk-chy commented 5 months ago

Hello! Thank you for your contributions to spatial omics data analysis. I have some questions about "Distances between cell categories". When a cell type or niche contains few cells, its mean hop-distance to other cell types or niches tends to increase significantly. When comparing experimental and control groups, can I conclude that a distance increased or decreased based solely on the mean hop-distance, or should I also account for the number of cells in each group? I look forward to your response.

wikk-chy commented 5 months ago

For example, from a figure like this one, can I conclude that after infection the distance between TH cells and other cells increases similarly? [image]

quentinblampey commented 5 months ago

Hello @wikk-chy,

Indeed, it's important to consider the number of cells when you have a rare cell type! You can look at distributions of distances instead of looking at the mean distance (and also perform statistical tests based on the two distributions). For instance, the code below computes the distance between each cell and all cell types.

sopa.spatial.cells_to_groups(adata, "cell_type", key_added_prefix="distance_")

adata.obsm['distance_cell_type'] # contains a dataframe of distances for each cell

Then, you can subset this dataframe according to your cell type of interest, and plot the corresponding distribution. For instance, adata[adata.obs.cell_type == "B_cell"].obsm['distance_cell_type'] will give you all the distances between B cells and all other cell types.
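One way to compare two such distributions is a permutation test. The sketch below is only an illustration under stated assumptions: the function name `permutation_test_median`, the lists `control` and `infected`, and the numbers in them are made up. In practice each list could come from one column of the subset dataframe (with NaNs dropped), split by a condition column of your own. It uses only the Python standard library and is not part of sopa.

```python
import random
import statistics


def permutation_test_median(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of medians.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    a, b = list(sample_a), list(sample_b)
    observed = statistics.median(a) - statistics.median(b)

    pooled = a + b
    n_extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled distances to the two groups
        rng.shuffle(pooled)
        diff = statistics.median(pooled[: len(a)]) - statistics.median(pooled[len(a):])
        if abs(diff) >= abs(observed):
            n_extreme += 1

    # Add-one correction so the p-value is never exactly zero
    p_value = (n_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up hop-distances from TH cells to one target cell type, per condition
control = [1, 1, 2, 2, 2, 3, 3, 3, 4, 2, 1, 3]
infected = [3, 4, 4, 5, 5, 6, 4, 5, 7, 6]

observed, p_value = permutation_test_median(infected, control)
print(f"median difference = {observed}, p = {p_value:.4f}")
```

Because the test works on the whole distribution of per-cell distances, the group sizes enter the p-value directly: with only a handful of cells in a rare cell type, the same shift in the median gives weaker evidence.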

The cells_to_groups API documentation is here. Hope this helps!

wikk-chy commented 5 months ago

Thank you very much for your response, it's very helpful to me. However, I found that there are some NaNs in the results when running the analysis, and I'm not quite sure what caused this. [image]

quentinblampey commented 5 months ago

The distance is based on the number of "hops", i.e. the number of edges in the Delaunay graph. Therefore, you can have NAs if your graph is not fully connected (this can happen if you have regions that are very sparse, with some cells very far from any other cell). It simply means that there is no path between the "source" cell and any cell of the "target" cell type.
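To make the link between a disconnected graph and missing values concrete, here is a minimal sketch of a hop-distance computed with a breadth-first search on a toy graph. It is not sopa's implementation: the function `hops_to_group`, the edge list, and the cell indices are all hypothetical, and only the Python standard library is used.

```python
import math
from collections import deque


def hops_to_group(edges, n_cells, target_cells):
    """Hop-distance from every cell to the nearest cell in `target_cells`.

    Cells with no path to any target cell keep the value NaN.
    """
    neighbors = [[] for _ in range(n_cells)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    distance = [math.nan] * n_cells
    queue = deque()
    for cell in target_cells:
        distance[cell] = 0
        queue.append(cell)

    # Breadth-first search started from all target cells at once
    while queue:
        cell = queue.popleft()
        for other in neighbors[cell]:
            if math.isnan(distance[other]):
                distance[other] = distance[cell] + 1
                queue.append(other)
    return distance


# Cells 0-3 form a chain; cells 4 and 5 form an isolated island
edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
distances = hops_to_group(edges, n_cells=6, target_cells=[0])
print(distances)
```

Cells 4 and 5 are connected to each other but not to the component containing the target cell, so no number of hops can reach it and their distance stays NaN.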

You can increase the radius threshold to 100 microns as below. This will connect many more cells, and you will have fewer NAs:

sopa.spatial.spatial_neighbors(adata, radius=[0, 100])

You can also choose a much higher threshold, but then you may have some edges that are not biologically relevant. After rebuilding the graph, rerun cells_to_groups so that the distances reflect the new edges.
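The trade-off can be illustrated on toy data. The sketch below assumes a set of candidate edges (for instance from a triangulation of the cell centroids), drops the edges longer than a maximum length, and counts the connected components that remain. The helpers `prune_edges` and `count_components`, the coordinates, and the 40-micron value are hypothetical; this is not sopa's code.

```python
import math


def prune_edges(points, edges, max_length):
    """Keep only the edges whose Euclidean length is at most `max_length`."""
    return [(i, j) for i, j in edges if math.dist(points[i], points[j]) <= max_length]


def count_components(n_nodes, edges):
    """Count connected components with a small union-find."""
    parent = list(range(n_nodes))

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(node) for node in range(n_nodes)})


# Made-up cell centroids in microns: two groups separated by a 70-micron gap
points = [(0, 0), (10, 0), (20, 0), (90, 0), (100, 0)]
# Hypothetical candidate edges between neighboring centroids
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

for radius in (40, 100):
    kept = prune_edges(points, edges, radius)
    print(radius, len(kept), count_components(len(points), kept))
```

With the smaller threshold the 70-micron edge is dropped and the cells split into two components, which is the situation that produces NAs. With the larger threshold the graph is connected, at the cost of keeping an edge between cells that are 70 microns apart.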