scverse / squidpy

Spatial Single Cell Analysis in Python
https://squidpy.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

clustering accounting for spatial coordinates #13

Open giovp opened 3 years ago

giovp commented 3 years ago

Not a very clear idea yet, but something along these lines: https://www.biorxiv.org/content/10.1101/2020.09.04.283812v1 Maybe there is a way to achieve similar results without explicit modelling and inference. It's essentially a smoothing of cluster assignments over spatial coordinates.
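One simple, model-free reading of "smoothing cluster assignments over spatial coordinates" is iterative majority voting: each cell takes the most common label among itself and its spatial neighbors. The sketch below assumes integer cluster codes and a sparse spatial adjacency (for example the matrix `sq.gr.spatial_neighbors` stores in `adata.obsp["spatial_connectivities"]`). The function name `smooth_labels` and its `self_weight` parameter are hypothetical, and this is a plain heuristic, not the model from the linked preprint.

```python
import numpy as np
from scipy import sparse


def smooth_labels(labels, adjacency, n_iter=1, self_weight=1.0):
    """Hypothetical helper: majority-vote smoothing of cluster labels on a spatial graph.

    labels: 1D integer array of cluster codes 0..k-1, one per cell/spot.
    adjacency: (n, n) sparse adjacency of the spatial graph (assumed symmetric).
    """
    labels = np.asarray(labels)
    n_clusters = labels.max() + 1
    adjacency = sparse.csr_matrix(adjacency)
    for _ in range(n_iter):
        # One-hot encode the current labels, shape (n, n_clusters)
        onehot = np.zeros((labels.size, n_clusters))
        onehot[np.arange(labels.size), labels] = 1.0
        # Count neighbor votes per cluster plus a weighted vote for the cell itself
        votes = adjacency @ onehot + self_weight * onehot
        # Ties resolve to the lowest cluster code (np.argmax behaviour)
        labels = votes.argmax(axis=1)
    return labels


# A 5-cell chain 0-1-2-3-4: the isolated label in the middle gets outvoted
rows = [0, 1, 1, 2, 2, 3, 3, 4]
cols = [1, 0, 2, 1, 3, 2, 4, 3]
chain = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(5, 5))
smoothed = smooth_labels([0, 0, 1, 0, 0], chain)
```

Raising `n_iter` or lowering `self_weight` smooths more aggressively; both are free knobs of this sketch rather than anything proposed in the thread.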

SabrinaRichter commented 3 years ago

Are you working on a method that creates adjacency matrices for the seqFISH data, especially split by 'Field of View'? So actually something like 6 or 7 adjacency matrices?
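Instead of 6 or 7 separate matrices, one option is a single block-diagonal adjacency in which edges are only created between cells of the same field of view, so no edge ever crosses FOVs. A minimal sketch, assuming 2D coordinates and one FOV label per cell; the helper name `per_fov_knn_graph` and the default `k` are hypothetical (not a squidpy API), and the dense pairwise distances are only meant for small FOVs.

```python
import numpy as np
from scipy import sparse


def per_fov_knn_graph(coords, fov, k=6):
    """Hypothetical helper: symmetric kNN adjacency built separately within each FOV."""
    coords = np.asarray(coords, dtype=float)
    fov = np.asarray(fov)
    n = coords.shape[0]
    rows, cols = [], []
    for f in np.unique(fov):
        idx = np.flatnonzero(fov == f)
        kk = min(k, idx.size - 1)
        if kk < 1:
            continue  # a FOV with a single cell gets no edges
        sub = coords[idx]
        # Dense pairwise distances inside this FOV only
        dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)  # never pick a cell as its own neighbor
        nearest = np.argsort(dist, axis=1)[:, :kk]
        rows.append(np.repeat(idx, kk))
        cols.append(idx[nearest].ravel())
    if rows:
        rows, cols = np.concatenate(rows), np.concatenate(cols)
    else:
        rows = cols = np.array([], dtype=int)
    adj = sparse.csr_matrix((np.ones(rows.size), (rows, cols)), shape=(n, n))
    # Symmetrize: keep an edge if either endpoint lists the other as a neighbor
    return adj.maximum(adj.T)


# Two FOVs with identical local coordinates: cells never connect across FOVs
coords = [[0, 0], [1, 0], [0, 0], [1, 0]]
adj = per_fov_knn_graph(coords, ["a", "a", "b", "b"], k=1)
```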

giovp commented 3 years ago

@Koncopd is working on them!

giovp commented 3 years ago

@SabrinaRichter https://leidenalg.readthedocs.io/en/stable/reference.html#leidenalg.find_partition_multiplex

giovp commented 3 years ago

Still have mixed feelings about this, let's keep it open.

giovp commented 2 years ago

related to #246 and https://github.com/theislab/scanpy/issues/1818

giovp commented 2 years ago

I'm still quite tempted to add this, although the only use case I see is when the spatial graph is not a grid (but has some interesting topology). Also, this should probably live in scanpy (or muon?).

SabrinaRichter commented 2 years ago

The idea was to include node feature information in the clustering, right? Then it could also be interesting for grid graphs, no? The only question is whether people are interested in spatial pieces/clusters of homogeneous cell-type patterns.
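A common, simple way to bring neighborhood information into node features is to concatenate each cell's own features with the mean of its spatial neighbors' features, then cluster on the augmented matrix as usual (PCA, kNN graph, Leiden). This is a swapped-in heuristic, not something the thread settles on; the sketch assumes a dense feature matrix and a sparse spatial adjacency, and the function name is hypothetical.

```python
import numpy as np
from scipy import sparse


def neighborhood_augmented_features(X, adjacency):
    """Hypothetical helper: [own features | mean of spatial neighbors' features]."""
    X = np.asarray(X, dtype=float)
    A = sparse.csr_matrix(adjacency, dtype=float)
    degree = np.asarray(A.sum(axis=1)).ravel()
    # Row-normalize so each row averages over its neighbors; isolated cells get zeros
    inv_deg = np.divide(1.0, degree, out=np.zeros_like(degree), where=degree > 0)
    neighbor_mean = sparse.diags(inv_deg) @ A @ X
    return np.hstack([X, neighbor_mean])


# Chain 0-1-2 plus an isolated cell 3
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 5.0]])
A = sparse.csr_matrix((np.ones(4), ([0, 1, 1, 2], [1, 0, 2, 1])), shape=(4, 4))
out = neighborhood_augmented_features(X, A)
```

Image features could be appended as extra columns of `X` in the same way.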

giovp commented 2 years ago

Mmh, that could also be a way to do it, but in theislab/scanpy#1818 the idea is to do multiplex partitioning with the kNN graph from gene expression (gexp) and the spatial graph jointly (without considering the node features). In the case of features, yes (could be image features?), and it would be interesting nonetheless (and even doable by jointly partitioning the kNN graphs from gexp and image features).

ivirshup commented 2 years ago

What do you want to achieve by including spatial information in the clustering? I can think of two reasons to do this:

  1. You want to separate cell types which are not near each other into separate categories
  2. You want to make it more likely for nearby cells to be part of the same cluster, e.g. by loosening the similarity criteria for nearby cells.

I see obvious use cases for 1, but I'm not sure you need a clustering method for this. You should be able to break up your non-spatial clustering results by finding connected components in the spatial graph. This would be like:

Example

Setup (just getting to an AnnData I can do stuff with):

```python
import numpy as np
import pandas as pd
import scanpy as sc
import squidpy as sq
from matplotlib import pyplot as plt
from scipy import sparse

plt.rcParams["figure.figsize"] = (12, 8)

adata = sc.datasets.visium_sge("V1_Breast_Cancer_Block_A_Section_1")
adata.var_names_make_unique()
adata.var["mito"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mito"], inplace=True)
sc.pp.filter_genes(adata, min_counts=1)
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", layer="counts", n_top_genes=1000)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
```

Subsetting clusters by spatial neighbor:

```python
from scipy.sparse.csgraph import connected_components

sq.gr.spatial_neighbors(adata)  # writes adata.obsp["spatial_connectivities"]
sc.tl.leiden(adata, resolution=0.5)


def find_per_cluster_components(adata, obs_key, graph_key):
    """Split each cluster in adata.obs[obs_key] into connected components of adata.obsp[graph_key]."""
    clusters = adata.obs[obs_key].astype("category")
    graph = sparse.csr_matrix(adata.obsp[graph_key])
    components = -np.ones(adata.n_obs, dtype=int)
    for _, indices in adata.obs.groupby(obs_key, observed=True).indices.items():
        # Restrict the spatial graph to this cluster's cells; labels restart at 0 per cluster
        subgraph = graph[indices][:, indices]
        components[indices] = connected_components(subgraph, directed=False)[1]
    return pd.DataFrame({"cluster": clusters, "components": components}, index=adata.obs_names)


df = find_per_cluster_components(adata, "leiden", "spatial_connectivities")

# Kinda gross: keep component labels for cluster "6" only; -1 becomes NaN in the categorical
in_cluster = adata.obs["leiden"] == "6"
subgroups = pd.Series(-1, index=adata.obs_names, dtype=int)
subgroups[in_cluster] = df.loc[in_cluster, "components"]
adata.obs["to_plot"] = pd.Categorical.from_codes(
    codes=subgroups.to_numpy(),
    categories=[str(x) for x in range(subgroups.max() + 1)],
)
sc.pl.spatial(adata, color="to_plot")
```

One selected cluster, split by connected components on the spatial graph.

![image](https://user-images.githubusercontent.com/8238804/148926624-fb749bb9-63ab-4144-86d4-f83a721879a4.png)

I'm not so sure how useful 2 is, but I could definitely be missing something.

giovp commented 2 years ago

That's really cool @ivirshup! It'd be a very handy function.

Re 2., I think it would still be useful and would be a purely "data-driven" (not necessarily better) way to achieve 1. That would be done with multi-graph partitioning (native in leidenalg), where the kNN graph from gexp and the spatial graph are the inputs. This is particularly useful for non-Visium data, where the graph actually has an interesting topology.

ivirshup commented 2 years ago

> re 2., I think it would still be useful and would be a purely "data-driven" (not necessarily better) way to achieve 1.

Could this problem also be thought of as "expression driven segmentation"?

I'm just a little unsure of the case where you want an output like the second plot, but without knowing those were the same cell types. Unless there's a case where you'd find something that looks different?

LLehner commented 3 weeks ago

Some methods for spatial clustering are now being implemented in PR #831. Potential methods are discussed in issue #789.