scverse / squidpy

Spatial Single Cell Analysis in Python
https://squidpy.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
434 stars 79 forks source link

10X VISIUM integrated slides and spatial neighbors graph #318

Closed alberto-valdeolivas closed 2 years ago

alberto-valdeolivas commented 3 years ago

Hello. First of all, I would like to thank and congratulate you for this amazing package and its related manuscript.

I am working with several different 10X VISIUM slides from the same tissue and same condition. I am integrating the gene expression from all the slides using bbknn (as described in the Scanpy tutorial) in order to compute common clusters. I am now planning to run the neighborhood enrichment analyses. To do so, I need to first create a graph from the spatial coordinates using squidpy.gr.spatial_neighbors. I attached the slide name to every spot ID to avoid having duplicates. However, I am wondering if this function somehow deals with the coordinates, since in principle those are the same for every slide, right?

Thank you and best regards, Alberto

giovp commented 3 years ago

Hi @alberto-valdeolivas ,

thanks for the interest in Squidpy and the kind words! Indeed, as you mention, sq.gr.spatial_neighbors is not batch aware and should not be run in the joint adata object. The only way I could come up for something like you describe, is simply filtering the anndata by batch and compute spatial graph and nhood enrichment separately. e.g.

nhood_enrich_results = []
for batch in adata.obs.batch.unique():
    adata_copy = adata[adata.obs.batch == ... ]
    sq.gr.spatial_neighbors(adata_copy)
    res = sq.gr.nhood_enrichment(adata_copy, ..., copy=True)
    nhood_enrichment_result.append(res)

and then aggregate the results ex-post as you find most suitable. Do you think that helps? Actually aggregating the results might be non-trivial, if for isntance you don't have all the clusters present in all the slides... Let me know how it goes!

alberto-valdeolivas commented 3 years ago

Hi @giovp ! Many thanks for the quick response.

Yeah, I was thinking to go in this direction of integrating the enrichment results per slide. I will let you know how the integration goes!

Best, Alberto.

giovp commented 3 years ago

hi @alberto-valdeolivas , did you manage with the iterative subsetting?

alberto-valdeolivas commented 3 years ago

Hi @giovp,

To be honest, I did not try it. I am now focused in other projects that do not requite this kind of sample integration. I will come back to it at some point and I will let you know. Sorry about that!

Best, Alberto.

marcovarrone commented 3 years ago

Hi @alberto-valdeolivas, the way I usually manage this situation is by first computing sq.gr.spatial_neighbors() for each of the datasets separately, then concatenating them using anndata.concat and finally assigning the spatial_connectivities as a (sparse) block diagonal matrix of the original spatial connectivities. In this way, all the edges between different samples will be zero (without occupying memory since it's sparse).

A possible code is the following:

import anndata as ad
import squidpy as sq
import scipy.sparse import block_diag

adata_list = ... # Your list of AnnData datasets

for a in adata_list:
    sq.gr.spatial_neighbors(a)

# The uns_merge parameter is to preserve the spatial_neighbors property
adata = ad.concat(adata_list, uns_merge='same')

adata.obsp['spatial_connectivities'] = block_diag([a.obsp['spatial_connectivities'] for a in adata_list]).tocsr()
adata.obsp['spatial_distances'] = block_diag([a.obsp['spatial_distances'] for a in adata_list]).tocsr()

@giovp Would it make sense to implement a method for that or it's too early since there is still no multiple images integration?

alberto-valdeolivas commented 3 years ago

Hello @marcovarrone ! Thanks for sharing! It looks like a great way to integrate it. I will give it a try

giovp commented 3 years ago

@marcovarrone that is a great suggestion! It would definitely be useful to compute such score across layers, returning a single value for the full dataset. I think it might be an interesting option for many squidpy.gr functions. However, I'm not sure this is something that folks might be always interested in doing (as in, having a list of results might be more desirable). It feels to me that we should treat multiple slides as "samples" (e.g. individuals) rather than batches (for which your proposal would work very nicely).
On the other side, these different behaviours could simply be achieved by either having a single Anndata or list of them... I must say I'd have to think about it a bit more, and very curious to hear your thoughts!

marcovarrone commented 3 years ago

Very good point @giovp. I have to say that from the type of analyses that I am doing I tend to group multiple individuals in the same anndata object. I then save information on the sample/condition/etc... as obs of each cell or spot. This allows me to use directly anndata.X and the labels to perform analyses like batch correction and clustering in all the samples at the same time.

As you said, this could be done by concatenating the data from a list of Anndata and then distributing the result to each of the Anndata. I rarely work in this way because it makes me more prone to errors, but I guess is only a matter of taste, so it makes sense to leave it as it is and keep squidpy as "taste-agnostic" as possible.

So, I think there are 3 different ways to approach it (if we want to do something about it):

ivirshup commented 3 years ago

@marcovarrone, just a minor point, but you should be able to get the same result to using block_diag by passing ad.concat(..., pairwise=True)

giovp commented 3 years ago

thanks @ivirshup for the H/T . @marcovarrone documenting the pairwise=True arg in anndata could be a solution? That way all the functins would work out of the box and we would not require a change in the API.

marcovarrone commented 3 years ago

Woah I didn't know about the pairwise parameter, thank you @ivirshup! @giovp Sounds good to me

giovp commented 2 years ago

related to #332