10X VISIUM integrated slides and spatial neighbors graph

alberto-valdeolivas commented 3 years ago

Hello. First of all, I would like to thank and congratulate you for this amazing package and its related manuscript.

I am working with several different 10X VISIUM slides from the same tissue and same condition. I am integrating the gene expression from all the slides using bbknn (as described in the Scanpy tutorial) in order to compute common clusters. I am now planning to run the neighborhood enrichment analyses. To do so, I need to first create a graph from the spatial coordinates using squidpy.gr.spatial_neighbors. I attached the slide name to every spot ID to avoid having duplicates. However, I am wondering if this function somehow deals with the coordinates, since in principle those are the same for every slide, right?

Thank you and best regards, Alberto

giovp commented 3 years ago

Hi @alberto-valdeolivas ,

thanks for the interest in Squidpy and the kind words! Indeed, as you mention, sq.gr.spatial_neighbors is not batch aware and should not be run in the joint adata object. The only way I could come up for something like you describe, is simply filtering the anndata by batch and compute spatial graph and nhood enrichment separately. e.g.

nhood_enrich_results = []
for batch in adata.obs.batch.unique():
    adata_copy = adata[adata.obs.batch == ... ]
    sq.gr.spatial_neighbors(adata_copy)
    res = sq.gr.nhood_enrichment(adata_copy, ..., copy=True)
    nhood_enrichment_result.append(res)

and then aggregate the results ex-post as you find most suitable. Do you think that helps? Actually aggregating the results might be non-trivial, if for isntance you don't have all the clusters present in all the slides... Let me know how it goes!

alberto-valdeolivas commented 3 years ago

Hi @giovp ! Many thanks for the quick response.

Yeah, I was thinking to go in this direction of integrating the enrichment results per slide. I will let you know how the integration goes!

Best, Alberto.

giovp commented 3 years ago

hi @alberto-valdeolivas , did you manage with the iterative subsetting?

alberto-valdeolivas commented 3 years ago

Hi @giovp,

To be honest, I did not try it. I am now focused in other projects that do not requite this kind of sample integration. I will come back to it at some point and I will let you know. Sorry about that!

Best, Alberto.

marcovarrone commented 3 years ago

Hi @alberto-valdeolivas, the way I usually manage this situation is by first computing sq.gr.spatial_neighbors() for each of the datasets separately, then concatenating them using anndata.concat and finally assigning the spatial_connectivities as a (sparse) block diagonal matrix of the original spatial connectivities. In this way, all the edges between different samples will be zero (without occupying memory since it's sparse).

A possible code is the following:

import anndata as ad
import squidpy as sq
import scipy.sparse import block_diag

adata_list = ... # Your list of AnnData datasets

for a in adata_list:
    sq.gr.spatial_neighbors(a)

# The uns_merge parameter is to preserve the spatial_neighbors property
adata = ad.concat(adata_list, uns_merge='same')

adata.obsp['spatial_connectivities'] = block_diag([a.obsp['spatial_connectivities'] for a in adata_list]).tocsr()
adata.obsp['spatial_distances'] = block_diag([a.obsp['spatial_distances'] for a in adata_list]).tocsr()

@giovp Would it make sense to implement a method for that or it's too early since there is still no multiple images integration?

alberto-valdeolivas commented 3 years ago

Hello @marcovarrone ! Thanks for sharing! It looks like a great way to integrate it. I will give it a try

giovp commented 3 years ago

@marcovarrone that is a great suggestion! It would definitely be useful to compute such score across layers, returning a single value for the full dataset. I think it might be an interesting option for many squidpy.gr functions. However, I'm not sure this is something that folks might be always interested in doing (as in, having a list of results might be more desirable). It feels to me that we should treat multiple slides as "samples" (e.g. individuals) rather than batches (for which your proposal would work very nicely).
On the other side, these different behaviours could simply be achieved by either having a single Anndata or list of them... I must say I'd have to think about it a bit more, and very curious to hear your thoughts!

marcovarrone commented 3 years ago

Very good point @giovp. I have to say that from the type of analyses that I am doing I tend to group multiple individuals in the same anndata object. I then save information on the sample/condition/etc... as obs of each cell or spot. This allows me to use directly anndata.X and the labels to perform analyses like batch correction and clustering in all the samples at the same time.

As you said, this could be done by concatenating the data from a list of Anndata and then distributing the result to each of the Anndata. I rarely work in this way because it makes me more prone to errors, but I guess is only a matter of taste, so it makes sense to leave it as it is and keep squidpy as "taste-agnostic" as possible.

So, I think there are 3 different ways to approach it (if we want to do something about it):

Not developing any method and place the example only in the documentation
Extending squidpy.gr.spatial_neighbors by adding an optional parameter that specifies the label (i.e. one column of the obs property) by which to compute the graph separately and then placing the resulting graphs in a diagonal block matrix. For example if I have two samples I would create the obs "individual" and assign a certain value to the cells or spots of the first individual and another value to the ones of the second individual. By passing "individual" as value of the new parameter the method would compute two separate graphs and then integrating them as I showed before.
Create a squidpy-specific method for concatenating AnnData objects (extending anndata.concat) that automatically performs the integration of the spatial_connectivities under the hood. This is a pro but could also be a con if it is a behavior that the user may not expect. And it would require a new category of functions since (I guess) is not suitable for any of squidpy.gr, squidpy.im and squidpy.pl.

ivirshup commented 3 years ago

@marcovarrone, just a minor point, but you should be able to get the same result to using block_diag by passing ad.concat(..., pairwise=True)

giovp commented 3 years ago

thanks @ivirshup for the H/T . @marcovarrone documenting the pairwise=True arg in anndata could be a solution? That way all the functins would work out of the box and we would not require a change in the API.

marcovarrone commented 3 years ago

Woah I didn't know about the pairwise parameter, thank you @ivirshup! @giovp Sounds good to me

giovp commented 2 years ago

related to #332

scverse / squidpy

10X VISIUM integrated slides and spatial neighbors graph #318