scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Multiomics partitions #1107

Closed dawe closed 4 years ago

dawe commented 4 years ago

As more and more technologies allow multimodal characterization of single cells, it could be useful to exploit some of the functionality in scanpy's toolkit to perform at least a rough integrative analysis. Assuming we have two modalities on different layers (say RNA and ATAC), one could create a knn graph for each layer and use leidenalg.find_partition_multiplex to call partitions jointly, handling the two (or more) graphs as a multiplex. I have tested this approach myself; it is described in the leidenalg documentation, it works, and it is highly configurable. We can take care of implementing this enhancement (as a leiden_multiplex() function?). I just want to be sure that it is not already on the development roadmap and that it is OK to have it in scanpy rather than as an external tool.

dawe commented 4 years ago

I realize that I'm working on a specific dataset that fits the layer specification (i.e. same obs, same var), while in general this is not true (e.g. scRNA + scATAC). Still, a multiplex could be analyzed from two or more AnnData objects, provided adata.uns['neighbors'] is present in each.

giovp commented 4 years ago

Hi @dawe! Just to chip in real quick: I think your suggestion makes a lot of sense. Besides the examples you already mention, I believe spatial transcriptomics data could also benefit a lot from such an approach, since with such data you have both a knn graph from gene expression and a graph from spatial coordinates.

dawe commented 4 years ago

I have never used spatial data (so far). Is it organized as separate AnnData objects? If everything that could be integrated is a single AnnData, then the function would be easy, something like:

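from typing import Optional, Sequence

import leidenalg as la
import pandas as pd
import scanpy as sc
from anndata import AnnData

def leiden_multiplex(adata: Sequence[AnnData], use_computed: bool = False,
                     weights: Optional[Sequence[float]] = None,
                     partition_type=la.ModularityVertexPartition):  # placeholder default
    # one knn graph per view; vertex i must be the same cell in every view
    adj_list = [x.uns['neighbors']['connectivities'] for x in adata]
    G_list = [sc._utils.get_igraph_from_adjacency(x) for x in adj_list]  # also add the `restrict_to` step

    if use_computed:
        # pseudocode: part_list = [get_partitions_from_adata.obs] or [recalculate_partitions_with_neighbors_params]
        # then run the optimizer on part_list
        raise NotImplementedError("optimizing precomputed partitions is not sketched here")
    else:
        membership, improv = la.find_partition_multiplex(G_list, partition_type, layer_weights=weights)

    # store the joint labels in every view
    for a in adata:
        a.obs['multiplex'] = pd.Categorical(membership)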

where adata is a list of AnnData objects and use_computed switches between recalculating the partitions (False) and optimizing partitions that were already calculated (True). Weights can be specified to give more or less importance to a specific view; if set to None, leidenalg defaults to a list of ones. Other options, in addition to the usual copy=False, should include the leidenalg partition type (CPMVertexPartition, RBConfigurationVertexPartition, ...).
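To illustrate what the weights do: as far as I understand the leidenalg documentation, the multiplex quality is the sum of the per-layer qualities, each multiplied by its layer weight. The sketch below reproduces that idea in plain Python with textbook modularity. The helper names modularity and multiplex_quality are hypothetical, and the scaling may differ from what leidenalg computes internally.

```python
from typing import Dict, Optional, Sequence, Tuple

Edge = Tuple[int, int]


def modularity(edges: Sequence[Edge], membership: Sequence[int]) -> float:
    """Textbook modularity of an undirected, unweighted graph (hypothetical helper)."""
    m = len(edges)
    internal: Dict[int, int] = {}  # edges inside each community
    degree: Dict[int, int] = {}    # summed node degree per community
    for u, v in edges:
        for node in (u, v):
            c = membership[node]
            degree[c] = degree.get(c, 0) + 1
        if membership[u] == membership[v]:
            c = membership[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def multiplex_quality(
    layers: Sequence[Sequence[Edge]],
    membership: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-layer modularity; None means a weight of 1 for every layer."""
    if weights is None:
        weights = [1.0] * len(layers)
    return sum(w * modularity(edges, membership) for w, edges in zip(weights, layers))


# Two views over the same 4 cells that disagree about the grouping.
layer_a = [(0, 1), (2, 3)]
layer_b = [(0, 2), (1, 3)]
layers = [layer_a, layer_b]
p_a = [0, 0, 1, 1]  # grouping favoured by layer_a
p_b = [0, 1, 0, 1]  # grouping favoured by layer_b

equal_a = multiplex_quality(layers, p_a)             # both views count the same
equal_b = multiplex_quality(layers, p_b)
weighted_a = multiplex_quality(layers, p_a, [2, 1])  # layer_a counts double
weighted_b = multiplex_quality(layers, p_b, [2, 1])
```

With equal weights the two groupings tie, while doubling the weight of the first view makes its preferred grouping score higher.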

giovp commented 4 years ago

Spatial data would also be a single AnnData object, similar to CITE-seq but with images in addition to expression counts. Anyway, I think this functionality would be very useful and really cool to try out. It is worth mentioning that uns['neighbors'] will be moved to obsp (see theislab/scanpy-tutorials#14) and that @ivirshup is probably working along similar lines (see #1117).
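Given the planned move, a multiplex function could look up the graph in either place. The sketch below assumes the new location is obsp['connectivities']; the helper name get_connectivities is hypothetical, and SimpleNamespace objects stand in for AnnData so the snippet runs without scanpy installed.

```python
from types import SimpleNamespace


def get_connectivities(adata, key="connectivities"):
    """Return the knn connectivities from obsp if present, else from uns['neighbors'].

    Hypothetical helper: `adata` only needs `.obsp` and `.uns` mappings.
    """
    obsp = getattr(adata, "obsp", None)
    if obsp is not None and key in obsp:
        return obsp[key]
    return adata.uns["neighbors"][key]


# Stand-ins for AnnData objects; strings take the place of sparse matrices.
old_style = SimpleNamespace(obsp={}, uns={"neighbors": {"connectivities": "graph in uns"}})
new_style = SimpleNamespace(obsp={"connectivities": "graph in obsp"}, uns={"neighbors": {}})

found_old = get_connectivities(old_style)
found_new = get_connectivities(new_style)
```

Checking obsp first means objects written by newer versions are preferred, while older objects keep working.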

dawe commented 4 years ago

I swear I hadn't read the commits on this that @ivirshup made only a few hours earlier! Great, I'm closing this issue, then.