openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License

Task: Benchmark Cell-Cell Communication inference from scRNA-Seq #320

Closed by dbdimitrov 1 year ago

dbdimitrov commented 3 years ago

Hello Everyone!

I'm Daniel, a PhD student at saezlab. I'm opening this GitHub issue motivated by our recent comparison of cell-cell communication methods [19], as well as by discussions within the lab and beyond.

Cell-Cell Communication inference from scRNA-Seq

The growing availability of single-cell RNA sequencing (scRNA-Seq) data is helping us improve our understanding of the cellular heterogeneity of tissues. Furthermore, spatial transcriptomics has recently emerged as a technology to measure gene expression while preserving the spatial distribution of cells in a sample, thus providing an unprecedented opportunity to decipher tissue architecture and organization [1]. These advancements have in turn led to an increased interest in the development of tools for cell-cell communication (CCC) inference. CCC commonly refers to interactions between secreted ligands and plasma membrane receptors. This picture can be broadened to include secreted enzymes, extracellular matrix proteins, transporters, and interactions that require physical contact between cells, such as cell-cell adhesion proteins and gap junctions [2]. For simplicity, we refer to all of these protein-protein binding events as CCC. CCC events are essential for homeostasis, development, and disease, and their estimation is becoming a routine approach in scRNA-Seq data analysis [3].

A number of computational tools and resources have emerged that can be further classified as those that predict CCC interactions alone [4–13], and those that additionally estimate intracellular pathway activities related to CCC [14–18]. In this proposal, we focus on the former (Table 1). These CCC tools typically use gene expression information obtained by scRNA-Seq. CCC tools can predict intercellular crosstalk between any pair of clusters, one cluster being the source and the other the target of a CCC event. CCC events are thus typically represented as a one-to-one interaction between a ‘transmitter’ and ‘receiver’ protein, expressed by the source and target cell clusters, respectively. The information about which transmitter binds to which receiver is extracted from diverse sources of prior knowledge. Roughly, CCC tools then estimate the likelihood of crosstalk based on the expression level of the transmitter and the receiver in the source and target clusters, respectively. Every tool has two major components: a resource of prior knowledge on CCC (interactions), and a method to estimate CCC from the known interactions and the dataset at hand. Despite the aforementioned common premises to explore CCC events, each tool uses a different method, such as permutation of cluster labels, regularizations, and scaling, to prioritize interactions according to the input datasets (Table 1). Recently we performed a comprehensive analysis using 6 methods and 15 resources and observed notably different results depending on the choice of method and resource [19]. However, the different methods use diverse scoring systems that are difficult to compare and evaluate. These difficulties are further exacerbated by the lack of an appropriate gold standard or a synthetic dataset that appropriately captures the complexity of intercellular interactions [3,20].
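The general scheme described above (a transmitter-receiver pair from prior knowledge, scored by expression in a source and a target cluster, then prioritized by permuting cluster labels) can be illustrated with a minimal sketch. It is not the scoring system of any specific tool: the product-of-means score, the function names `ccc_score` and `permutation_pvalue`, the gene names `LIG` and `REC`, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def ccc_score(expr, labels, transmitter, receiver, source, target):
    """Hypothetical score: mean transmitter expression in the source cluster
    multiplied by mean receiver expression in the target cluster."""
    t = mean(cell[transmitter] for cell, lab in zip(expr, labels) if lab == source)
    r = mean(cell[receiver] for cell, lab in zip(expr, labels) if lab == target)
    return t * r


def permutation_pvalue(expr, labels, transmitter, receiver, source, target,
                       n_perm=1000, seed=0):
    """Empirical p-value from shuffling cluster labels across cells."""
    rng = random.Random(seed)
    observed = ccc_score(expr, labels, transmitter, receiver, source, target)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # cluster sizes are preserved by shuffling
        if ccc_score(expr, shuffled, transmitter, receiver, source, target) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (assumed): cluster A expresses the ligand, cluster B the receptor.
expr = (
    [{"LIG": v, "REC": 0} for v in (5, 6, 5, 6)]
    + [{"LIG": 0, "REC": v} for v in (4, 5, 4, 5)]
)
labels = ["A"] * 4 + ["B"] * 4

score, pval = permutation_pvalue(expr, labels, "LIG", "REC", "A", "B")
print(score, pval)
```

Because each tool replaces the score and the null model with its own choices, the resulting values are not directly comparable across tools, which is the difficulty raised above.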

Here we propose two benchmark directions that leverage spatial information to assess the performance of the methods (and resources). Both of these benchmarks assume that spatial distance is informative of a method’s performance. We thus acknowledge that our proposals provide only indirect measures of performance, and we welcome any further suggestions.

We also described the proposals presented here in Supp. Note 4.

Benchmark Directions

I) Associations between CCC activity and Spatial-Adjacency

Assumptions:

Cell clusters that are spatially adjacent should be communicating more actively than those that are spatially distant. Confining CCC inference to spatially adjacent clusters should therefore reduce false positives.

Limitations:

It is difficult to distinguish cell-cell communication from cellular program coregulation.

Examples: The correlation between spatial distance and cell-cell communication activity has already been used to validate some methods [13,18], while other methods explicitly take spatial information into account for CCC inference [20]. Another example is confining CCC inference to cells that are expected to be in close contact, e.g. cells that co-localize in Visium spots, to reduce false positive interactions [21]. Similarly, 10x Visium data can be used to identify cell types that are co-located in Visium spots and are hence in close contact.

Metrics:

A benchmark focused on the relationship (1) between CCC activity (2) and spatial distance (3):

(1) Correlation or regression coefficients, or any other metric that can be used as a proxy for the relationship between spatial distance and CCC activity

(2) CCC activity as reported by different CCC methods. For example, the number of inferred interactions (i.e. number of in- and/or out-going edges) between cell cluster pairs

(3) Spatial distance, measured by the Euclidean distances between cell clusters [22], or the neighbourhood enrichment or spatial co-occurrence of cell clusters [23,24]. An alternative approach would be to discretise distance [13,18], e.g. into spatially adjacent and spatially distant cell types.
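As a rough illustration of how items (1) to (3) could fit together, the sketch below computes Euclidean distances between cluster centroids, takes the number of inferred interactions per cluster pair as CCC activity, and summarizes their relationship with a Spearman correlation. The centroid-based distance, the undirected interaction counts, all function names, and the toy values are assumptions, not part of the proposal or of any existing framework.

```python
import math
from itertools import combinations
from statistics import mean


def centroids(coords, labels):
    """Mean (x, y) position of the cells in each cluster."""
    groups = {}
    for point, lab in zip(coords, labels):
        groups.setdefault(lab, []).append(point)
    return {lab: (mean(p[0] for p in pts), mean(p[1] for p in pts))
            for lab, pts in groups.items()}


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def distance_activity_correlation(coords, labels, n_interactions):
    """Correlate centroid distance with interaction counts over cluster pairs.

    n_interactions maps an alphabetically sorted cluster pair (a, b) to the
    number of interactions a method inferred between the two clusters.
    """
    cents = centroids(coords, labels)
    dists, counts = [], []
    for a, b in combinations(sorted(cents), 2):
        dists.append(math.dist(cents[a], cents[b]))
        counts.append(n_interactions[(a, b)])
    return spearman(dists, counts)


# Toy data (assumed): clusters A and B are neighbours, C is far away.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1)]
labels = ["A", "A", "B", "B", "C", "C"]
n_interactions = {("A", "B"): 30, ("A", "C"): 5, ("B", "C"): 8}

rho = distance_activity_correlation(coords, labels, n_interactions)
print(rho)
```

Under Assumption I, a method whose inferred activity decreases with distance would yield a negative coefficient. Neighbourhood enrichment or co-occurrence scores [23,24] could be substituted for the centroid distance without changing the rest of the sketch.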

II) Data-driven Inference of Spatial Covariance to explain Transmitter-Receiver interactions

Assumptions:

The covariance of receiver and transmitter gene expression with spatial distance is a proxy for CCC events.

Limitations:

It is difficult to distinguish cell-cell communication from cellular program coregulation. The approach is also possibly biased towards CCC events in which the transmitter and receiver regulate each other’s expression.

Approach:

1) The expression of a receiver is spatially explainable by the expression of a transmitter and vice versa. Thus, a threshold signifying conserved spatial gene regulation between transmitters and receivers (with e.g. mistyR[25]) can be used to define putative true positive interactions.

2) Downstream signalling models to explain transmitter and receiver activity. Some methods already utilize downstream signalling in an attempt to better model CCC interactions [15,16,18]. In a similar way, prior knowledge of downstream transmitter and receiver activity models [15] can be used to spatially explain the activity of transmitters and receivers. Alternatively, one can build naïve protein-protein interaction models from existing databases [2].

Metrics:

Methods’ (and Resources’) coverage of the spatially explainable CCC events. In other words, we expect a method to assign preferentially high ranks to spatially explainable transmitter-receiver interactions.

AUROC can be calculated according to different thresholds of spatial covariance for transmitter/receiver genes involved in CCC interactions. Thus, a reliable method and resource should be able to pick up spatial covariance better than a random resource, and better than a resource composed of genes that are not explainable by space (e.g. housekeeping genes).
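The AUROC evaluation could look roughly like the sketch below: interactions whose spatial covariance reaches a threshold are treated as putative true positives, and the AUROC measures how well a method's scores rank them above the rest. The function names, the `>=` thresholding rule, and the toy numbers are assumptions for illustration; how the spatial covariance is obtained (e.g. with mistyR [25]) is outside the sketch.

```python
def auroc(scores, positives):
    """AUROC as the probability that a positive outscores a negative
    (ties count as one half)."""
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    neg = [s for s, is_pos in zip(scores, positives) if not is_pos]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_threshold(method_scores, spatial_covariance, thresholds):
    """AUROC of a method's interaction scores for each covariance threshold."""
    result = {}
    for t in thresholds:
        # Putative true positives: interactions that are spatially explainable.
        labels = [c >= t for c in spatial_covariance]
        result[t] = auroc(method_scores, labels)
    return result


# Toy data (assumed): one value per transmitter-receiver interaction.
method_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
spatial_covariance = [0.7, 0.2, 0.6, 0.1, 0.05]

print(auroc_by_threshold(method_scores, spatial_covariance, [0.15, 0.5]))
```

An AUROC near 0.5 corresponds to random ranking, so the same function applied to a shuffled or housekeeping-gene resource would give the baselines mentioned above.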

Methods

These methods are currently implemented in our framework, but we welcome any further method proposals!

Datasets

Publicly available 10x Visium datasets or any other spatial transcriptomics datasets.

Further Information

Current Efforts

Different strategies have been used to indirectly evaluate the methods’ performance, including a presumed correlation between CCC activity and spatial adjacency [13,18], recovering the effect of receptor gene knockouts [18], robustness to subsampling [13], agreement with proteomics [11], simulated scRNA-Seq data [8], and the agreement among methods [9,11,13,18].

General Method Assumptions:

General Method Limitations:

Bibliography

1 Chen X, Teichmann SA & Meyer KB (2018) From Tissues to Cell Types and Back: Single-Cell Gene Expression Analysis of Tissue Architecture. Annu Rev Biomed Data Sci 1, 29–51.
2 Türei D, Valdeolivas A, Gul L, Palacio-Escat N, Klein M, Ivanova O, Ölbei M, Gábor A, Theis F, Módos D, Korcsmáros T & Saez-Rodriguez J (2021) Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Mol Syst Biol 17, e9923.
3 Armingol E, Officer A, Harismendy O & Lewis NE (2021) Deciphering cell-cell interactions and communication from gene expression. Nat Rev Genet 22, 71–88.
4 Cillo AR, Kürten CHL, Tabib T, Qi Z, Onkar S, Wang T, Liu A, Duvvuri U, Kim S, Soose RJ, Oesterreich S, Chen W, Lafyatis R, Bruno TC, Ferris RL & Vignali DAA (2020) Immune Landscape of Viral- and Carcinogen-Driven Head and Neck Cancer. Immunity 52, 183-199.e9.
5 Wang Y, Wang R, Zhang S, Song S, Jiang C, Han G, Wang M, Ajani J, Futreal A & Wang L (2019) iTALK: an R Package to Characterize and Illustrate Intercellular Communication. BioRxiv.
6 Tyler SR, Rotti PG, Sun X, Yi Y, Xie W, Winter MC, Flamme-Wiese MJ, Tucker BA, Mullins RF, Norris AW & Engelhardt JF (2019) PyMINEr Finds Gene and Autocrine-Paracrine Networks from Human Islet scRNA-Seq. Cell Rep 26, 1951-1964.e8.
7 Efremova M, Vento-Tormo M, Teichmann SA & Vento-Tormo R (2020) CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nat Protoc 15, 1484–1506.
8 Tsuyuzaki K, Ishii M & Nikaido I (2019) Uncovering hypergraphs of cell-cell interaction from single cell RNA-sequencing data. BioRxiv.
9 Raredon MSB, Yang J, Garritano J, Wang M, Kushnir D, Schupp JC, Adams TS, Greaney AM, Leiby KL, Kaminski N, Kluger Y, Levchenko A & Niklason LE (2021) Connectome: computation and visualization of cell-cell signaling topologies in single-cell systems data. BioRxiv.
10 Hou R, Denisenko E, Ong HT, Ramilowski JA & Forrest ARR (2020) Predicting cell-to-cell communication networks using NATMI. Nat Commun 11, 5011.
11 Cabello-Aguilar S, Alame M, Kon-Sun-Tack F, Fau C, Lacroix M & Colinge J (2020) SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics. Nucleic Acids Res 48, e55.
12 Noël F, Massenet-Regad L, Carmi-Levy I, Cappuccio A, Grandclaudon M, Trichot C, Kieffer Y, Mechta-Grigoriou F & Soumelis V (2020) ICELLNET: a transcriptome-based framework to dissect intercellular communication. BioRxiv.
13 Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan C-H, Myung P, Plikus MV & Nie Q (2021) Inference and analysis of cell-cell communication using CellChat. Nat Commun 12, 1088.
14 Choi H, Sheng J, Gao D, Li F, Durrans A, Ryu S, Lee SB, Narula N, Rafii S, Elemento O, Altorki NK, Wong STC & Mittal V (2015) Transcriptome analysis of individual stromal cell populations identifies stroma-tumor crosstalk in mouse lung cancer model. Cell Rep 10, 1187–1201.
15 Browaeys R, Saelens W & Saeys Y (2020) NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 17, 159–162.
16 Wang S, Karikomi M, MacLean AL & Nie Q (2019) Cell lineage and communication network inference via optimization for single-cell transcriptomics. Nucleic Acids Res 47, e66.
17 Cheng J, Zhang J, Wu Z & Sun X (2021) Inferring microenvironmental regulation of gene expression from single-cell RNA sequencing data using scMLnet with an application to COVID-19. Brief Bioinformatics 22, 988–1005.
18 Hu Y, Peng T, Gao L & Tan K (2021) CytoTalk: De novo construction of signal transduction networks using single-cell transcriptomic data. Sci Adv 7.
19 Dimitrov D, Türei D, Boys C, Nagai JS, Ramirez Flores RO, Kim H, Szalai B, Costa IG, Dugourd A, Valdeolivas A & Saez-Rodriguez J (2021) Comparison of Resources and Methods to infer Cell-Cell Communication from Single-cell RNA Data. BioRxiv.
20 Almet AA, Cang Z, Jin S & Nie Q (2021) The landscape of cell-cell communication through single-cell transcriptomics. Current Opinion in Systems Biology 26, 12–23.
21 Garcia-Alonso L, Handfield L-F, Roberts K, Nikolakopoulou K, Fernando RC, Gardner L, Woodhams B, Arutyunyan A, Polanski K, Hoo R, Sancho-Serra C, Li T, Kwakwa K, Tuck E, Kleshchevnikov V, Tarkowska A, Porter T, Mazzeo CI, van Dongen S, Dabrowska M & Vento-Tormo R (2021) Mapping the temporal and spatial dynamics of the human endometrium in vivo and in vitro. BioRxiv.
22 Armingol E, Joshi CJ, Baghdassarian H, Shamie I, Ghaddar A, Chan J, Her H-L, O’Rourke EJ & Lewis NE (2020) Inferring the spatial code of cell-cell interactions and communication across a whole animal body. BioRxiv.
23 Dries R, Zhu Q, Dong R, Eng C-HL, Li H, Liu K, Fu Y, Zhao T, Sarkar A, Bao F, George RE, Pierson N, Cai L & Yuan G-C (2021) Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol 22, 78.
24 Palla G, Spitzer H, Klein M, Fischer DS, Schaar AC, Kuemmerle LB, Rybakov S, Ibarra IL, Holmberg O, Virshup I, Lotfollahi M, Richter S & Theis FJ (2021) Squidpy: a scalable framework for spatial single cell analysis. BioRxiv.
25 Tanevski J, Ramirez Flores RO, Gabor A, Schapiro D & Saez-Rodriguez J (2020) Explainable multi-view framework for dissecting inter-cellular signaling from highly multiplexed spatial data. BioRxiv.

rcannood commented 3 years ago

Thanks for making this issue!

Have you already started writing code for this task within the current openproblems framework? If not, I'm wondering if you could give opsca-viash a shot, as it allows for much easier switching between Python and R components.

"These methods are currently implemented in our framework,"

For the sake of completeness, could you post a link to the framework?

"but we welcome any further method proposals!"

How hard is it to add NicheNet?

scottgigante-immunai commented 1 year ago

This task has been added :)