sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Extend INDRA for cellular communication #1398

Open Siliegia opened 2 years ago

Siliegia commented 2 years ago

Hi! This is a great package with many useful resources! After looking into it, we were wondering whether it could also be used to model cellular communication. More precisely, we would like to extend INDRA’s capabilities to allow for the information extraction and modeling of cellular communication. In particular, this would involve two steps:

  1. Adding ligand-receptor information to proteins so that we know if they act as ligands or receptors.
  2. Adding cell type information to proteins, so that we know which cell types can potentially interact with each other, or adding cell types as agents with lists of associated proteins in order to model the dynamics between cell types.

Do you have any advice or tips for this implementation?

I’m very much looking forward to an answer! :)

Best wishes, Maria

bgyori commented 2 years ago

Hi @Siliegia, thanks for this question! I think this is a really interesting topic and probably requires a bit of extra work in several components to do well.

First, a naive solution is to use INDRA and the reading systems it integrates as is, without any significant changes, and collect statements about binding/complex formation. Then, external resources can be used to filter these interactions to ones containing a known ligand and a known receptor, potentially also bringing in expression profiles of different cell types to understand where each ligand and each receptor can be expressed.
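The naive approach can be sketched in a few lines of Python. To keep the sketch runnable without INDRA installed, `Agent` and `Complex` below are minimal stand-ins; in INDRA itself, `indra.statements.Complex` exposes its participants as a `members` list of `Agent` objects, each with a `name`. The `LIGANDS`, `RECEPTORS` and `EXPRESSION` tables are hypothetical toy placeholders for external resources such as a curated ligand-receptor list and cell-type expression profiles, not real expression data.

```python
from dataclasses import dataclass


# Minimal stand-ins for INDRA's Agent and Complex so the sketch runs
# without INDRA; the real classes live in indra.statements.
@dataclass
class Agent:
    name: str


@dataclass
class Complex:
    members: list  # flat list of Agents, as in INDRA's Complex


# Toy placeholders for external resources (hypothetical values): a curated
# ligand/receptor list and the genes expressed in each cell type.
LIGANDS = {"CD274", "TGFB1"}
RECEPTORS = {"PDCD1", "TGFBR2"}
EXPRESSION = {
    "tumor cell": {"CD274", "TGFB1"},
    "T cell": {"PDCD1"},
    "fibroblast": {"TGFB1", "TGFBR2"},
}


def ligand_receptor_pairs(stmts):
    """Yield (ligand, receptor) name pairs from Complex statements that
    contain at least one known ligand and one known receptor."""
    for stmt in stmts:
        names = {member.name for member in stmt.members}
        for ligand in sorted(names & LIGANDS):
            for receptor in sorted(names & RECEPTORS):
                if ligand != receptor:
                    yield ligand, receptor


def candidate_cell_type_pairs(ligand, receptor):
    """List (sender, receiver) cell types where the sender expresses the
    ligand and the receiver expresses the receptor."""
    senders = [ct for ct, genes in EXPRESSION.items() if ligand in genes]
    receivers = [ct for ct, genes in EXPRESSION.items() if receptor in genes]
    return [(s, r) for s in senders for r in receivers]


stmts = [
    Complex([Agent("CD274"), Agent("PDCD1")]),   # PD-L1 binds PD-1
    Complex([Agent("TGFB1"), Agent("TGFBR2")]),
    Complex([Agent("MAPK1"), Agent("MAP2K1")]),  # no ligand/receptor: dropped
]

for ligand, receptor in ligand_receptor_pairs(stmts):
    print(ligand, receptor, candidate_cell_type_pairs(ligand, receptor))
# CD274 PDCD1 [('tumor cell', 'T cell')]
# TGFB1 TGFBR2 [('tumor cell', 'fibroblast'), ('fibroblast', 'fibroblast')]
```

The filter only looks at names; with real INDRA Statements one would more likely match on grounded identifiers in each Agent's `db_refs` rather than on display names.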

More specifically for your points:

  1. Adding ligand-receptor information to proteins so that we know if they act as ligands or receptors.

Again, this can be provided as external information, independent of INDRA, as outlined in the "naive" approach above, and simply used as a filter condition on INDRA Statements (e.g., to get ones representing an interaction between a ligand and a receptor). Another approach could be to try to pick up descriptions that make ligand/receptor status explicit in the context of a specific interaction. For example, in the sentence "PD-L1 binds to its receptor, PD-1, found on activated T cells, B cells, and myeloid cells", it is made explicit that PD-1 is a receptor for PD-L1 (i.e., the sentence provides more information than just the fact that the two proteins bind).

This would require custom extensions to reading systems such as Reach, along with extensions to the representation to make sure there is a way to represent a more specific type of complex formation involving a ligand and a receptor in distinct roles. At the level of INDRA, this would involve adding a new Statement type such as LigandReceptorComplex, with separate ligand and receptor arguments, as a more specific version of the current Complex Statement type, which has a flat list of members. It is worth noting that doing this at the level of INDRA can still be useful, independent of reading systems, to capture, e.g., ligand-receptor interactions from structured databases.
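A rough sketch of how a LigandReceptorComplex type could relate to Complex. `LigandReceptorComplex` is hypothetical (it is not part of INDRA), and `Agent`, `Complex` and `refinement_of` here are simplified stand-ins: INDRA's actual refinement checks also involve an ontology (e.g., for protein families) and agent state, which this sketch ignores. What it illustrates is the direction of specificity: a ligand-receptor complex refines a plain complex with the same members, but not the other way around, and swapping the roles breaks the refinement.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str  # stand-in for indra.statements.Agent


class Complex:
    """Stand-in for INDRA's Complex: a flat, unordered list of members."""

    def __init__(self, members):
        self.members = list(members)

    def refinement_of(self, other):
        # A statement can only refine one of the same or a more general type.
        if not isinstance(self, type(other)):
            return False
        # Simplified comparison: same member names, order ignored.
        return (sorted(m.name for m in self.members)
                == sorted(m.name for m in other.members))


class LigandReceptorComplex(Complex):
    """Hypothetical Statement type: a Complex whose two members play
    distinct ligand and receptor roles."""

    def __init__(self, ligand, receptor):
        super().__init__([ligand, receptor])
        self.ligand = ligand
        self.receptor = receptor

    def refinement_of(self, other):
        if isinstance(other, LigandReceptorComplex):
            # Roles must match; swapping ligand and receptor is not a refinement.
            return (self.ligand.name == other.ligand.name
                    and self.receptor.name == other.receptor.name)
        # Against a plain Complex, fall back to comparing members.
        return super().refinement_of(other)


pd_l1_pd_1 = LigandReceptorComplex(Agent("CD274"), Agent("PDCD1"))
generic = Complex([Agent("PDCD1"), Agent("CD274")])
print(pd_l1_pd_1.refinement_of(generic))  # True
print(generic.refinement_of(pd_l1_pd_1))  # False
```

Subclassing Complex means code that handles Complex Statements would keep working on the new type, while code that cares about roles can check for the subclass.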

  2. Adding cell type information to proteins

Currently, context in most reading systems and in INDRA is captured at the level of Statements. So you could capture something like "A binds B in cell type C". Extraction logic and representation at the level of readers and INDRA would have to be slightly generalized to be able to represent things like "A in cell type C1 binds B in cell type C2". This, again, would require some changes in reading systems. In terms of INDRA's representation, it could be solved in two ways: (a) as an argument to a new Statement type like LigandReceptorComplex which could have arguments like LigandReceptorComplex(ligand, receptor, ligand_cell_type, receptor_cell_type) or (b) added as a new attribute of Agents (the current ones include mods, mutations, location, activity and bound_conditions). I think option (a) is more appealing in this case.
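Option (a) could be sketched as a Statement type that carries the two cell types as its own arguments. All names here are hypothetical: `LigandReceptorComplex` and the helper `interacting_cell_types` are illustrations, not INDRA API, and `Agent` is a minimal stand-in for INDRA's class. The helper shows how such Statements could answer the original question of which cell types can potentially interact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    name: str  # stand-in for indra.statements.Agent


@dataclass
class LigandReceptorComplex:
    """Hypothetical Statement type for option (a): cell types are
    arguments of the Statement, each tied to a role."""
    ligand: Agent
    receptor: Agent
    ligand_cell_type: Optional[str] = None    # cell type expressing the ligand
    receptor_cell_type: Optional[str] = None  # cell type expressing the receptor


def interacting_cell_types(stmts):
    """Collect (sender, receiver) cell-type pairs from statements in which
    both cell types are known."""
    return {
        (s.ligand_cell_type, s.receptor_cell_type)
        for s in stmts
        if s.ligand_cell_type is not None and s.receptor_cell_type is not None
    }


stmts = [
    # "PD-L1 on tumor cells binds PD-1 on T cells"
    LigandReceptorComplex(Agent("CD274"), Agent("PDCD1"), "tumor cell", "T cell"),
    # Cell types unknown: still a valid statement, but skipped by the helper
    LigandReceptorComplex(Agent("TGFB1"), Agent("TGFBR2")),
]
print(interacting_cell_types(stmts))  # {('tumor cell', 'T cell')}
```

Leaving the cell types optional lets the same type hold interactions where the text (or database) names no cell type, so they can be filled in or filtered later.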

Happy to discuss more about ideas!

Siliegia commented 2 years ago

Thank you very much for the fast reply! This is great advice and I will look into these options. I like the idea of defining a new Statement type, where we could possibly also define the type of communication, e.g., autocrine, paracrine, etc. I will keep you updated and maybe we can discuss again at a later time point!
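The communication-type idea could be sketched as an optional attribute on the same hypothetical LigandReceptorComplex, with a crude fallback heuristic. All names (`CommunicationType`, `infer_communication_type`) are illustrative assumptions, not INDRA API. The heuristic only compares cell types: matching cell types are labeled autocrine and differing ones paracrine. This is an approximation, since a shared cell type does not prove that a cell signals to itself, and juxtacrine or endocrine signaling cannot be told apart from cell types alone, so an explicitly stated type always takes precedence.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CommunicationType(Enum):
    AUTOCRINE = "autocrine"
    PARACRINE = "paracrine"
    JUXTACRINE = "juxtacrine"
    ENDOCRINE = "endocrine"


@dataclass
class LigandReceptorComplex:
    """Hypothetical Statement type; agents are plain gene names here."""
    ligand: str
    receptor: str
    ligand_cell_type: Optional[str] = None
    receptor_cell_type: Optional[str] = None
    # Set when a source states the mode of communication explicitly.
    communication_type: Optional[CommunicationType] = None


def infer_communication_type(stmt):
    """Return the explicit communication type if present; otherwise apply a
    crude cell-type heuristic, or return None if cell types are missing."""
    if stmt.communication_type is not None:
        return stmt.communication_type
    if stmt.ligand_cell_type is None or stmt.receptor_cell_type is None:
        return None
    if stmt.ligand_cell_type == stmt.receptor_cell_type:
        # Approximation: same cell type need not mean the same cell.
        return CommunicationType.AUTOCRINE
    return CommunicationType.PARACRINE


tgfb = LigandReceptorComplex("TGFB1", "TGFBR2", "fibroblast", "fibroblast")
pd1 = LigandReceptorComplex("CD274", "PDCD1", "tumor cell", "T cell",
                            CommunicationType.JUXTACRINE)
print(infer_communication_type(tgfb))  # CommunicationType.AUTOCRINE
print(infer_communication_type(pd1))   # CommunicationType.JUXTACRINE
```

PD-L1/PD-1 signaling is largely contact-dependent, so a statement explicitly marked juxtacrine keeps that label, whereas the heuristic alone would call a tumor-cell-to-T-cell interaction paracrine.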