sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Representing fusion proteins #1394

Open: cthoyt opened this issue 1 year ago

cthoyt commented 1 year ago

Every once in a while, I run into sentences like

LPL–GPIHBP1 fusion protein showed high enzymatic activity in in vitro assays using surrogate substrates as well as the natural LPL substrates VLDL and CM.

that get picked up as a binding (i.e., a Complex statement), even though a fusion protein is a totally different kind of phenomenon. A few things could help reduce the curation burden for these:

  1. Add some simple rules for filtering these out of Complex statements, since the phrase "fusion protein" is usually explicit in the text
  2. Add new rules to REACH?
  3. Consider creating a new statement type to represent fusion proteins explicitly
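As a rough illustration of option 1, the sketch below flags a two-member Complex when one of its evidence sentences both names the members joined by a dash or slash (e.g. "LPL–GPIHBP1") and contains an explicit "fusion protein" cue. It is a minimal sketch, not INDRA code: `Evidence`, `ComplexStmt`, `looks_like_fusion`, and `filter_fusions` are hypothetical stand-ins loosely modeled on INDRA's `Complex` statement (which stores `Agent` objects in `members` and `Evidence` objects with a `text` attribute), and the regex heuristic is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical stand-ins; these are NOT INDRA's actual classes.
@dataclass
class Evidence:
    text: str

@dataclass
class ComplexStmt:
    members: List[str]  # plain gene names instead of INDRA Agents
    evidence: List[Evidence] = field(default_factory=list)

# Explicit cue phrase; matches "fusion protein" and "fusion proteins".
FUSION_CUE = re.compile(r"\bfusion\s+proteins?\b", re.IGNORECASE)
# Hyphen, en dash, em dash, or slash joining two names (hyphen first = literal).
DASHES = "-\u2013\u2014/"

def looks_like_fusion(stmt: ComplexStmt) -> bool:
    """Heuristic: True if some evidence sentence names both members joined
    by a dash/slash (in either order) AND contains a 'fusion protein' cue."""
    if len(stmt.members) != 2:
        return False
    a, b = (re.escape(m) for m in stmt.members)
    joined = re.compile(
        rf"\b(?:{a}\s*[{DASHES}]\s*{b}|{b}\s*[{DASHES}]\s*{a})\b",
        re.IGNORECASE,
    )
    return any(
        FUSION_CUE.search(ev.text) and joined.search(ev.text)
        for ev in stmt.evidence
    )

def filter_fusions(stmts: List[ComplexStmt]) -> Tuple[list, list]:
    """Split statements into (kept, flagged_as_fusion)."""
    kept, flagged = [], []
    for s in stmts:
        (flagged if looks_like_fusion(s) else kept).append(s)
    return kept, flagged

# Usage with the sentence from this issue plus an ordinary binding sentence.
sentence = ("LPL\u2013GPIHBP1 fusion protein showed high enzymatic activity "
            "in in vitro assays using surrogate substrates as well as the "
            "natural LPL substrates VLDL and CM.")
stmt = ComplexStmt(["LPL", "GPIHBP1"], [Evidence(sentence)])
other = ComplexStmt(["LPL", "GPIHBP1"],
                    [Evidence("LPL binds GPIHBP1 on endothelial cells.")])
kept, flagged = filter_fusions([stmt, other])
print(len(kept), len(flagged))  # 1 1
```

Requiring both signals keeps the rule conservative: a sentence that merely mentions a fusion protein elsewhere, or a dash-joined pair used for an ordinary complex, is left alone. The flagged statements could then be dropped or, under option 3, converted into a dedicated statement type; this sketch only separates them.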