sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Implement input API and processor for EVEX database #1393

Open bgyori opened 1 year ago

bgyori commented 1 year ago

This PR implements an API and processor for the EVEX text mining database (http://evexdb.org/). The approach is to use the "network" relations file as the backbone of INDRA Statement extraction. However, to gather evidence text, raw agent text and coordinates, and other metadata, it is necessary to find support for each relation in the raw standoff output files. This latter part is considerably complicated, since network relations aren't explicitly linked to elements of the standoff output. The processor produces around 620k Statements (each with a single Evidence), and more than 99% of these contain evidence text, meaning that their supporting annotations were successfully identified in the standoff files.
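To illustrate the matching problem, here is a minimal sketch of how a network relation might be aligned with standoff annotations. It is not the code in this PR. It assumes BioNLP Shared Task style standoff (tab-separated `T` lines for text-bound annotations and `E` lines for events, with contiguous spans only), and it assumes a relation can be matched by its type plus the surface text of its source and target. The names `TextBound`, `Event`, `parse_standoff` and `find_support` are hypothetical, as are the example sentence and annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBound:
    """A text-bound annotation ('T' line): an entity or an event trigger."""
    ann_id: str
    ann_type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    """An event annotation ('E' line) with a role -> annotation ID mapping."""
    ann_id: str
    event_type: str
    trigger: str
    args: dict


def parse_standoff(standoff):
    """Parse 'T' and 'E' lines of an assumed BioNLP-ST style standoff string."""
    entities, events = {}, {}
    for line in standoff.splitlines():
        if line.startswith("T"):
            ann_id, type_span, text = line.split("\t", 2)
            # Assumption: contiguous spans only, i.e. "Type start end"
            ann_type, start, end = type_span.split(" ")
            entities[ann_id] = TextBound(ann_id, ann_type,
                                         int(start), int(end), text)
        elif line.startswith("E"):
            ann_id, body = line.split("\t", 1)
            trigger_part, *arg_parts = body.split()
            event_type, trigger = trigger_part.split(":")
            args = dict(part.split(":") for part in arg_parts)
            events[ann_id] = Event(ann_id, event_type, trigger, args)
    return entities, events


def find_support(rel_type, source_text, target_text,
                 doc_text, entities, events):
    """Return evidence metadata for a relation, or None if nothing matches.

    Hypothetical heuristic: the relation is supported by an event of the
    same type whose Cause and Theme entities have the given surface text.
    """
    for event in events.values():
        cause = entities.get(event.args.get("Cause"))
        theme = entities.get(event.args.get("Theme"))
        if cause is None or theme is None:
            continue
        if (event.event_type == rel_type
                and cause.text == source_text
                and theme.text == target_text):
            return {
                "text": doc_text,
                "subj_text": doc_text[cause.start:cause.end],
                "subj_coords": (cause.start, cause.end),
                "obj_text": doc_text[theme.start:theme.end],
                "obj_coords": (theme.start, theme.end),
            }
    return None


# Made-up example document and annotations
DOC_TEXT = "MEK phosphorylates ERK in vitro."
STANDOFF = (
    "T1\tProtein 0 3\tMEK\n"
    "T2\tProtein 19 22\tERK\n"
    "T3\tPhosphorylation 4 18\tphosphorylates\n"
    "E1\tPhosphorylation:T3 Cause:T1 Theme:T2"
)

entities, events = parse_standoff(STANDOFF)
support = find_support("Phosphorylation", "MEK", "ERK",
                       DOC_TEXT, entities, events)
missing = find_support("Binding", "MEK", "ERK",
                       DOC_TEXT, entities, events)
```

In this sketch, a relation with no matching event yields `None`, which would correspond to a Statement whose Evidence has no text. The real matching is likely more involved, since the relations file and the standoff files do not share identifiers.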