sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Load REACH statements in ReachProcessor #1322

Closed by sanyabt 3 years ago

sanyabt commented 3 years ago

Hi @bgyori, hope you're doing well. I am trying to load REACH statements from a JSON file after it has been saved by the reach.process_text function (as below). However, even though the JSON file shows statements when opened in a text editor, the code below prints an empty list for the REACH statements. Am I doing something incorrect? Any help would be appreciated, thanks!

Code:

import json
from indra.sources import reach
# text holds the input text to read (defined elsewhere)
rp1 = reach.process_text(text, citation='11205489', output_fname='pmid_reach.json')
with open('pmid_reach.json', 'r') as ff:
    reach_dict = json.load(ff)
rp2 = reach.processor.ReachProcessor(reach_dict, pmid='11205489')
print(rp2.statements)

I know I can use rp1.statements, but I want to use the saved JSON files in a separate script, so I am trying to load the statements again through this process.

bgyori commented 3 years ago

Hi @sanyabt, I suspect the issue here is that there is a distinction between (1) Reach's own reader output, which is serialized as JSON, and (2) INDRA Statements processed from Reach output and then serialized as JSON. These two formats are different, so make sure you use one or the other consistently. Either approach can be used to persist reader output or statements and use them later. For (1), you might do something like:

from indra.sources import reach
rp1 = reach.process_text(text, citation='11205489', output_fname='pmid_reach.json')
[...]
rp2 = reach.process_json_file('pmid_reach.json')

Here, rp1.statements and rp2.statements should have the same content (though they are not the same objects due to serialization/deserialization).
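The "same content, different objects" point can be illustrated with a minimal standard-library sketch. It uses a plain dict as a stand-in for extracted content (an assumption for illustration, not an INDRA Statement) and round-trips it through JSON:

```python
import json

# Stand-in for extracted content: a plain dict, not an INDRA Statement.
original = {"type": "Phosphorylation", "enz": "MAP2K1", "sub": "MAPK1"}

# Serialize to a JSON string and read it back, as a save/load cycle would.
restored = json.loads(json.dumps(original))

same_content = restored == original  # compares keys and values
same_object = restored is original   # compares object identity
print(same_content, same_object)
```

The round trip preserves the content but produces a new object, which is why identity checks between rp1.statements and rp2.statements should not be expected to pass.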

For (2), you might do something like:

from indra.sources import reach
rp1 = reach.process_text(text, citation='11205489', output_fname='pmid_reach.json')
from indra.statements import stmts_to_json_file
stmts_to_json_file(rp1.statements, 'indra_stmts_from_pmid_reach.json')
[...]
from indra.statements import stmts_from_json_file
stmts = stmts_from_json_file('indra_stmts_from_pmid_reach.json')

Here the content of stmts should be the same as the content of rp1.statements.
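Since the two kinds of JSON file are easy to mix up, a small check before loading may help. The following is a rough sketch using only the standard library; the helper name guess_json_kind is hypothetical, and the structural cues it relies on (a top-level list of dicts with a "type" key for serialized INDRA Statements, a top-level dict with "events" and "entities" keys for Reach output) are assumptions about the typical file layouts, not a documented contract:

```python
import json


def guess_json_kind(path):
    """Guess whether a JSON file holds Reach output or INDRA Statements.

    Hypothetical helper; the structural cues below are assumptions.
    """
    with open(path, "r") as fh:
        data = json.load(fh)
    # Assumption: serialized INDRA Statements are a list of dicts,
    # each carrying a "type" key. An empty list also matches here.
    if isinstance(data, list) and all(
        isinstance(entry, dict) and "type" in entry for entry in data
    ):
        return "indra_statements"
    # Assumption: Reach reader output is a dict with "events" and
    # "entities" sections at the top level.
    if isinstance(data, dict) and "events" in data and "entities" in data:
        return "reach_output"
    return "unknown"
```

A file reported as "reach_output" would then go through reach.process_json_file as in (1), while one reported as "indra_statements" would go through stmts_from_json_file as in (2).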

sanyabt commented 3 years ago

Ah okay, I got it. Thank you so much!