sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Problem with get_biopax_stmts() with neighborhood option #1389

Closed. devonjkohler closed this issue 2 years ago.

devonjkohler commented 2 years ago

Hi,

I am running into an error when trying to pull interactions using get_biopax_stmts(query="neighborhood"). The error occurs only with the neighborhood query, not with the default pathsbetween query. Here is a small script to reproduce it:

from indra.tools.gene_network import GeneNetwork

gn = GeneNetwork(["BRAF", "NRAS", "KRAS", "HRAS", "EGFR"])
biopax_stmts = gn.get_biopax_stmts(query="neighborhood")

The full error is as follows:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [488], in <cell line: 1>()
----> 1 biopax_stmts = gn.get_biopax_stmts(query="neighborhood")

File ~/.local/lib/python3.8/site-packages/indra/tools/gene_network.py:133, in GeneNetwork.get_biopax_stmts(self, filter, query, database_filter)
    129     bp = biopax.process_pc_pathsbetween(self.gene_list,
    130                                     database_filter=database_filter,
    131                                     block_size=block_size)
    132 elif query == 'neighborhood':
--> 133     bp = biopax.process_pc_neighborhood(self.gene_list,
    134                                     database_filter=database_filter)
    135 else:
    136     logger.error('Invalid query type: %s' % query)

File ~/.local/lib/python3.8/site-packages/indra/sources/biopax/api.py:51, in process_pc_neighborhood(gene_names, neighbor_limit, database_filter)
     47 model = model_from_pc_query('neighborhood', source=gene_names,
     48                             limit=neighbor_limit,
     49                             datasource=database_filter)
     50 if model is not None:
---> 51     return process_model(model)

File ~/.local/lib/python3.8/site-packages/indra/sources/biopax/api.py:210, in process_model(model)
    197 """Returns a BiopaxProcessor for a BioPAX model object.
    198 
    199 Parameters
   (...)
    207     A BiopaxProcessor containing the obtained BioPAX model in bp.model.
    208 """
    209 bp = BiopaxProcessor(model)
--> 210 bp.process_all()
    211 return bp

File ~/.local/lib/python3.8/site-packages/indra/sources/biopax/processor.py:61, in BiopaxProcessor.process_all(self)
     59 def process_all(self):
     60     self._extract_features()
---> 61     self.get_modifications()
     62     self.get_regulate_activities()
     63     self.get_activity_modification()

File ~/.local/lib/python3.8/site-packages/indra/sources/biopax/processor.py:262, in BiopaxProcessor.get_modifications(self)
    260 def get_modifications(self):
    261     """Extract INDRA Modification Statements from the BioPAX model."""
--> 262     for enz, sub, gained_mods, lost_mods, \
    263             activity_change, ev in self._conversion_state_iter():
    264         for mods, is_gain in ((gained_mods, True), (lost_mods, False)):
    265             for mod in mods:

File ~/.local/lib/python3.8/site-packages/indra/sources/biopax/processor.py:241, in BiopaxProcessor._conversion_state_iter(self)
    238 def _conversion_state_iter(self):
    239     """An iterator over state changed in controlled conversions
    240     in the model."""
--> 241     for primary_controller_agent, ev, control, conversion in \
    242             self._control_conversion_iter(bp.Conversion, 'primary'):
    243         for inp, outp in self.find_matching_left_right(conversion):
    244             # There is sometimes activity change at the family level
    245             # which we need to capture
    246             _, _, overall_activity_change = self.feature_delta(inp, outp)

File ~/.local/lib/python3.8/site-packages/indra/sources/biopax/processor.py:218, in BiopaxProcessor._control_conversion_iter(self, conversion_type, controller_logic)
    215 primary_controller_agents = []
    216 for pc in _listify(primary_controller):
    217     primary_controller_agents += \
--> 218         _listify(self._get_agents_from_entity(pc))
    219 for primary_controller_agent in primary_controller_agents:
    220     yield primary_controller_agent, ev, control, conversion

File ~/.local/lib/python3.8/site-packages/indra/sources/biopax/processor.py:514, in BiopaxProcessor._get_agents_from_entity(self, bpe)
    510     return agents
    512 # If it is a single entity, we get its name and database
    513 # references
--> 514 return self._get_agents_from_singular_entity(bpe)

File ~/.local/lib/python3.8/site-packages/indra/sources/biopax/processor.py:493, in BiopaxProcessor._get_agents_from_singular_entity(self, bpe)
    490         agents.append(agent)
    491 # Otherwise it's just a regular Agent
    492 else:
--> 493     agent = get_standard_agent(name, clean_up_xrefs(xrefs), mods=mcs)
    494     agents.append(agent)
    496 # There is a potential here that an Agent name was set to None
    497 # if both the display name and the standard name are missing.
    498 # We filter these out

File ~/.local/lib/python3.8/site-packages/indra/sources/biopax/processor.py:998, in clean_up_xrefs(xrefs)
    996         db_refs[k] = v
    997     if k == 'UP':
--> 998         db_refs.update(_refs_from_up_id(db_refs[k]))
    999 return db_refs

KeyError: 'UP'

bgyori commented 2 years ago

Thanks @devonjkohler, I am looking into it and will push a fix once I figure out the problem.

bgyori commented 2 years ago

Hi @devonjkohler, I found the issue: it was a corner case in the BioPAX content that the processor didn't handle. You can install the latest version of INDRA, which includes the fix, from GitHub with:

pip install git+https://github.com/sorgerlab/indra.git
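For readers following the traceback, the failure pattern can be sketched in isolation. In the last frame, the assignment `db_refs[k] = v` sits at a deeper indentation than the `if k == 'UP':` check, so the lookup `db_refs[k]` can run even when the assignment was skipped. The sketch below is not INDRA's code: the function names, the `expand_uniprot` helper, and the `value is not None` condition are all hypothetical, since the traceback does not show what actually guards the assignment or how the fix was implemented.

```python
def expand_uniprot(up_id):
    """Hypothetical stand-in for a helper that derives extra references
    from a UniProt ID (the real lookup is not shown in the traceback)."""
    return {"EXAMPLE_REF": "derived-from-" + up_id}


def clean_refs_unguarded(xrefs):
    """Simplified illustration of the failing pattern (assumed, not INDRA's code)."""
    db_refs = {}
    for namespace, value in xrefs.items():
        if value is not None:           # assumed condition: store only usable values
            db_refs[namespace] = value
        if namespace == "UP":           # runs even if nothing was stored above
            db_refs.update(expand_uniprot(db_refs[namespace]))
    return db_refs


def clean_refs_guarded(xrefs):
    """One possible guard: skip unusable entries before any lookup."""
    db_refs = {}
    for namespace, value in xrefs.items():
        if value is None:
            continue
        db_refs[namespace] = value
        if namespace == "UP":
            db_refs.update(expand_uniprot(value))
    return db_refs


if __name__ == "__main__":
    corner_case = {"UP": None}          # a 'UP' entry with no usable value
    try:
        clean_refs_unguarded(corner_case)
    except KeyError as err:
        print("unguarded version raised KeyError:", err)
    print("guarded version returned:", clean_refs_guarded(corner_case))
```

The guarded variant simply ignores entries it cannot store, which is only one of several reasonable ways to handle such a corner case.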

Also, you are always welcome to reach out to me directly if I can help with the higher-level goals behind this query; there might be better ways to get the same content.
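After upgrading, one way to confirm that both query types behave the same is to run each one and record whether it raises. The helper below is a minimal sketch: `compare_queries` is a hypothetical name, and it assumes the callable passed in accepts a `query` keyword and returns a list-like collection of statements, as get_biopax_stmts is used in the script above.

```python
def compare_queries(fetch, queries=("pathsbetween", "neighborhood")):
    """Call fetch(query=...) for each query type and summarize the outcome.

    `fetch` is any callable taking a `query` keyword argument, for example
    the bound method gn.get_biopax_stmts from the original script.
    """
    results = {}
    for query in queries:
        try:
            stmts = fetch(query=query)
        except Exception as exc:  # report the failure instead of stopping
            results[query] = f"failed: {type(exc).__name__}: {exc}"
        else:
            count = len(stmts) if stmts is not None else 0
            results[query] = f"ok: {count} statements"
    return results


# Usage with INDRA installed (not run here; it queries a remote service):
# from indra.tools.gene_network import GeneNetwork
# gn = GeneNetwork(["BRAF", "NRAS", "KRAS", "HRAS", "EGFR"])
# for query, outcome in compare_queries(gn.get_biopax_stmts).items():
#     print(query, "->", outcome)
```

With the version that produced the traceback above, the neighborhood entry would be expected to report the KeyError, while the pathsbetween entry would report a statement count.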