sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Ubibrowser KeyError #1418

kkaris closed 9 months ago

kkaris commented 11 months ago

Running indra.sources.ubibrowser.api.process_from_web raises a KeyError, likely because the column headers changed in the latest data from UbiBrowser.

Update: the base URL needs to be updated as well. The new URL is http://ubibrowser.bio-it.cn/ubibrowser_v3/Public/download/literature/

http://ubibrowser.bio-it.cn/ubibrowser_v3/home/download

To re-create the error:

from indra.sources import ubibrowser
up = ubibrowser.process_from_web()

The error output (the Pandas part of the stack trace is omitted for brevity):

KeyError                                  Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 up = ubibrowser.process_from_web()

File ~/repos/indra/indra/sources/ubibrowser/api.py:23, in process_from_web()
     21 e3_df = pandas.read_csv(E3_URL, sep='\t')
     22 dub_df = pandas.read_csv(DUB_URL, sep='\t')
---> 23 return process_df(e3_df, dub_df)

File ~/repos/indra/indra/sources/ubibrowser/api.py:65, in process_df(e3_df, dub_df)
     49 """Process data frames containing UbiBrowser data.
     50 
     51 Parameters
   (...)
     62     extracted in its statements attribute.
     63 """
     64 up = UbiBrowserProcessor(e3_df, dub_df)
---> 65 up.extract_statements()
     66 return up

File ~/repos/indra/indra/sources/ubibrowser/processor.py:16, in UbiBrowserProcessor.extract_statements(self)
     13 for df, stmt_type in [(self.e3_df, Ubiquitination),
     14                       (self.dub_df, Deubiquitination)]:
     15     for _, row in df.iterrows():
---> 16         stmt = self._process_row(row, stmt_type)
     17         if stmt:
     18             self.statements.append(stmt)

File ~/repos/indra/indra/sources/ubibrowser/processor.py:26, in UbiBrowserProcessor._process_row(row, stmt_type)
     20 @staticmethod
     21 def _process_row(row, stmt_type):
     22     # Note that even in the DUB table the subject of the statement
     23     # is called "E3"
     24     # There are some examples where a complex is implied (e.g., BMI1-RNF2),
     25     # for simplicity we just ignore these
---> 26     if '-' in row['E3AC']:
     27         return None
     28     subj_agent = get_standard_agent(row['E3GENE'], {'UP': row['E3AC']})

[...]

KeyError: 'E3AC'

Inspecting the row variable with debug in IPython reveals the following data structure:

NUMBER                                1
SwissProt ID (E3)            ADO1_ARATH
SwissProt ID (Substrate)    APRR1_ARATH
SwissProt AC (E3)                Q94BT6
SwissProt AC (Substrate)         Q9LKL2
Gene Symbol (E3)                   ADO1
Gene Symbol (Substrate)           APRR1
SOURCE                          MEDLINE
SOURCEID                       22199232
SENTENCE                          E3Net
E3TYPE                            Other
COUNT                                 1
type                              Other
species                      A.thaliana
Name: 0, dtype: object

bgyori commented 11 months ago

It sounds like they renamed their columns (E3AC is now SwissProt AC (E3)), so we would have to update the code accordingly.
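
For illustration, one way to handle the renamed headers is to map them back to the names the processor already expects. This is only a minimal sketch, not the fix that was actually merged. The E3AC pair is the one named in this thread; the E3GENE pair is an inference from the traceback and the inspected row; RENAMED_COLUMNS, normalize_columns and missing_columns are hypothetical names, not part of INDRA.

```python
# Hypothetical mapping from new UbiBrowser headers to the old names
# used in processor.py. Only the first pair is stated in the issue;
# the second is inferred from row['E3GENE'] in the traceback.
RENAMED_COLUMNS = {
    'SwissProt AC (E3)': 'E3AC',
    'Gene Symbol (E3)': 'E3GENE',
}


def normalize_columns(columns):
    """Map known new-style headers back to the old names; keep the rest."""
    return [RENAMED_COLUMNS.get(col, col) for col in columns]


def missing_columns(columns, required=('E3AC', 'E3GENE')):
    """Return the required columns still absent after normalization."""
    present = set(normalize_columns(columns))
    return [col for col in required if col not in present]


# A subset of the headers seen in the inspected row above
new_headers = ['NUMBER', 'SwissProt ID (E3)',
               'SwissProt AC (E3)', 'Gene Symbol (E3)']
old_style = normalize_columns(new_headers)
```

With a data frame, the same mapping could be applied via df.rename(columns=RENAMED_COLUMNS) before calling process_df, and a check like missing_columns would fail early with a clearer message than a KeyError raised deep inside _process_row.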

bgyori commented 11 months ago

I fixed all the issues on the db-sources-updates branch.

kkaris commented 11 months ago

Resolved on #1423