pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

Request by log index causes key error #361

Closed 23722 closed 1 year ago

23722 commented 1 year ago

pm4py.__version__ == 2.3.2

I imported an XES file (Sepsis log) with log = pm4py.read_xes(path_log).

Goal: query the log by index, e.g. log[0].

Expected behavior, as in pm4py.__version__ == 2.2.X (I tested two 2.2.X versions and both work fine):

{'attributes': {'concept:name': 'A'}, 'events': [{'InfectionSuspected': True, 'org:group': 'A', 'DiagnosticBlood': True, 'DisfuncOrg': True, 'SIRSCritTachypnea': True, 'Hypotensie': True, 'SIRSCritHeartRate': True, 'Infusion': True, 'DiagnosticArtAstrup': True, 'concept:name': 'ER Registration', 'Age': 85, 'DiagnosticIC': True, 'DiagnosticSputum': False, 'DiagnosticLiquor': False, 'DiagnosticOther': False, 'SIRSCriteria2OrMore': True, 'DiagnosticXthorax': True, 'SIRSCritTemperature': True, 'time:timestamp': datetime.datetime(2014, 10, 22, 11, 15, 41, tzinfo=datetime.timezone(datetime.timedelta(0, 7200))), 'DiagnosticUrinaryCulture': True, 'SIRSCritLeucos': False, 'Oligurie': False, 'DiagnosticLacticAcid': True, 'lifecycle:transition': 'complete', 'Diagnose': 'A', 'Hypoxie': False, 'DiagnosticUrinarySediment': True, 'DiagnosticECG': True}, '..', {'org:group': 'E', 'lifecycle:transition': 'complete', 'concept:name': 'Release A', 'time:timestamp': datetime.datetime(2014, 11, 2, 15, 15, tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))}]}

Actual behavior (with pm4py.__version__ == 2.3.2):

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~\Miniconda3\envs\pm4py_2.3\lib\site-packages\pandas\core\indexes\base.py:3803, in Index.get_loc(self, key, method, tolerance)
   3802 try:
-> 3803     return self._engine.get_loc(casted_key)
   3804 except KeyError as err:

File ~\Miniconda3\envs\pm4py_2.3\lib\site-packages\pandas\_libs\index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File ~\Miniconda3\envs\pm4py_2.3\lib\site-packages\pandas\_libs\index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In [5], line 5
      1 # Print whole log
      2 #print(log)
      3 
      4 # Print trace by index
----> 5 print(log[0])

File ~\Miniconda3\envs\pm4py_2.3\lib\site-packages\pandas\core\frame.py:3805, in DataFrame.__getitem__(self, key)
   3803 if self.columns.nlevels > 1:
   3804     return self._getitem_multilevel(key)
-> 3805 indexer = self.columns.get_loc(key)
   3806 if is_integer(indexer):
   3807     indexer = [indexer]

File ~\Miniconda3\envs\pm4py_2.3\lib\site-packages\pandas\core\indexes\base.py:3805, in Index.get_loc(self, key, method, tolerance)
   3803     return self._engine.get_loc(casted_key)
   3804 except KeyError as err:
-> 3805     raise KeyError(key) from err
   3806 except TypeError:
   3807     # If we have a listlike key, _check_indexing_error will raise
   3808     #  InvalidIndexError. Otherwise we fall through and re-raise
   3809     #  the TypeError.
   3810     self._check_indexing_error(key)

KeyError: 0

fit-alessandro-berti commented 1 year ago

Dear @23722

With the release of pm4py 2.3.0, we changed our default data structure to a pandas DataFrame.

This means that the object returned by pm4py.read_xes can no longer be iterated over case by case or indexed by position. In exchange, we get columnar storage, which allows for easier querying and filtering. On a DataFrame, log[0] is a column lookup, which is why you get KeyError: 0.

To use the old code with the new pm4py.read_xes output, you need to explicitly convert it to the traditional EventLog object: log = pm4py.convert_to_event_log(log)
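
For anyone hitting the same error, here is a minimal sketch of the two options. The helper names load_event_log and first_case are hypothetical and not part of pm4py. The first wraps the conversion suggested above. The second stays on the DataFrame and assumes the case identifier is stored in the standard case:concept:name column. It returns the rows of the case that appears in the first row, which is not guaranteed to be the same trace as log[0] after conversion.

```python
def load_event_log(path_log):
    """Read an XES file and return a legacy EventLog that supports log[0].

    Hypothetical helper; it only chains the two pm4py calls named in this thread.
    """
    import pm4py  # imported lazily so the helper can be defined without pm4py installed

    log = pm4py.read_xes(path_log)          # a pandas DataFrame since pm4py 2.3.0
    return pm4py.convert_to_event_log(log)  # traditional EventLog, indexable by position


def first_case(df, case_col="case:concept:name"):
    """Return all rows (events) of the case that appears in the first row of df.

    Hypothetical helper; assumes df is a non-empty pandas DataFrame with one row
    per event and the case identifier in `case_col`.
    """
    first_id = df[case_col].iloc[0]      # positional access to the first row's case id
    return df[df[case_col] == first_id]  # boolean mask keeps only that case's events


# Usage (path_log is the path to your XES file):
#   log = load_event_log(path_log)
#   print(log[0])                        # works again, as in pm4py 2.2.X
#
#   df = pm4py.read_xes(path_log)
#   print(first_case(df))                # same idea without leaving the DataFrame
```

Converting is the smallest change for existing code. Staying on the DataFrame avoids the conversion cost for large logs.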