pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

Method get_event_attribute_values does not get attributes. #478

Closed Catadanna closed 6 months ago

Catadanna commented 7 months ago

Hello,

I have a problem filtering by attributes. I load my data into a DataFrame from a CSV file. I have attributes such as 'participant', 'complexity', etc.

Here is my code:


import pandas as pd
import pm4py

df = pd.read_csv(file_path)
event_log = pm4py.format_dataframe(df, case_id='variant_instance_id', activity_key='name', timestamp_key='end_timestamp')
event_log_final = pm4py.convert_to_event_log(event_log)
resources = pm4py.get_event_attribute_values(event_log_final, "org:resource")
print(resources)

This prints an empty dict.

On the other hand, the following call works and returns the activities:

activities = pm4py.get_event_attribute_values(event_log, "concept:name")

Here is the error I get:


Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 153, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 182, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'org:resource'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  (XXX)
    resources = pm4py.get_event_attribute_values(event_log, "org:resource")
  File "/home/adminid/.local/lib/python3.10/site-packages/pm4py/stats.py", line 165, in get_event_attribute_values
    return get.get_attribute_values(log, attribute, parameters=parameters)
  File "/home/adminid/.local/lib/python3.10/site-packages/pm4py/statistics/attributes/pandas/get.py", line 158, in get_attribute_values
    attributes_values_dict = df[attribute_key].value_counts().to_dict()
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py", line 4090, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 3809, in get_loc
    raise KeyError(key) from err
KeyError: 'org:resource'

The problem concerns accessing the attributes. Please investigate or tell me where the problem is.

Thank you in advance.

fit-alessandro-berti commented 7 months ago

Dear @Catadanna ,

Can you print the columns of the dataframe?

print(event_log.columns)

I think you do not have 'org:resource' among the attributes of your event log.

Catadanna commented 7 months ago

I do not have org:resource among the columns. I have time:timestamp and concept:name. Do I have to declare org:resource? Here are my columns:

Index(['name', 'taskId', 'complexity', 'retries', 'start_timestamp', 'end_timestamp',
        'variant_instance_id', 'task_rank', 'variant_name',
       'status', 'case:concept:name', 'concept:name', 'time:timestamp',
       '@@index', '@@case_index'],
      dtype='object')

I use Python 3.10 and pm4py version 2.7.10.1.

fit-alessandro-berti commented 7 months ago

You can apply get_event_attribute_values only to attributes (columns of the CSV) that are actually present in the file.
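As a minimal sketch of that rule, the snippet below mirrors the pandas call visible in the traceback (`df[attribute_key].value_counts().to_dict()`) on a toy DataFrame. The helper `attribute_values`, the toy data, and the use of 'participant' as the resource column are assumptions made for illustration, not part of pm4py. It only shows that a column must exist before its values can be counted, and that renaming an existing column to org:resource is one possible way to make that attribute available.

```python
import pandas as pd

# Toy event data; "participant" is a hypothetical resource column.
df = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2"],
    "concept:name": ["A", "B", "A"],
    "participant": ["alice", "bob", "alice"],
})


def attribute_values(frame, attribute):
    """Hypothetical helper: count values of a column, or return {} if it is missing."""
    if attribute not in frame.columns:
        return {}
    # Same pandas call that appears in the traceback.
    return frame[attribute].value_counts().to_dict()


# "org:resource" is not a column yet, so nothing can be counted.
missing = attribute_values(df, "org:resource")

# "concept:name" exists, so its values are counted.
activities = attribute_values(df, "concept:name")

# Rename the hypothetical resource column to the standard attribute name.
df = df.rename(columns={"participant": "org:resource"})
resources = attribute_values(df, "org:resource")

print(missing, activities, resources)
```

With a real log, the same rename would be applied to whichever CSV column holds the resource, before the DataFrame is passed to pm4py.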

Catadanna commented 7 months ago

That is what I did. I sent you the column names. What should I do? The library adds time:timestamp as a column name, so why don't I have org:resource?

I run this:

print("Ressources", resources)

And the result is:

Ressources {}