pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

case_statistics.get_variant_statistics(log) leads to TypeError: sequence item 0: expected str instance, int found #226

Closed: fmannhardt closed this issue 3 years ago

fmannhardt commented 3 years ago

I am trying to use case_statistics.get_variant_statistics(log) on the Helpdesk log, but I am getting the following error.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-109-66d95bb6e121> in <module>
     44         distributions[log_key] = distribution
     45 
---> 46         variants[log_key] = case_statistics.get_variant_statistics(log)

C:\tools\miniconda3\envs\research_transformer\lib\site-packages\pm4py\statistics\traces\log\case_statistics.py in get_variant_statistics(log, parameters)
     72     max_variants_to_return = exec_utils.get_param_value(Parameters.MAX_VARIANTS_TO_RETURN, parameters, None)
     73     varnt = exec_utils.get_param_value(Parameters.VARIANTS, parameters, variants_get.get_variants(log,
---> 74                                                                                               parameters=parameters))
     75     var_durations = exec_utils.get_param_value(Parameters.VAR_DURATIONS, parameters, None)
     76     if var_durations is None:

C:\tools\miniconda3\envs\research_transformer\lib\site-packages\pm4py\statistics\variants\log\get.py in get_variants(log, parameters)
     79     """
     80 
---> 81     variants_trace_idx = get_variants_from_log_trace_idx(log, parameters=parameters)
     82 
     83     all_var = convert_variants_trace_idx_to_trace_obj(log, variants_trace_idx)

C:\tools\miniconda3\envs\research_transformer\lib\site-packages\pm4py\statistics\variants\log\get.py in get_variants_from_log_trace_idx(log, parameters)
    149     variants = {}
    150     for trace_idx, trace in enumerate(log):
--> 151         variant = variants_util.get_variant_from_trace(trace, parameters=parameters)
    152         if variant not in variants:
    153             variants[variant] = []

C:\tools\miniconda3\envs\research_transformer\lib\site-packages\pm4py\util\variants_util.py in get_variant_from_trace(trace, parameters)
     77 
     78     if VARIANT_SPECIFICATION == VariantsSpecifications.STRING:
---> 79         return ",".join([x[activity_key] for x in trace])
     80     elif VARIANT_SPECIFICATION == VariantsSpecifications.LIST:
     81         return tuple([x[activity_key] for x in trace])

TypeError: sequence item 0: expected str instance, int found

Here is how to reproduce it (using the following dataset: https://mendeley.figshare.com/ndownloader/files/16362776):

from pm4py.statistics.traces.log import case_statistics
import pm4py
import pandas as pd

log = pm4py.format_dataframe(pd.read_csv("Helpdesk.csv", sep=","),
                             case_id="CaseID",
                             activity_key="ActivityID",
                             timestamp_key="CompleteTimestamp")
log = pm4py.convert_to_event_log(log)
case_statistics.get_variant_statistics(log)  # raises the TypeError shown above

fmannhardt commented 3 years ago

It seems to be caused by the activity column not being of type str. Providing dtype=str to the CSV read fixes this. But it is a bit unexpected :-)
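A minimal, pm4py-free sketch of the failure mode: the last frame of the traceback builds the variant with ",".join over the activity values, and str.join only accepts strings. The helper variant_string and the toy trace are hypothetical, written only to mirror that line; they are not pm4py code.

```python
# Mirrors the failing line from pm4py/util/variants_util.py without pm4py.
# Assumption: events are plain dicts keyed by pm4py's default activity key
# "concept:name"; the three-event trace is made up for illustration.
ACTIVITY_KEY = "concept:name"


def variant_string(trace, activity_key=ACTIVITY_KEY):
    # Same expression as in the traceback: str.join accepts only str items.
    return ",".join([event[activity_key] for event in trace])


# Activities stored as ints, as happens when ActivityID is parsed as a number.
int_trace = [{ACTIVITY_KEY: 1}, {ACTIVITY_KEY: 8}, {ACTIVITY_KEY: 6}]

error_message = None
try:
    variant_string(int_trace)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # sequence item 0: expected str instance, int found

# Workaround: turn the activities into strings before computing variants,
# which is what reading the CSV with dtype=str achieves.
str_trace = [{ACTIVITY_KEY: str(event[ACTIVITY_KEY])} for event in int_trace]
print(variant_string(str_trace))  # 1,8,6
```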

fit-alessandro-berti commented 3 years ago

Hey,

Try to use "pm4py.statistics.traces.pandas.case_statistics" instead. The one you are using requires an event log as input ;)

fmannhardt commented 3 years ago

But I am converting it to an event log here:

log = pm4py.convert_to_event_log(log)

fit-alessandro-berti commented 3 years ago

Hey, sorry for the misunderstanding. We should perhaps force the activity to be a string. We'll discuss it internally and decide how to fix this in the next release.

fit-alessandro-berti commented 3 years ago

Dear Felix,

We fixed that in PM4Py 2.2.9.

The format_dataframe function now casts the activity column to string.
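A rough sketch of the idea behind the fix, on plain Python rows instead of a pandas DataFrame: rename the user's columns to pm4py's standard keys (case:concept:name, concept:name, time:timestamp) and cast the activity to str. The helper format_rows and the sample rows are hypothetical; this is not how format_dataframe is actually implemented.

```python
# Hypothetical sketch of the idea behind the fix, NOT pm4py's code:
# map user column names onto pm4py's standard keys and force activities to str.
CASE_KEY = "case:concept:name"
ACTIVITY_KEY = "concept:name"
TIMESTAMP_KEY = "time:timestamp"


def format_rows(rows, case_id, activity_key, timestamp_key):
    """Return new rows using the standard keys; the activity is always a str."""
    formatted = []
    for row in rows:
        new_row = dict(row)  # copy, so extra columns are kept and input is untouched
        new_row[CASE_KEY] = new_row.pop(case_id)
        new_row[ACTIVITY_KEY] = str(new_row.pop(activity_key))  # the cast
        new_row[TIMESTAMP_KEY] = new_row.pop(timestamp_key)
        formatted.append(new_row)
    return formatted


# Made-up rows shaped like the Helpdesk columns used in the issue.
rows = [
    {"CaseID": 1, "ActivityID": 1, "CompleteTimestamp": "2020-01-01 10:00:00"},
    {"CaseID": 1, "ActivityID": 8, "CompleteTimestamp": "2020-01-01 10:05:00"},
]
formatted = format_rows(rows, "CaseID", "ActivityID", "CompleteTimestamp")

# Building a variant string now works, because every activity is a str.
variant = ",".join(r[ACTIVITY_KEY] for r in formatted)
print(variant)  # 1,8
```

Casting at formatting time means every downstream function that joins or compares activity names sees strings, which is why a single cast in the formatting step is enough to avoid the TypeError reported above.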