pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

pm4py creates key from xes format even when using csv file as source #152

Closed: bwanaaa closed this issue 4 years ago

bwanaaa commented 4 years ago

Environment: Windows 10 Pro, latest Anaconda. I successfully installed pm4py (as well as graphviz) and got it to work with a sample XES file. However, when I use a CSV or XLS file instead, an unexpected XES key shows up. The first few lines of the CSV file I used look like this:

Case ID,Event_ID,dd-MM-yyyy:HH.mm,Activity,Resource,Costs
1,35654423,30-12-2010:11.02,register request,Pete,50
1,35654424,31-12-2010:10.06,examine thoroughly,Sue,400
1,35654425,05-01-2011:15.12,check ticket,Mike,100
1,35654426,06-01-2011:11.18,decide,Sara,200
1,35654427,07-01-2011:14.24,reject request,Pete,200
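The third column of this CSV is named after its own format, and values such as 30-12-2010:11.02 are not in a standard ISO layout, so they may need explicit parsing. A minimal standard-library sketch, assuming the header literally describes the pattern (day-month-year, a colon, then hour.minute); read_rows, TIMESTAMP_COLUMN and TIMESTAMP_FORMAT are illustrative names, not part of pm4py:

```python
import csv
import io
from datetime import datetime

# First rows of the CSV quoted in this issue.
SAMPLE = """Case ID,Event_ID,dd-MM-yyyy:HH.mm,Activity,Resource,Costs
1,35654423,30-12-2010:11.02,register request,Pete,50
1,35654424,31-12-2010:10.06,examine thoroughly,Sue,400
1,35654425,05-01-2011:15.12,check ticket,Mike,100
1,35654426,06-01-2011:11.18,decide,Sara,200
1,35654427,07-01-2011:14.24,reject request,Pete,200
"""

# Assumption: the header describes the value format (day-month-year:hour.minute).
TIMESTAMP_COLUMN = "dd-MM-yyyy:HH.mm"
TIMESTAMP_FORMAT = "%d-%m-%Y:%H.%M"


def read_rows(text):
    """Return the CSV rows as dicts, with the timestamp column parsed to datetime."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row[TIMESTAMP_COLUMN] = datetime.strptime(row[TIMESTAMP_COLUMN], TIMESTAMP_FORMAT)
        rows.append(row)
    return rows


rows = read_rows(SAMPLE)
print(rows[0][TIMESTAMP_COLUMN])  # 2010-12-30 11:02:00
```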

This code successfully creates the event log:

from pm4py.objects.log.adapters.pandas import csv_import_adapter
from pm4py.objects.conversion.log import factory as conversion_factory
from pm4py.util import constants
from pm4py.algo.discovery.alpha import factory as alpha_miner
from pm4py.visualization.petrinet import factory as vis_factory

dataframe = csv_import_adapter.import_dataframe_from_path("running-example.csv")
# Map the CSV column names to the roles pm4py expects (case ID, activity, timestamp).
# Note: the timestamp column in the CSV header is "dd-MM-yyyy:HH.mm", not "dd-MM-yyyy".
log = conversion_factory.apply(dataframe, parameters={constants.PARAMETER_CONSTANT_CASEID_KEY: "Case ID",
                                                      constants.PARAMETER_CONSTANT_ACTIVITY_KEY: "Activity",
                                                      constants.PARAMETER_CONSTANT_TIMESTAMP_KEY: "dd-MM-yyyy"})
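As the KeyError in this issue indicates, the converted log keeps the CSV column names as event attribute keys: the activity stays under "Activity", and nothing is renamed to the XES key concept:name. The sketch below is a simplified pure-Python model of that conversion (group rows by case ID into traces, keep each row's keys); to_event_log is a hypothetical helper, not pm4py's implementation, and it assumes rows are already in chronological order within each case.

```python
def to_event_log(rows, case_id_key):
    """Group row dicts into traces by case ID (hypothetical, simplified model)."""
    traces = {}  # dicts keep insertion order (Python 3.7+), so cases stay in first-seen order
    for row in rows:
        # Each event keeps the original column names as its attribute keys.
        traces.setdefault(row[case_id_key], []).append(dict(row))
    return list(traces.values())


# First three events of case 1 from the CSV in this issue (subset of columns).
rows = [
    {"Case ID": "1", "Activity": "register request", "Resource": "Pete"},
    {"Case ID": "1", "Activity": "examine thoroughly", "Resource": "Sue"},
    {"Case ID": "1", "Activity": "check ticket", "Resource": "Mike"},
]

log = to_event_log(rows, case_id_key="Case ID")
print(len(log), len(log[0]))        # 1 3
print("concept:name" in log[0][0])  # False
```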

However, the next line

net, initial_marking, final_marking = alpha_miner.apply(log)

gives this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-2b1be879bcb9> in <module>
----> 1 net, initial_marking, final_marking = alpha_miner.apply(log)

~\Anaconda3\envs\pm4\lib\site-packages\pm4py\algo\discovery\alpha\factory.py in apply(log, parameters, variant)
     52                                               timestamp_key=parameters[pmutil.constants.PARAMETER_CONSTANT_TIMESTAMP_KEY])
     53             return VERSIONS_DFG[variant](dfg, parameters=parameters)
---> 54     return VERSIONS[variant](log_conversion.apply(log, parameters, log_conversion.TO_EVENT_LOG), parameters)
     55 
     56 

~\Anaconda3\envs\pm4\lib\site-packages\pm4py\algo\discovery\alpha\versions\classic.py in apply(log, parameters)
     60     if pm_util.constants.PARAMETER_CONSTANT_ACTIVITY_KEY not in parameters:
     61         parameters[pm_util.constants.PARAMETER_CONSTANT_ACTIVITY_KEY] = pm_util.xes_constants.DEFAULT_NAME_KEY
---> 62     dfg = {k: v for k, v in dfg_inst.apply(log, parameters=parameters).items() if v > 0}
     63     start_activities = endpoints.derive_start_activities_from_log(log, parameters[
     64         pm_util.constants.PARAMETER_CONSTANT_ACTIVITY_KEY])

~\Anaconda3\envs\pm4\lib\site-packages\pm4py\algo\discovery\dfg\versions\native.py in apply(log, parameters)
     19         DFG graph
     20     """
---> 21     return log_calc.native(log, parameters=parameters)

~\Anaconda3\envs\pm4\lib\site-packages\pm4py\objects\dfg\retrieval\log.py in native(log, parameters)
     65     activity_key = parameters[pmutil.constants.PARAMETER_CONSTANT_ACTIVITY_KEY]
     66     dfgs = map((lambda t: [(t[i - window][activity_key], t[i][activity_key]) for i in range(window, len(t))]), log)
---> 67     return Counter([dfg for lista in dfgs for dfg in lista])
     68 
     69 

~\Anaconda3\envs\pm4\lib\site-packages\pm4py\objects\dfg\retrieval\log.py in <listcomp>(.0)
     65     activity_key = parameters[pmutil.constants.PARAMETER_CONSTANT_ACTIVITY_KEY]
     66     dfgs = map((lambda t: [(t[i - window][activity_key], t[i][activity_key]) for i in range(window, len(t))]), log)
---> 67     return Counter([dfg for lista in dfgs for dfg in lista])
     68 
     69 

~\Anaconda3\envs\pm4\lib\site-packages\pm4py\objects\dfg\retrieval\log.py in <lambda>(t)
     64     window = parameters[WINDOW] if WINDOW in parameters else DEFAULT_WINDOW
     65     activity_key = parameters[pmutil.constants.PARAMETER_CONSTANT_ACTIVITY_KEY]
---> 66     dfgs = map((lambda t: [(t[i - window][activity_key], t[i][activity_key]) for i in range(window, len(t))]), log)
     67     return Counter([dfg for lista in dfgs for dfg in lista])
     68 

~\Anaconda3\envs\pm4\lib\site-packages\pm4py\objects\dfg\retrieval\log.py in <listcomp>(.0)
     64     window = parameters[WINDOW] if WINDOW in parameters else DEFAULT_WINDOW
     65     activity_key = parameters[pmutil.constants.PARAMETER_CONSTANT_ACTIVITY_KEY]
---> 66     dfgs = map((lambda t: [(t[i - window][activity_key], t[i][activity_key]) for i in range(window, len(t))]), log)
     67     return Counter([dfg for lista in dfgs for dfg in lista])
     68 

KeyError: 'concept:name'

Where does it get this key? I never specified it, and it is not in the CSV file. I also tried saving the log in XES format and re-importing it as XES, but that fails as well (since the XES key names are hard-coded).
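The traceback shows where the key comes from. In classic.py, the Alpha Miner sets the activity key to xes_constants.DEFAULT_NAME_KEY when PARAMETER_CONSTANT_ACTIVITY_KEY is missing from parameters, and the DFG step in log.py then reads t[i][activity_key] from every event. The sketch below re-creates those two frames in plain Python; ACTIVITY_KEY_PARAM is a hypothetical stand-in for the pm4py constant, and directly_follows is an illustrative name, not a pm4py function.

```python
from collections import Counter

# Hypothetical stand-in for pm4py.util.constants.PARAMETER_CONSTANT_ACTIVITY_KEY.
ACTIVITY_KEY_PARAM = "activity_key"
# The XES default activity key, as named in the KeyError.
DEFAULT_NAME_KEY = "concept:name"


def directly_follows(log, parameters=None):
    """Count directly-follows pairs, mimicking the classic.py and log.py frames."""
    parameters = dict(parameters or {})
    # classic.py: fall back to the XES default when no activity key is given
    if ACTIVITY_KEY_PARAM not in parameters:
        parameters[ACTIVITY_KEY_PARAM] = DEFAULT_NAME_KEY
    activity_key = parameters[ACTIVITY_KEY_PARAM]
    window = 1
    # log.py: pair each event with the one `window` positions earlier in its trace
    pairs = [(t[i - window][activity_key], t[i][activity_key])
             for t in log for i in range(window, len(t))]
    return Counter(pairs)


# One trace converted from the CSV: events keep the "Activity" column name.
log = [[{"Activity": "register request"},
        {"Activity": "examine thoroughly"},
        {"Activity": "check ticket"}]]

try:
    directly_follows(log)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'concept:name'

print(directly_follows(log, {ACTIVITY_KEY_PARAM: "Activity"}))
```

Copying parameters before filling in the default keeps the caller's dictionary unchanged; the classic.py frame in the traceback writes the default into parameters directly.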

fit-alessandro-berti commented 4 years ago

Dear bwanaaa,

The same parameters that you use for the conversion:

parameters={constants.PARAMETER_CONSTANT_CASEID_KEY: "Case ID", constants.PARAMETER_CONSTANT_ACTIVITY_KEY: "Activity", constants.PARAMETER_CONSTANT_TIMESTAMP_KEY: "dd-MM-yyyy"}

should also be passed to the Alpha Miner. Otherwise it falls back to the default XES activity key, "concept:name", which your log does not contain.
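The same dictionary is needed because the activity key is read at more than one point: besides the DFG, the traceback shows classic.py passing parameters[PARAMETER_CONSTANT_ACTIVITY_KEY] to endpoints.derive_start_activities_from_log. A minimal sketch of building the parameters once and reusing them for such a step, with a hypothetical stand-in key name and an illustrative start_end_activities helper (the end-activity half is an assumption by analogy, not taken from the traceback):

```python
from collections import Counter

# Hypothetical stand-in for pm4py.util.constants.PARAMETER_CONSTANT_ACTIVITY_KEY.
ACTIVITY_KEY_PARAM = "activity_key"

# Build the parameters once; the same dict would go to the conversion and to the miner.
PARAMETERS = {ACTIVITY_KEY_PARAM: "Activity"}


def start_end_activities(log, parameters):
    """Count the first and last activity of each non-empty trace."""
    key = parameters[ACTIVITY_KEY_PARAM]
    starts = Counter(trace[0][key] for trace in log if trace)
    ends = Counter(trace[-1][key] for trace in log if trace)
    return starts, ends


# The five events of case 1 shown in this issue, reduced to their activities.
log = [[{"Activity": a} for a in ("register request", "examine thoroughly",
                                  "check ticket", "decide", "reject request")]]

starts, ends = start_end_activities(log, PARAMETERS)
print(dict(starts), dict(ends))  # {'register request': 1} {'reject request': 1}
```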

bwanaaa commented 4 years ago

Thank you. For anyone else as dense as me, I got it to work with this code:

from pm4py.objects.log.adapters.pandas import csv_import_adapter
from pm4py.objects.conversion.log import factory as conversion_factory
from pm4py.util import constants
from pm4py.algo.discovery.alpha import factory as alpha_miner
from pm4py.visualization.petrinet import factory as vis_factory

dataframe = csv_import_adapter.import_dataframe_from_path("running-example.csv")
parameters1 = {constants.PARAMETER_CONSTANT_CASEID_KEY: "Case ID",
               constants.PARAMETER_CONSTANT_ACTIVITY_KEY: "Activity",
               constants.PARAMETER_CONSTANT_TIMESTAMP_KEY: "dd-MM-yyyy"}

log = conversion_factory.apply(dataframe, parameters=parameters1)
# Pass the same parameters to the Alpha Miner so it reads "Activity" instead of "concept:name"
net, initial_marking, final_marking = alpha_miner.apply(log, parameters=parameters1)
gviz = vis_factory.apply(net, initial_marking, final_marking)
vis_factory.view(gviz)