How to come to interval logs (as compared to lifecycle logs) for PM4Py

pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.

https://pm4py.fit.fraunhofer.de

GNU General Public License v3.0

How to come to interval logs (as compared to lifecycle logs) for PM4Py #338

Closed jbdatascience closed 2 years ago

jbdatascience commented 2 years ago

This is not an issue but more of a question: how do you get to interval logs (as compared to lifecycle logs) for PM4Py?

As I see it now, PM4Py needs at least one timestamp per event in an event log to be able to produce process models.

But if you want more detail (for deeper statistical analysis such as bottlenecks, throughput, etc.), then I think you will need interval logs with timestamps for both the start and the end of an activity. My question is: how do you specify these start and end timestamps in the interval event log itself, in the PM4Py functions that discover process models, and in the functions that compute the statistics?

fit-alessandro-berti commented 2 years ago

Dear jbdatascience,

You can specify the start timestamp for the methods using the parameter pm4py:param:start_timestamp_key

Example for the calculation and the visualization of the performance DFG:

    import pm4py
    from pm4py.algo.discovery.dfg import algorithm as performance_dfg_discovery

    log = pm4py.read_xes("tests/input_data/interval_event_log.xes")

    # Name the attributes that hold the start and the completion timestamp of each event.
    perf_dfg = performance_dfg_discovery.apply(log, parameters={"pm4py:param:start_timestamp_key": "start_timestamp", "pm4py:param:timestamp_key": "time:timestamp"})
    start_activities = pm4py.get_start_activities(log)
    end_activities = pm4py.get_end_activities(log)

    pm4py.view_performance_dfg(perf_dfg, start_activities, end_activities, format="svg")
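For readers who start from a lifecycle log (separate "start" and "complete" events) rather than an interval log, the following is a minimal sketch, not taken from this thread, of how the two relate. It uses only the Python standard library and is not PM4Py's own conversion. The function name `to_interval_events` and the sample data are hypothetical; the attribute names `start_timestamp` and `time:timestamp` follow the example above, and the first-in-first-out pairing of starts and completes is an assumption.

```python
from datetime import datetime


def to_interval_events(lifecycle_events):
    """Pair each 'start' event with the next 'complete' event of the
    same case and activity (first in, first out)."""
    open_starts = {}  # (case, activity) -> pending start timestamps
    intervals = []
    for ev in sorted(lifecycle_events, key=lambda e: e["time:timestamp"]):
        key = (ev["case:concept:name"], ev["concept:name"])
        transition = ev["lifecycle:transition"].lower()
        if transition == "start":
            open_starts.setdefault(key, []).append(ev["time:timestamp"])
        elif transition == "complete":
            pending = open_starts.get(key, [])
            # Assumption: a 'complete' without a matching 'start' is instantaneous.
            start = pending.pop(0) if pending else ev["time:timestamp"]
            intervals.append({
                "case:concept:name": key[0],
                "concept:name": key[1],
                "start_timestamp": start,
                "time:timestamp": ev["time:timestamp"],
            })
    return intervals


# Hypothetical lifecycle log: one case, two activities.
lifecycle_log = [
    {"case:concept:name": "c1", "concept:name": "register",
     "lifecycle:transition": "start", "time:timestamp": datetime(2021, 1, 1, 9, 0)},
    {"case:concept:name": "c1", "concept:name": "register",
     "lifecycle:transition": "complete", "time:timestamp": datetime(2021, 1, 1, 9, 10)},
    {"case:concept:name": "c1", "concept:name": "check",
     "lifecycle:transition": "start", "time:timestamp": datetime(2021, 1, 1, 9, 30)},
    {"case:concept:name": "c1", "concept:name": "check",
     "lifecycle:transition": "complete", "time:timestamp": datetime(2021, 1, 1, 10, 0)},
]

interval_log = to_interval_events(lifecycle_log)
for ev in interval_log:
    duration = ev["time:timestamp"] - ev["start_timestamp"]
    print(ev["concept:name"], ev["start_timestamp"], ev["time:timestamp"], duration)
```

Each resulting event carries both timestamps, which is the shape the `start_timestamp_key` and `timestamp_key` parameters refer to.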

s-j-v-zelst commented 2 years ago

Dear @jbdatascience, in addition to Alessandro's reply, I would like to point out that 'full interval support' is one of our mid-term goals for pm4py. In the future, any method that can be invoked as pm4py.function(log) should allow you to specify both a start and an end timestamp column. However, note that adapting existing process mining algorithms to time intervals is far from trivial (and not at all solved completely).
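One way to see why intervals complicate existing algorithms (an illustration only, not necessarily the difficulty the maintainers have in mind): once activities have a duration, they can overlap, and the "directly follows" order depends on which timestamp you sort by. The sketch below uses plain Python with hypothetical data; the helper `directly_follows` is made up for this example and is not a PM4Py function.

```python
from datetime import datetime

# Hypothetical case in which activity B runs entirely while A is still in progress.
events = [
    {"concept:name": "A",
     "start_timestamp": datetime(2021, 1, 1, 9, 0),
     "time:timestamp": datetime(2021, 1, 1, 11, 0)},
    {"concept:name": "B",
     "start_timestamp": datetime(2021, 1, 1, 9, 30),
     "time:timestamp": datetime(2021, 1, 1, 10, 0)},
]


def directly_follows(events, sort_key):
    """Return consecutive activity pairs after sorting by the given timestamp key."""
    ordered = [e["concept:name"] for e in sorted(events, key=lambda e: e[sort_key])]
    return list(zip(ordered, ordered[1:]))


by_start = directly_follows(events, "start_timestamp")
by_end = directly_follows(events, "time:timestamp")
print("sorted by start:     ", by_start)
print("sorted by completion:", by_end)
```

Sorting by start time yields A followed by B, while sorting by completion time yields B followed by A, so an algorithm built on a single total order of events has to decide how to treat such overlaps.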