pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

Obtaining values from heuristic net #364

Closed choyiny closed 1 year ago

choyiny commented 1 year ago

Right now I am able to generate diagrams for both frequency and performance using the heuristics miner, as follows:

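import pm4py
from pm4py.algo.discovery.heuristics import algorithm as heuristics_miner
from pm4py.visualization.heuristics_net import visualizer as hn_visualizer
from pm4py.objects.petri_net.utils import performance_map
# Note: filtered_log is computed here, but the miner below runs on the full event_log.
filtered_log = pm4py.filter_variants_top_k(event_log, 10)
heu_net = heuristics_miner.apply_heu(event_log, parameters = {
  heuristics_miner.Variants.CLASSIC.value.Parameters.MIN_DFG_OCCURRENCES: 10
})
gviz = hn_visualizer.apply(heu_net)
hn_visualizer.view(gviz)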

However, I also want to extract the values shown in the diagrams. While digging through the codebase I found performance_map, which looks like what I need, since it can extract traces from the mined log. So I converted my heuristics net to a Petri net and ran the following:

net, im, fm = heuristics_miner.apply(event_log)
traces = performance_map.get_transition_performance_with_token_replay(event_log, net, im, fm)

I'm expecting a "Dictionary where each transition label is associated to performance measures", as mentioned in the docstring, but I get errors related to manipulating a pandas dataframe instead. Is this particular function supported? It is not mentioned on the website at all.

Thanks in advance!

fit-alessandro-berti commented 1 year ago

Dear @choyiny

In pm4py 2.3.x we changed the default data structure from EventLog to Pandas dataframes.

However, some utilities still accept only an EventLog object. We suggest converting the dataframe back to an event log after ingestion, e.g.:

import pm4py
event_log = pm4py.read_xes("C:/receipt.xes")
event_log = pm4py.convert_to_event_log(event_log)

choyiny commented 1 year ago

Thanks for the quick response. Are there also utility functions that would let me extract per-transition values from the performance map, e.g. that A->B takes 15 minutes on average whereas A->C takes 30 minutes on average?

fit-alessandro-berti commented 1 year ago

Yes, you can iterate over the traces object as follows. For each transition, all_values contains all the times elapsed between the placement of the tokens in its preset and the firing of the transition.

for trans in traces:
    print(" ")
    print(trans)
    print(traces[trans]["all_values"])
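If a single figure per transition is more useful than the raw list, the durations can be aggregated with the standard library. The following is only a minimal sketch: summarize_durations and example_traces are hypothetical names, and the example dictionary is a hand-written stand-in that assumes only the "all_values" key shown above, with durations taken to be in seconds.

```python
from statistics import mean, median


def summarize_durations(traces):
    """Return {transition: {"count", "mean", "median"}} from the "all_values" lists."""
    summary = {}
    for trans, stats in traces.items():
        values = stats.get("all_values", [])
        if not values:
            continue  # skip transitions without any recorded duration
        summary[str(trans)] = {
            "count": len(values),
            "mean": mean(values),
            "median": median(values),
        }
    return summary


# Hypothetical stand-in for the dictionary returned by token replay
# (assumption: durations are expressed in seconds).
example_traces = {
    "register request": {"all_values": [0.0, 0.0]},
    "examine casually": {"all_values": [600.0, 1200.0, 900.0]},
    "decide": {"all_values": []},
}

example_summary = summarize_durations(example_traces)
for label, stats in example_summary.items():
    print(label, stats)
```

Note that this aggregates per transition only; it does not separate the values by the preceding transition.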

choyiny commented 1 year ago

all_values gives me a list of durations, but it is not grouped by the previous transition. The aggregate number is shown on the heu_net graph, so I was wondering whether the values in the graph can be easily extracted?

fit-alessandro-berti commented 1 year ago

Dear @choyiny

With the following example, you could access the performance at the arc level obtained using token-based replay:

from pm4py.algo.conformance.tokenreplay import algorithm as token_replay
from pm4py.statistics.variants.log import get as variants_get
from pm4py.visualization.petri_net.util import performance_map
import pm4py

log = pm4py.read_xes("C:/running-example.xes")
log = pm4py.convert_to_event_log(log)
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
variants_idx = variants_get.get_variants_from_log_trace_idx(log)
variants = variants_get.convert_variants_trace_idx_to_trace_obj(log, variants_idx)
parameters_tr = {token_replay.Variants.TOKEN_REPLAY.value.Parameters.VARIANTS: variants}
aligned_traces = token_replay.apply(log, net, initial_marking, final_marking, parameters=parameters_tr)
element_statistics = performance_map.single_element_statistics(log, net, initial_marking, aligned_traces, variants_idx)
print(element_statistics)
print(dir(element_statistics))
for el in element_statistics:
    print(" ")
    print(el)
    print(type(el))
    print(element_statistics[el])
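
To illustrate what "grouped by the previous transition" means (A->B versus A->C), here is a small pure-Python sketch that does not use pm4py at all. It computes the mean time between directly following events from raw timestamps. The function mean_duration_per_arc and the toy log are hypothetical and only show the grouping idea; this is not the token-replay computation above, and its numbers need not match what pm4py reports.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_duration_per_arc(log):
    """Map each (previous activity, next activity) pair to its mean duration in seconds."""
    durations = defaultdict(list)
    for trace in log:
        # Each event is an (activity, timestamp) pair; order events by timestamp.
        ordered = sorted(trace, key=lambda event: event[1])
        for (prev_act, prev_ts), (next_act, next_ts) in zip(ordered, ordered[1:]):
            durations[(prev_act, next_act)].append((next_ts - prev_ts).total_seconds())
    return {arc: mean(values) for arc, values in durations.items()}


# Hypothetical toy log: three traces, each a list of (activity, timestamp) events.
toy_log = [
    [("A", datetime(2023, 1, 1, 9, 0)), ("B", datetime(2023, 1, 1, 9, 15))],
    [("A", datetime(2023, 1, 1, 10, 0)), ("C", datetime(2023, 1, 1, 10, 30))],
    [("A", datetime(2023, 1, 2, 8, 0)), ("B", datetime(2023, 1, 2, 8, 15))],
]

arc_means = mean_duration_per_arc(toy_log)
for (src, dst), seconds in arc_means.items():
    print(f"{src} -> {dst}: {seconds / 60:.0f} min")
```

Keying the result by the (previous, next) pair is what keeps the A->B and A->C durations apart, which a per-transition list cannot do.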