We have a large dataset divided over multiple files. How can we generate nets for data distributed over multiple files?
See the following example. You will need to replace the list of paths that are read as Parquet files with your own.
import pm4py
import pandas as pd
from collections import Counter
from pm4py.objects.dfg.obj import DFG

# replace with the paths to your own Parquet files
paths_to_parquets = ["C:/roadtraffic.parquet", "C:/roadtraffic.parquet"]

overall_paths = Counter()
overall_start_activities = Counter()
overall_end_activities = Counter()

for file_path in paths_to_parquets:
    dataframe = pd.read_parquet(file_path)
    # if the case ID, activity or timestamp columns of the Parquet are not the
    # pm4py standard ones, map them explicitly, e.g.:
    # dataframe = pm4py.format_dataframe(dataframe, case_id="<case col>",
    #                                    activity_key="<activity col>",
    #                                    timestamp_key="<timestamp col>")
    dataframe = pm4py.format_dataframe(dataframe)
    # discover the DFG (paths, start and end activities) of the single file
    paths, start_act, end_act = pm4py.discover_dfg(dataframe)
    # sum the frequencies into the overall counters
    for pa in paths:
        overall_paths[pa] += paths[pa]
    for sa in start_act:
        overall_start_activities[sa] += start_act[sa]
    for ea in end_act:
        overall_end_activities[ea] += end_act[ea]

dfg_object = DFG(overall_paths, overall_start_activities, overall_end_activities)
print(overall_paths, overall_start_activities, overall_end_activities)

process_tree = pm4py.discover_process_tree_inductive(dfg_object)
print(process_tree)

petri_net, initial_marking, final_marking = pm4py.convert_to_petri_net(process_tree)
pm4py.view_petri_net(petri_net, initial_marking, final_marking, format="svg")
First, the DFGs of the single Parquet files are summed. Then, a process tree is discovered from the resulting DFG object; finally, the process tree is converted to a Petri net and the net is visualized.
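If you want to inspect the merged DFG itself before running any discovery algorithm, it can also be visualized directly from the three counters (a small optional sketch; pm4py.view_dfg accepts the frequency dictionaries):

# optional: visualize the merged DFG (the Counter objects behave as plain dicts)
pm4py.view_dfg(overall_paths, overall_start_activities, overall_end_activities, format="svg")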
Can we use the heuristics miner instead of the inductive miner?
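The heuristics miner computes its dependency measures from directly-follows frequencies, so in principle it can also start from the summed DFG. Below is a minimal, untested sketch assuming your pm4py version exposes an apply_dfg function in the heuristics miner algorithm module (this lies outside the simplified pm4py interface, so check its availability and exact signature in your installation):

from pm4py.algo.discovery.heuristics import algorithm as heuristics_miner

# assumption: apply_dfg takes the plain frequency dictionaries of the merged DFG
petri_net, initial_marking, final_marking = heuristics_miner.apply_dfg(
    overall_paths,
    start_activities=overall_start_activities,
    end_activities=overall_end_activities,
)
pm4py.view_petri_net(petri_net, initial_marking, final_marking, format="svg")

Alternatively, if the merged data fits in memory, the single dataframes can be concatenated and passed to pm4py.discover_petri_net_heuristics. Keep in mind that, unlike the inductive miner, the heuristics miner does not guarantee a sound workflow net.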