pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

min_act_count not working properly #73

Closed Luttik closed 5 years ago

Luttik commented 5 years ago

I am trying to analyse the 2019 BPI challenge data with pm4py and I found that min_act_count does not actually filter the activities properly.

With the code below my min_act_count is 176213.8, but it still shows activities with just hundreds of occurrences.

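# Imports assume the pm4py 1.x factory-style API in use when this issue was filed.
from pm4py.objects.log.importer.xes import factory as xes_importer
from pm4py.algo.discovery.heuristics import factory as heuristics_miner
from pm4py.visualization.heuristics_net import factory as hn_vis_factory

log = xes_importer.apply('bpi challenge 2019.xes')
parameters = dict(dependency_thresh=0.9, loops_length_two_thresh=.9, min_dfg_occurrences=.5*len(log), dfg_pre_cleaning_noise_thresh=.7, min_act_count=.7*len(log), and_measure_thresh=.4)
heu_net = heuristics_miner.apply_heu(log, parameters=parameters)
gviz = hn_vis_factory.apply(heu_net)
hn_vis_factory.view(gviz)

Luttik commented 5 years ago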

Maybe I am just having an issue interpreting this. This is one of the nodes that I have; the number in the node does not seem to correspond with the incoming and outgoing edges.

[image: screenshot of one node of the heuristics net]

Javert899 commented 5 years ago

Thank you for reporting this.

We will verify whether this is a decoration (visualization) error or a calculation error.

As a workaround, you could preprocess the log so that it keeps only the most common activities, using the auto filter of the attributes filter.
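
The preprocessing idea can be illustrated without pm4py. The snippet below is a minimal plain-Python sketch, assuming a log represented as a list of traces, each a list of activity names. `keep_frequent_activities` is a hypothetical helper written for this illustration; it is not the pm4py attributes filter and may behave differently from it.

```python
from collections import Counter


def keep_frequent_activities(log, min_count):
    """Drop events whose activity occurs fewer than min_count times in the log.

    Hypothetical helper for illustration only (not part of pm4py).
    """
    # Count every activity occurrence across all traces.
    counts = Counter(act for trace in log for act in trace)
    frequent = {act for act, n in counts.items() if n >= min_count}
    # Keep only events of frequent activities.
    filtered = [[act for act in trace if act in frequent] for trace in log]
    # Traces that lose all their events are removed entirely.
    return [trace for trace in filtered if trace]


toy_log = [["A", "B", "C"], ["A", "B"], ["A", "X", "B"], ["X"]]
print(keep_frequent_activities(toy_log, min_count=3))
```

In this toy log only "A" and "B" occur at least three times, so "C" and "X" are dropped before any discovery step would see them.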

Since it is not a critical bug (it does not prevent a method from executing), the patch will first be released in the 'develop' branch and then moved into a release according to the normal release schedule.

Javert899 commented 5 years ago

Some fixes for this were pushed to the develop branch.

The min_act_count check was applied only to arcs added through the dependency threshold.

For loops of length 2, the check on min_act_count was not done; this caused the problem in your case.
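
The effect described above can be reproduced in a toy model. This is a minimal sketch, assuming arcs are plain (source, target) pairs and that a single flag decides whether length-two-loop arcs are checked; `select_arcs` and all of its parameters are hypothetical names for this illustration, not pm4py's actual implementation.

```python
def select_arcs(dep_arcs, loop2_arcs, act_count, min_act_count, check_loops):
    """Return the arcs whose two endpoints both meet min_act_count.

    Toy stand-in, not pm4py code. With check_loops=False, arcs coming from
    length-two loops bypass the activity-count check.
    """
    def frequent(arc):
        return all(act_count[a] >= min_act_count for a in arc)

    # Arcs from the dependency threshold are always checked.
    kept = [arc for arc in dep_arcs if frequent(arc)]
    # Loop arcs are checked only when check_loops is True.
    kept += [arc for arc in loop2_arcs if frequent(arc) or not check_loops]
    return kept


act_count = {"A": 1000, "B": 900, "R": 5}
dep_arcs = [("A", "B"), ("A", "R")]
loop2_arcs = [("B", "R")]

before = select_arcs(dep_arcs, loop2_arcs, act_count, 500, check_loops=False)
after = select_arcs(dep_arcs, loop2_arcs, act_count, 500, check_loops=True)
print(before)  # the rare activity "R" still appears via the loop arc
print(after)   # "R" is gone once the loop arcs are checked too
```

Without the check on loop arcs, an activity with only 5 occurrences survives a threshold of 500, which matches the symptom reported in this issue.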

We already knew that the sum of the arcs would not always equal the number of occurrences of the activity (for example, a trace may contain only that activity, or some incoming/outgoing arcs may not pass the dependency threshold). Your case is different, though, and some further checks will be done.
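
Why the arc sums can legitimately differ from the activity count is easy to see on a toy log. The sketch below is plain Python with no pm4py; the list-of-lists log format and the helper `activity_and_arc_counts` are assumptions made for this illustration, and no dependency threshold is applied.

```python
from collections import Counter


def activity_and_arc_counts(log):
    """Count activity occurrences and directly-follows arcs in a toy log."""
    act_count = Counter(act for trace in log for act in trace)
    # Each pair of consecutive events in a trace contributes one arc.
    dfg = Counter((a, b) for trace in log for a, b in zip(trace, trace[1:]))
    return act_count, dfg


toy_log = [["A"], ["A", "B"], ["C", "A", "B"]]
act_count, dfg = activity_and_arc_counts(toy_log)

outgoing_a = sum(n for (src, _), n in dfg.items() if src == "A")
incoming_a = sum(n for (_, dst), n in dfg.items() if dst == "A")
print(act_count["A"], outgoing_a, incoming_a)  # prints: 3 2 1
```

Here "A" occurs three times but has only two outgoing arcs and one incoming arc, because the single-event trace produces no arcs at all and a trace that starts with "A" gives it no incoming arc.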

These kinds of glitches are difficult to find with automatic tests, so your contribution is very valuable for improving the quality of the library.