pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

Filtering on variants #58

Closed by andionita 5 years ago

andionita commented 5 years ago

Regarding filtering on variants, how does the decreasing factor play a role in the apply_auto_filter method?

For example:

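import os

# The pm4py modules xes_importer, sorting, variants_filter and case_statistics
# are assumed to be imported already.
xes_file_path = os.path.join('pm4py-source', 'tests', 'input_data', 'receipt.xes')
trace_log = xes_importer.import_log(xes_file_path)
trace_log = sorting.sort_timestamp(trace_log)
# Apply the variants filter with its default settings
trace_log = variants_filter.apply_auto_filter(trace_log)
# Count the remaining variants, most frequent first
variants_count = case_statistics.get_variant_statistics(trace_log)
variants_count = sorted(variants_count, key=lambda x: x['count'], reverse=True)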

Before filtering, the variant counts were: 713, 123, 116, ..., 12, 10, 10, ..., 1.

After filtering, the most common are indeed kept, as it says in the documentation: 713, 123, 116, ..., 12.

The decreasing factor is explained in the context of filtering out the most frequent items in a multiset. How is it applied here, when removing the less frequent items, and why is the cutoff at 12 in this example? Thanks.

andionita commented 5 years ago

Tested on Python 3.6.5, Ubuntu 16.04, pm4py version 1.0.23 / commit e0f408a

Javert899 commented 5 years ago

Hi,

Thanks for reporting this. The behavior of decreasingFactor in the variants filter was not correct: with a decreasingFactor of 0.6, you would expect only the first variant to be kept.

The problem is resolved in release 1.0.24.
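For readers wondering why only the first variant should survive, here is a minimal sketch of a decreasing-factor cutoff. It assumes the rule is: sort the counts in descending order, always keep the most frequent one, and stop at the first count that falls below decreasingFactor times the previous count. The function name keep_by_decreasing_factor is hypothetical and this is not pm4py's actual implementation. The sample counts are only the ones quoted in this thread; the values elided with "..." are left out.

```python
def keep_by_decreasing_factor(counts, decreasing_factor=0.6):
    """Keep leading counts until one drops below decreasing_factor * previous.

    Hypothetical helper illustrating the assumed rule; not part of pm4py.
    """
    sorted_counts = sorted(counts, reverse=True)
    if not sorted_counts:
        return []
    # The most frequent item is always kept.
    kept = [sorted_counts[0]]
    for prev, curr in zip(sorted_counts, sorted_counts[1:]):
        # Stop at the first sharp drop relative to the previous count.
        if curr < decreasing_factor * prev:
            break
        kept.append(curr)
    return kept


# Only the variant counts quoted in the thread (elided values omitted).
quoted_counts = [713, 123, 116, 12, 10, 10, 1]
print(keep_by_decreasing_factor(quoted_counts, 0.6))
```

Under this assumed rule, the second count (123) is below 0.6 × 713 = 427.8, so the loop stops immediately and only the first variant remains, which matches the expectation stated above.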