pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0
722 stars, 286 forks

`filter_eventually_follows_relation` function giving unexpected results #342

Closed: cpitsch closed this issue 2 years ago

cpitsch commented 2 years ago

Hello,

While I was working with the filter_eventually_follows_relation function, I encountered some behavior I did not expect:

I tested this on the newest PM4Py version, 2.2.24.

If I have an event log containing the traces [<a,b,c,d>, <a,c,b,e>, <a,b,c,e>] and filter it to retain only the eventually-follows relation d⟶c, I would expect an empty event log, since d occurs only at the end of a trace and is therefore never followed by c. However, the resulting event log is [<a,b,c,d>].

I filtered the event log like this:

filtered_log = pm4py.filter_eventually_follows_relation(log, [("d", "c")])

Reversing the tuple to the following also yields the same result:

filtered_log = pm4py.filter_eventually_follows_relation(log, [("c", "d")])

Upon further investigation, I found that (for this small example) for every pair of activities, the length of the filtered event log was always the same regardless of the "order" of the given tuple. This implies that perhaps the eventually-follows relation is seen as a symmetric relation. Is this the case? Or are the results unexpected?
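To make the expectation concrete, here is a minimal pure-Python sketch of a directed (non-symmetric) eventually-follows filter. It does not use PM4Py and is not its implementation: `eventually_follows` and `filter_by_relations` are hypothetical helper names, traces are modelled as plain lists of activity labels, and the sketch assumes a trace is kept when at least one of the given pairs holds.

```python
def eventually_follows(trace, source, target):
    """Return True if some `target` occurs strictly after some `source`."""
    for i, activity in enumerate(trace):
        if activity == source and target in trace[i + 1:]:
            return True
    return False


def filter_by_relations(log, relations):
    """Keep traces that satisfy at least one (source, target) pair (assumption)."""
    return [
        trace for trace in log
        if any(eventually_follows(trace, s, t) for s, t in relations)
    ]


# The example log from above, as lists of activity labels.
log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "e"],
    ["a", "b", "c", "e"],
]

d_then_c = filter_by_relations(log, [("d", "c")])
c_then_d = filter_by_relations(log, [("c", "d")])

print(d_then_c)  # []
print(c_then_d)  # [['a', 'b', 'c', 'd']]
```

Under this reading, the order of the tuple matters: d⟶c retains nothing, while c⟶d retains only <a,b,c,d>.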

Reproducible Example

fit-alessandro-berti commented 2 years ago

Thank you for reporting this.

Our eventually-follows function for EventLog objects is broken. We will fix this in the next release.

fit-alessandro-berti commented 2 years ago

The issue has been fixed in PM4Py 2.2.25.

cpitsch commented 2 years ago

Thank you for the quick fix!

I have one more question about this function:

With this implementation in PM4Py 2.2.25, the eventually-follows relation seems "reflexive": filtering my example log from above for the eventually-follows relation d⟶d retains the trace <a,b,c,d>.
To some extent, this makes sense, as d follows d after 0 events. However, to me, a filtering requiring a non-zero distance seems more useful.
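The difference between the two readings can be sketched in plain Python. This is only an illustration of the distinction, not PM4Py code: `follows_with_min_distance` and its `min_distance` parameter are hypothetical and not part of PM4Py's API.

```python
def follows_with_min_distance(trace, source, target, min_distance=1):
    """Return True if some `target` occurs at least `min_distance`
    positions after some `source` in the trace."""
    source_positions = [i for i, act in enumerate(trace) if act == source]
    target_positions = [j for j, act in enumerate(trace) if act == target]
    return any(
        j - i >= min_distance
        for i in source_positions
        for j in target_positions
    )


trace = ["a", "b", "c", "d"]

# Distance 0 allowed: d "follows" itself, so the relation is reflexive.
reflexive = follows_with_min_distance(trace, "d", "d", min_distance=0)

# Non-zero distance required: a single d no longer satisfies d -> d.
strict = follows_with_min_distance(trace, "d", "d", min_distance=1)

# A trace with two occurrences of d satisfies d -> d under both readings.
repeated = follows_with_min_distance(["a", "d", "b", "d"], "d", "d", min_distance=1)

print(reflexive, strict, repeated)  # True False True
```

With a required distance of at least 1, d⟶d would only retain traces in which d actually occurs more than once.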

Is this "reflexivity" intentional?