Closed by serkserk 4 years ago
Dear Serkserk,
The problem with sorting on timestamps occurs when several events share the same timestamp.
You need to define an index column in the dataframe and use it as a secondary key in sort_values. Example:
df["@@index"] = df.index
df = df.sort_values(["time:timestamp", "@@index"])
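For anyone who wants to try the tie-breaker in isolation, here is a minimal sketch on a toy dataframe. The data, the helper name `sort_with_tiebreak`, and the `case:concept:name` / `concept:name` columns are assumptions made up for illustration; only `time:timestamp` and `@@index` come from the snippet above.

```python
import pandas as pd

# Toy event log (hypothetical data): two events of case c1 and both events
# of case c2 share a timestamp.
df = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2"],
    "concept:name": ["A", "B", "C", "A", "C"],
    "time:timestamp": pd.to_datetime([
        "2020-07-01 10:00", "2020-07-01 10:00", "2020-07-01 11:00",
        "2020-07-01 09:00", "2020-07-01 09:00",
    ]),
})


def sort_with_tiebreak(frame):
    """Sort by timestamp, using the original row position to break ties."""
    out = frame.copy()
    out["@@index"] = out.index  # remember the original row order
    return out.sort_values(["time:timestamp", "@@index"])


sorted_df = sort_with_tiebreak(df)
print(sorted_df[["case:concept:name", "concept:name", "time:timestamp"]])
```

With the secondary key, rows that share a timestamp keep the relative order they had in the input, so the result no longer depends on how the sort treats ties.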
A case (with a duplicate timestamp/row) that does not contain the event "2- En vivier" followed by "4- Contractualisé" still affects this edge frequency. Is this normal?
Nice guys
On 10 Jul 2020, at 16.49, Serkan notifications@github.com wrote:
A case that does not contain the event "2- En vivier" followed by "4- Contractualisé" still affects this edge frequency. Is this normal?
Yes. Events with duplicate timestamps can end up in a different relative order after the sort operation, hence the need to sort with a double key (the timestamp, then the original index).
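To see why the order of tied events matters for edge frequencies, here is a rough pandas-only sketch that counts directly-follows pairs per case. It is not pm4py's implementation; the helper `dfg_counts`, the toy log, and the `case:concept:name` / `concept:name` column names are assumptions for illustration.

```python
from collections import Counter

import pandas as pd


def dfg_counts(frame, case_col="case:concept:name", act_col="concept:name"):
    """Count directly-follows pairs within each case, in the current row order."""
    # For each row, look up the next activity of the same case.
    nxt = frame.groupby(case_col, sort=False)[act_col].shift(-1)
    pairs = pd.DataFrame({"src": frame[act_col], "dst": nxt}).dropna()
    return Counter(zip(pairs["src"], pairs["dst"]))


# Hypothetical log: A and B of case c1 share the same timestamp.
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1"],
    "concept:name": ["A", "B", "C"],
    "time:timestamp": pd.to_datetime(
        ["2020-07-01 10:00", "2020-07-01 10:00", "2020-07-01 11:00"]
    ),
})

original = dfg_counts(log)                # rows in their recorded order
swapped = dfg_counts(log.iloc[[1, 0, 2]]) # the two tied rows exchanged

print(original)
print(swapped)
```

Both row orders are valid sorts by timestamp alone, yet they give different edges: the recorded order produces A→B and B→C, while the swapped order produces B→A and A→C. That is the kind of shift in edge counts a tie-breaking key is meant to prevent.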
I have an issue where, depending on the ordering of my data, I get different frequency values in the DFG.
You can see it in this notebook: https://gist.github.com/serkserk/9b8e7539e72576ff49d740abf41040b7
I first use my data without any ordering and, for example, the count for the edge "2- En vivier, 4- Contractualisé" is 6, which is correct. Then I try the pm4py util and get 14, which is wrong (and I get the same result with my custom ordering).