Closed by JazminADiaz 1 year ago
Dear @JazminADiaz
unwanted_activities should contain the activities in the log that are not in the model. If all the activities of the log are in the model, the result is empty.
That is not the case. I'm obtaining the unwanted activities in another way, since I couldn't make the code you provided work. Here is how I did it:
`unpredicted_activity_details = []
for trace in log:
    attributes = trace.attributes
    for event in trace:
        activity_name = event['concept:name']
        if activity_name not in [transition.label for transition in net_comp.transitions]:
            case_id = attributes.get('concept:name', '')
            timestamp = event.get('time:timestamp', '')
            resource = event.get('org:resource', '')
            unpredicted_activity_details.append({
                "ID case": case_id,
                "Activity": activity_name,
                "Time_Stamp": timestamp,
                "Resource": resource,
                "Event": event
            })
`
There are plenty of unwanted activities. I would much rather use your code; if you can explain what may be happening, I'm happy to provide any info you need.
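The membership check above rebuilds the list of transition labels for every event. A minimal sketch of the same idea with sets is shown here, using plain stand-in data rather than pm4py objects: the `Transition` tuple, the toy log, and all activity names are hypothetical, and silent transitions are assumed to carry a label of `None`.

```python
from collections import namedtuple

# Hypothetical stand-in for a Petri net transition: `name` is the internal id,
# `label` is the visible activity name (None for a silent transition).
Transition = namedtuple("Transition", ["name", "label"])

transitions = [
    Transition("t1", "register request"),
    Transition("t2", "check ticket"),
    Transition("t3", None),  # silent transition, no visible label
]

# Toy log: each trace is simply a list of activity names.
log = [
    ["register request", "check ticket"],
    ["register request", "reinitiate request", "check ticket"],
]

# Build the set of model labels once, skipping silent transitions.
model_labels = {t.label for t in transitions if t.label is not None}

# Activities that occur in the log but have no labelled transition in the model.
log_activities = {activity for trace in log for activity in trace}
unwanted = log_activities - model_labels

print(unwanted)
```

If every activity in the log has a matching label, `unwanted` is an empty set, which matches the behaviour described at the top of the thread.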
Dear @JazminADiaz
I have reproduced the problem. In pm4py 2.3.0 we changed the default log format to dataframe. For that method, you still need to make sure to use the EventLog class. You can take a look at the following example, where the read_xes method is used along with the option to get an EventLog back:
import pm4py
from pm4py.algo.conformance.tokenreplay.diagnostics import duration_diagnostics
log = pm4py.read_xes("tests/input_data/receipt.xes", return_legacy_log_object=True)
filtered_log = pm4py.filter_variants_top_k(log, 1)
net, im, fm = pm4py.discover_petri_net_inductive(filtered_log)
replayed_traces, place_fitness, trans_fitness, unwanted_activities = pm4py.conformance_diagnostics_token_based_replay(log, net, im, fm, opt_parameters={"enable_pltr_fitness": True})
print(unwanted_activities)
act_diagnostics = duration_diagnostics.diagnose_from_notexisting_activities(log, unwanted_activities)
print(act_diagnostics)
We will fix this properly in a future release.
I don't really know how to use that. I have a CSV: I convert it to a dataframe, then I use the log_converter from pm4py and work with the resulting log. Where should I pass return_legacy_log_object=True?
Thank you for your answer btw!
I don't know if it helps, but I have had issues with other parts of the code on your page because the activities have two elements: a really long id (I'm assuming it is an id) and a label, which is the actual name of the activity. I usually have to add a step that extracts the label to make the code work. I don't know if that has anything to do with this problem.
Then that is already good. You do not need the part with return_legacy_log_object=True.
To make sure that the columns have the correct typing, for example the timestamp should be a datetime column, you can use pm4py.format_dataframe(.....) which actually ensures that. Check https://pm4py.fit.fraunhofer.de/documentation for the syntax of the command
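As a rough illustration of what "correct typing" means here, the sketch below prepares a dataframe by hand with pandas only. It is not pm4py's implementation; pm4py.format_dataframe is the helper recommended above. The CSV content and the source column names (`case`, `activity`, `ts`) are made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content; the column names "case", "activity", "ts" are invented.
csv_text = """case,activity,ts
1,register request,2023-01-01 09:00:00
1,check ticket,2023-01-01 10:30:00
2,register request,2023-01-02 08:15:00
"""

df = pd.read_csv(io.StringIO(csv_text))

# Rename to the standard XES-style column names used elsewhere in this thread.
df = df.rename(columns={
    "case": "case:concept:name",
    "activity": "concept:name",
    "ts": "time:timestamp",
})

# Case ids as strings, timestamps as real datetimes instead of plain text.
df["case:concept:name"] = df["case:concept:name"].astype(str)
df["time:timestamp"] = pd.to_datetime(df["time:timestamp"])

print(df.dtypes)
```

After this step the timestamp column holds datetime values, so durations can be computed by subtracting timestamps.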
I haven't really had any issues with the timestamps; your filters that rely on the timestamp work just fine. It is just the unwanted activities that I haven't been able to make work.
Hi @fit-alessandro-berti, I got the act_diagnostics dict, but the data is not clear to me.
{'Event_X': {'n_containing': 30, 'n_fit': 2163, 'fit_median_time': 3029460.0, 'containing_median_time': 2592000.0, 'relative_throughput': 0.8555980273712147},
'Event_Y': {'n_containing': 11, 'n_fit': 2163, 'fit_median_time': 3029460.0, 'containing_median_time': 2592000.0, 'relative_throughput': 0.8555980273712147},
'Event_Z': {'n_containing': 1, 'n_fit': 2163, 'fit_median_time': 3029460.0, 'containing_median_time': 2592000.0, 'relative_throughput': 0.8555980273712147}}
Can you please clarify what these fields represent (n_containing, n_fit, fit_median_time, containing_median_time)?
And what are n_fit and n_underfed in trans_diagnostics (diagnose_from_trans_fitness)?
containing traces => the number of cases that contain at least an event with the specified activity
fit traces => the number of cases that do NOT contain an event with the specified activity
fit_median_time => among all the cases that are "fit" according to the aforementioned criteria (so without the activity), compute the median time
containing_median_time => among all the cases containing one event with the given activity, compute the median time
relative_throughput = containing_median_time / fit_median_time
When the relative_throughput is greater than 1, then the activity leads to an increase of the throughput times. Otherwise, it does not lead to an increase of the throughput times.
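To make the definitions above concrete, here is a small pure-Python sketch that recomputes the same fields on toy data. It is not pm4py's code: the `diagnose` helper and the case data are hypothetical, and "time" is assumed to mean the total case duration in seconds.

```python
from statistics import median

# Toy data: case id -> (activities in the case, case duration in seconds).
cases = {
    "c1": (["A", "B"], 100.0),
    "c2": (["A", "B"], 200.0),
    "c3": (["A", "X", "B"], 300.0),
    "c4": (["A", "X", "B"], 500.0),
    "c5": (["A", "B"], 300.0),
}


def diagnose(cases, activity):
    """Recompute the diagnostics fields for one activity (illustrative only)."""
    # Durations of cases that contain at least one event with the activity.
    containing = [dur for acts, dur in cases.values() if activity in acts]
    # Durations of "fit" cases, i.e. cases without the activity.
    fit = [dur for acts, dur in cases.values() if activity not in acts]
    fit_median = median(fit)
    containing_median = median(containing)
    return {
        "n_containing": len(containing),
        "n_fit": len(fit),
        "fit_median_time": fit_median,
        "containing_median_time": containing_median,
        "relative_throughput": containing_median / fit_median,
    }


result = diagnose(cases, "X")
print(result)
```

In this toy log the cases containing X have a median duration twice that of the fit cases, so the ratio is above 1. In the output posted earlier in the thread, 2592000.0 / 3029460.0 is about 0.856, which is below 1.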
Thanks @fit-alessandro-berti
By the way, does the same apply to trans_diagnostics?
Yes exactly :)
Hi there, I don't know if I'm doing something wrong. When I check for unwanted activities I get the dictionary, but when I print act_diagnostics, it is empty. I'm using the same log I used to create unwanted_activities. Please help.
parameters_tbr = {token_based_replay.Variants.TOKEN_REPLAY.value.Parameters.DISABLE_VARIANTS: True, token_based_replay.Variants.TOKEN_REPLAY.value.Parameters.ENABLE_PLTR_FITNESS: True}
act_diagnostics = duration_diagnostics.diagnose_from_notexisting_activities(log, unwanted_activities)
print(act_diagnostics)
for act in act_diagnostics:
    print(act, act_diagnostics[act])