pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0
722 stars 286 forks source link

Alignment is missing names of invisible transitions #40

Closed fmannhardt closed 5 years ago

fmannhardt commented 5 years ago

The alignments returned by PM4PY use the label of the transitions instead of its identifier/name in the resulting list: https://github.com/pm4py/pm4py-source/blob/afa8d9d1eb1a87ee57a2ba3a75d7d292963590ff/pm4py/algo/conformance/alignments/versions/state_equation_a_star.py#L185

This leads to invisible transitions appearing as None in the result, for example:

{'alignment': [('>>', None), 
('Registration', 'Registration'),
 ('Triage and Assessment', 'Triage and Assessment'), 
('X-Ray', 'X-Ray'), 
('Discuss Results', 'Discuss Results'), 
('Check-out', 'Check-out'), 
('>>', None), 
('Registration', 'Registration'), 
('Triage and Assessment', 'Triage and Assessment'), 
('X-Ray', 'X-Ray'), 
('Discuss Results', 'Discuss Results'),
 ('Check-out', 'Check-out'), 
('>>', None)], 'cost': 3, 'visited_states': 15, 'queued_states': 43, 'traversed_arcs': 50, 'fitness': 1.0}

This makes it hard to use the alignment for visualization and or analysis purposes. Also, when having transitions with duplicate labels some computation would be required to infer which one was executed.

Could you simply use the name instead of the label? Since the label can then be looked up.

Javert899 commented 5 years ago

You are opening the gates of hell :)

The issue you signal exist, both for invisible transitions (from which, given the None returned, it is impossible to guess which transitions have been fired) and visible transitions with duplicate labels.

A possible fix of this has been proposed in the 'proposedFixAlignments' branch. But before describing the fix and its effects, I shall recap the causes of this.

Essentially, to speed up the alignments process, Python multiprocessing has been called in action. So, each subprocess can compute the alignment of a trace and then results are gathered. We found impossible to return from the subprocess the state object as-is at the end of the computation, because of pickle recursion issues. So, we shall return a list of simpler objects (a.k.a. null objects or string) to describe the alignment.

Now I can describe the fix. In the 'proposedFixAlignments' branch, in the state_equation_a_star now the following function is called to get a label for the move:

def get_label_for_align_reconstruction(trans): if trans.label[1] is None:

invisible transitions

    return (trans.label[0],
            PARAMETER_ALIGN_INVISIBLE_TRANS_PREFIX + trans.name[1] + PARAMETER_ALIGN_INVISIBLE_TRANS_SUFFIX)
else:
    if trans.label[1] == ">>":
        return trans.label
    # visible transitions
    return (trans.label[0], trans.label[1] + PARAMETER_ALIGN_VISIBLE_TRANS_PREFIX + trans.name[
        1] + PARAMETER_ALIGN_INVISIBLE_TRANS_SUFFIX)

This works returning: -for visible transitions, a string starting with the transition label and containing, in the final parenthesis, the transition name

Let me know what you think about this fix. We will continue discussion, both internally and with you, before merging or ditching this fix. After discussions, we could change the fix to something different.

It was not possible to implement the straightforward fix you proposed, essentially because in the list of moves the transition name may be an unmeaningful code, making it impossible to guess if it is a visible or invisible transition.

Example of alignments (given this fix) of a sampled Receipt log on a very simple Petri net obtained by Inductive Miner on a filtered version of the Receipt log:

0 ['Confirmation of receipt', 'T02 Check confirmation of receipt', 'T04 Determine confirmation of receipt', 'T05 Print and send confirmation of receipt', 'T06 Determine necessity of stop advice', 'T10 Determine necessity to stop indication'] ('Confirmation of receipt', 'Confirmation of receipt (Confirmation of receipt)') ('T02 Check confirmation of receipt', 'T02 Check confirmation of receipt (T02 Check confirmation of receipt)') ('T04 Determine confirmation of receipt', 'T04 Determine confirmation of receipt (T04 Determine confirmation of receipt)') ('T05 Print and send confirmation of receipt', 'T05 Print and send confirmation of receipt (T05 Print and send confirmation of receipt)') ('T06 Determine necessity of stop advice', 'T06 Determine necessity of stop advice (T06 Determine necessity of stop advice)') ('T10 Determine necessity to stop indication', 'T10 Determine necessity to stop indication (T10 Determine necessity to stop indication)') ('>>', '[INVISIBLE] (tau_1)') isFit = True

1 ['Confirmation of receipt', 'T06 Determine necessity of stop advice', 'T10 Determine necessity to stop indication', 'T02 Check confirmation of receipt', 'T04 Determine confirmation of receipt', 'T05 Print and send confirmation of receipt'] ('Confirmation of receipt', 'Confirmation of receipt (Confirmation of receipt)') ('T06 Determine necessity of stop advice', '>>') ('T10 Determine necessity to stop indication', '>>') ('T02 Check confirmation of receipt', 'T02 Check confirmation of receipt (T02 Check confirmation of receipt)') ('T04 Determine confirmation of receipt', 'T04 Determine confirmation of receipt (T04 Determine confirmation of receipt)') ('T05 Print and send confirmation of receipt', 'T05 Print and send confirmation of receipt (T05 Print and send confirmation of receipt)') ('>>', 'T06 Determine necessity of stop advice (T06 Determine necessity of stop advice)') ('>>', 'T10 Determine necessity to stop indication (T10 Determine necessity to stop indication)') ('>>', '[INVISIBLE] (tau_1)') isFit = False

2 ['Confirmation of receipt', 'T02 Check confirmation of receipt', 'T04 Determine confirmation of receipt', 'T05 Print and send confirmation of receipt', 'T06 Determine necessity of stop advice', 'T10 Determine necessity to stop indication'] ('Confirmation of receipt', 'Confirmation of receipt (Confirmation of receipt)') ('T02 Check confirmation of receipt', 'T02 Check confirmation of receipt (T02 Check confirmation of receipt)') ('T04 Determine confirmation of receipt', 'T04 Determine confirmation of receipt (T04 Determine confirmation of receipt)') ('T05 Print and send confirmation of receipt', 'T05 Print and send confirmation of receipt (T05 Print and send confirmation of receipt)') ('T06 Determine necessity of stop advice', 'T06 Determine necessity of stop advice (T06 Determine necessity of stop advice)') ('T10 Determine necessity to stop indication', 'T10 Determine necessity to stop indication (T10 Determine necessity to stop indication)') ('>>', '[INVISIBLE] (tau_1)') isFit = True

3 ['Confirmation of receipt', 'T02 Check confirmation of receipt', 'T04 Determine confirmation of receipt', 'T05 Print and send confirmation of receipt', 'T06 Determine necessity of stop advice', 'T10 Determine necessity to stop indication'] ('Confirmation of receipt', 'Confirmation of receipt (Confirmation of receipt)') ('T02 Check confirmation of receipt', 'T02 Check confirmation of receipt (T02 Check confirmation of receipt)') ('T04 Determine confirmation of receipt', 'T04 Determine confirmation of receipt (T04 Determine confirmation of receipt)') ('T05 Print and send confirmation of receipt', 'T05 Print and send confirmation of receipt (T05 Print and send confirmation of receipt)') ('T06 Determine necessity of stop advice', 'T06 Determine necessity of stop advice (T06 Determine necessity of stop advice)') ('T10 Determine necessity to stop indication', 'T10 Determine necessity to stop indication (T10 Determine necessity to stop indication)') ('>>', '[INVISIBLE] (tau_1)') isFit = True

4 ['Confirmation of receipt', 'T02 Check confirmation of receipt', 'T04 Determine confirmation of receipt', 'T05 Print and send confirmation of receipt', 'T06 Determine necessity of stop advice', 'T10 Determine necessity to stop indication'] ('Confirmation of receipt', 'Confirmation of receipt (Confirmation of receipt)') ('T02 Check confirmation of receipt', 'T02 Check confirmation of receipt (T02 Check confirmation of receipt)') ('T04 Determine confirmation of receipt', 'T04 Determine confirmation of receipt (T04 Determine confirmation of receipt)') ('T05 Print and send confirmation of receipt', 'T05 Print and send confirmation of receipt (T05 Print and send confirmation of receipt)') ('T06 Determine necessity of stop advice', 'T06 Determine necessity of stop advice (T06 Determine necessity of stop advice)') ('T10 Determine necessity to stop indication', 'T10 Determine necessity to stop indication (T10 Determine necessity to stop indication)') ('>>', '[INVISIBLE] (tau_1)') isFit = True

5 ['Confirmation of receipt', 'T02 Check confirmation of receipt', 'T06 Determine necessity of stop advice', 'T04 Determine confirmation of receipt', 'T05 Print and send confirmation of receipt', 'T10 Determine necessity to stop indication'] ('Confirmation of receipt', 'Confirmation of receipt (Confirmation of receipt)') ('T02 Check confirmation of receipt', 'T02 Check confirmation of receipt (T02 Check confirmation of receipt)') ('T06 Determine necessity of stop advice', '>>') ('T04 Determine confirmation of receipt', 'T04 Determine confirmation of receipt (T04 Determine confirmation of receipt)') ('T05 Print and send confirmation of receipt', 'T05 Print and send confirmation of receipt (T05 Print and send confirmation of receipt)') ('>>', 'T06 Determine necessity of stop advice (T06 Determine necessity of stop advice)') ('T10 Determine necessity to stop indication', 'T10 Determine necessity to stop indication (T10 Determine necessity to stop indication)') ('>>', '[INVISIBLE] (tau_1)') isFit = False

fmannhardt commented 5 years ago

Thanks.

Just wondering, why would it not be possible to return an object, for example, dict with that information? Or is it strictly strings that can be returned due to the multi-threading issue? If so, would it be a possibility to simply return the name, which is unique and then re-build the correct data structure on the main thread by looking up the rest of the information in the Petri net?

I don't understand the comment:

It was not possible to implement the straightforward fix you proposed, essentially because in the list of moves the transition name may be an unmeaningful code, making it impossible to guess if it is a visible or invisible transition.

The name's of transitions have to be unique, right? So to know whether it was invisible or not is to simply look it up in the Petri net?

Javert899 commented 5 years ago

Hi,

Eventually, a fix has been included in the 'develop' branch.

Instead of changing the default behavior, a parameter "ret_tuple_as_trans_desc" could be passed to the alignments to return, in the description of the alignment, for each transition of the product net a tuple ((trans_name_tracenet, trans_name_model), (trans_label_tracenet, trans_label_model))

TOY EXAMPLE

WITH ret_tuple_as_trans_desc FALSE:

log = xes_importer.apply("C:\running-example.xes") net, im, fm = inductive_miner.apply(log) aligned_traces = align_factory.apply(log, net, im, fm, parameters={"ret_tuple_as_trans_desc": False}) print(aligned_traces)

[{'alignment': [('register request', 'register request'), ('examine casually', 'examine casually'), ('check ticket', 'check ticket'), ('decide', 'decide'), ('reinitiate request', 'reinitiate request'), ('>>', None), ('examine thoroughly', 'examine thoroughly'), ('check ticket', 'check ticket'), ('decide', 'decide'), ('>>', None), ('pay compensation', 'pay compensation'), ('>>', None)], 'cost': 3, 'visited_states': 15, 'queued_states': 45, 'traversed_arcs': 54, 'fitness': 1.0}, {'alignment': [('register request', 'register request'), ('>>', None), ('check ticket', 'check ticket'), ('>>', None), ('examine casually', 'examine casually'), ('>>', None), ('decide', 'decide'), ('>>', None), ('pay compensation', 'pay compensation'), ('>>', None)], 'cost': 5, 'visited_states': 12, 'queued_states': 31, 'traversed_arcs': 38, 'fitness': 1.0}, {'alignment': [('register request', 'register request'), ('examine thoroughly', 'examine thoroughly'), ('check ticket', 'check ticket'), ('decide', 'decide'), ('>>', None), ('reject request', 'reject request'), ('>>', None)], 'cost': 2, 'visited_states': 8, 'queued_states': 23, 'traversed_arcs': 25, 'fitness': 1.0}, {'alignment': [('register request', 'register request'), ('examine casually', 'examine casually'), ('check ticket', 'check ticket'), ('decide', 'decide'), ('>>', None), ('pay compensation', 'pay compensation'), ('>>', None)], 'cost': 2, 'visited_states': 8, 'queued_states': 22, 'traversed_arcs': 25, 'fitness': 1.0}, {'alignment': [('register request', 'register request'), ('examine casually', 'examine casually'), ('check ticket', 'check ticket'), ('decide', 'decide'), ('reinitiate request', 'reinitiate request'), ('>>', None), ('>>', None), ('check ticket', 'check ticket'), ('>>', None), ('examine casually', 'examine casually'), ('>>', None), ('decide', 'decide'), ('reinitiate request', 'reinitiate request'), ('>>', None), ('examine casually', 'examine casually'), ('check ticket', 'check ticket'), ('decide', 'decide'), ('>>', None), ('reject request', 'reject request'), ('>>', None)], 'cost': 7, 'visited_states': 28, 'queued_states': 76, 'traversed_arcs': 102, 'fitness': 1.0}, {'alignment': [('register request', 'register request'), ('>>', None), ('check ticket', 'check ticket'), ('>>', None), ('examine thoroughly', 'examine thoroughly'), ('>>', None), ('decide', 'decide'), ('>>', None), ('reject request', 'reject request'), ('>>', None)], 'cost': 5, 'visited_states': 12, 'queued_states': 30, 'traversed_arcs': 38, 'fitness': 1.0}]

WITH ret_tuple_as_trans_desc TRUE:

log = xes_importer.apply("C:\running-example.xes") net, im, fm = inductive_miner.apply(log) aligned_traces = align_factory.apply(log, net, im, fm, parameters={"ret_tuple_as_trans_desc": True}) print(aligned_traces)

[{'alignment': [(('t0', 'register request'), ('register request', 'register request')), (('t1', 'examine casually'), ('examine casually', 'examine casually')), (('t2', 'check ticket'), ('check ticket', 'check ticket')), (('t3', 'decide'), ('decide', 'decide')), (('t4', 'reinitiate request'), ('reinitiate request', 'reinitiate request')), (('>>', 'loop_3'), ('>>', None)), (('t5', 'examine thoroughly'), ('examine thoroughly', 'examine thoroughly')), (('t6', 'check ticket'), ('check ticket', 'check ticket')), (('t7', 'decide'), ('decide', 'decide')), (('>>', 'skip_11'), ('>>', None)), (('t8', 'pay compensation'), ('pay compensation', 'pay compensation')), (('>>', 'tau_1'), ('>>', None))], 'cost': 3, 'visited_states': 15, 'queued_states': 45, 'traversed_arcs': 54, 'fitness': 1.0}, {'alignment': [(('t0', 'register request'), ('register request', 'register request')), (('>>', 'skip_7'), ('>>', None)), (('t1', 'check ticket'), ('check ticket', 'check ticket')), (('>>', 'loop_5'), ('>>', None)), (('t2', 'examine casually'), ('examine casually', 'examine casually')), (('>>', 'skip_8'), ('>>', None)), (('t3', 'decide'), ('decide', 'decide')), (('>>', 'skip_11'), ('>>', None)), (('t4', 'pay compensation'), ('pay compensation', 'pay compensation')), (('>>', 'tau_1'), ('>>', None))], 'cost': 5, 'visited_states': 12, 'queued_states': 31, 'traversed_arcs': 38, 'fitness': 1.0}, {'alignment': [(('t0', 'register request'), ('register request', 'register request')), (('t1', 'examine thoroughly'), ('examine thoroughly', 'examine thoroughly')), (('t2', 'check ticket'), ('check ticket', 'check ticket')), (('t3', 'decide'), ('decide', 'decide')), (('>>', 'skip_11'), ('>>', None)), (('t4', 'reject request'), ('reject request', 'reject request')), (('>>', 'tau_1'), ('>>', None))], 'cost': 2, 'visited_states': 8, 'queued_states': 21, 'traversed_arcs': 25, 'fitness': 1.0}, {'alignment': [(('t0', 'register request'), ('register request', 'register request')), (('t1', 'examine casually'), ('examine casually', 'examine casually')), (('t2', 'check ticket'), ('check ticket', 'check ticket')), (('t3', 'decide'), ('decide', 'decide')), (('>>', 'skip_11'), ('>>', None)), (('t4', 'pay compensation'), ('pay compensation', 'pay compensation')), (('>>', 'tau_1'), ('>>', None))], 'cost': 2, 'visited_states': 8, 'queued_states': 22, 'traversed_arcs': 25, 'fitness': 1.0}, {'alignment': [(('t0', 'register request'), ('register request', 'register request')), (('t1', 'examine casually'), ('examine casually', 'examine casually')), (('t2', 'check ticket'), ('check ticket', 'check ticket')), (('t3', 'decide'), ('decide', 'decide')), (('t4', 'reinitiate request'), ('reinitiate request', 'reinitiate request')), (('>>', 'loop_3'), ('>>', None)), (('>>', 'skip_7'), ('>>', None)), (('t5', 'check ticket'), ('check ticket', 'check ticket')), (('>>', 'loop_5'), ('>>', None)), (('t6', 'examine casually'), ('examine casually', 'examine casually')), (('>>', 'skip_8'), ('>>', None)), (('t7', 'decide'), ('decide', 'decide')), (('t8', 'reinitiate request'), ('reinitiate request', 'reinitiate request')), (('>>', 'loop_3'), ('>>', None)), (('t9', 'examine casually'), ('examine casually', 'examine casually')), (('t10', 'check ticket'), ('check ticket', 'check ticket')), (('t11', 'decide'), ('decide', 'decide')), (('>>', 'skip_11'), ('>>', None)), (('t12', 'reject request'), ('reject request', 'reject request')), (('>>', 'tau_1'), ('>>', None))], 'cost': 7, 'visited_states': 28, 'queued_states': 76, 'traversed_arcs': 102, 'fitness': 1.0}, {'alignment': [(('t0', 'register request'), ('register request', 'register request')), (('>>', 'skip_7'), ('>>', None)), (('t1', 'check ticket'), ('check ticket', 'check ticket')), (('>>', 'loop_5'), ('>>', None)), (('t2', 'examine thoroughly'), ('examine thoroughly', 'examine thoroughly')), (('>>', 'skip_8'), ('>>', None)), (('t3', 'decide'), ('decide', 'decide')), (('>>', 'skip_11'), ('>>', None)), (('t4', 'reject request'), ('reject request', 'reject request')), (('>>', 'tau_1'), ('>>', None))], 'cost': 5, 'visited_states': 12, 'queued_states': 30, 'traversed_arcs': 38, 'fitness': 1.0}]

fmannhardt commented 5 years ago

Thanks, might be some overhead since the label is not really needed when the name/id is returned but better than before.

s-j-v-zelst commented 5 years ago

We have replaced the string by a variable in file state_equation_a_star:

PARAM_ALIGNMENT_RESULT_IS_SYNC_PROD_AWARE