pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

Keep case identifier as attribute when converting to a TraceLog #35

Closed fmannhardt closed 5 years ago

fmannhardt commented 5 years ago

It is my understanding from reading the source code that the attribute by which a TraceLog is built from an EventLog cannot be retrieved after conversion. At least I did not see a way to retrieve it.

If you kept it in some kind of attribute on the log object, it would be easier to convert between the R eventlog and the pm4py TraceLog. I guess it would also facilitate other operations that rely on a trace being identifiable.

Javert899 commented 5 years ago

Hi,

Thanks for reporting this.

Actually, the change you suggested has been implemented in the 'develop' branch, in the pm4py.objects.conversion.log.versions.to_trace_log.apply method, by changing the default of "include_case_attributes" from False to True.

fmannhardt commented 5 years ago

Thanks. I assume you mean the changes done in 27577e1a446ebf43e807e34236703aadc171fdcc?

However, what I meant is that, once the TraceLog has been obtained, it is impossible to know which attribute was used as the trace identifier (or case_glue, as you call it in PM4PY). There might be multiple case attributes, so it would not be clear (without computation) which one was used.

Javert899 commented 5 years ago

Hi,

In the 'develop' branch now, when the event log is converted to a trace log and include_case_attributes is True (the default), it is checked that the attribute "concept:name" is present; if it is not, it is added using the case id glue.

That is, the code changes as follows:

        if include_case_attributes:
            for k in event.keys():
                if k.startswith(case_attribute_prefix):
                    trace_attr[k.replace(case_attribute_prefix, '')] = event[k]
            if xes.DEFAULT_TRACEID_KEY not in trace_attr:
                trace_attr[xes.DEFAULT_TRACEID_KEY] = trace_attr[case_glue]

fmannhardt commented 5 years ago

I just tried this with the new v1.0.19, and now I get this error when trying to convert an EventLog with case_id patient to a TraceLog:

 Error in py_call_impl(callable, dots$args, dots$keywords) : 
   KeyError: 'patient'

 Detailed traceback: 
   File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\objects\conversion\log\factory.py", line 14, in apply
     return VERSIONS[variant](log, parameters=parameters)
   File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\objects\conversion\log\versions\to_trace_log.py", line 26, in apply
     case_attribute_prefix=case_pref, enable_deepcopy=enable_deepcopy)
   File "C:\Users\felixm\AppData\Local\CONTIN~1\ANACON~1\envs\R-RETI~1\lib\site-packages\pm4py\objects\conversion\log\versions\to_trace_log.py", line 66, in transform_event_log_to_trace_log
     trace_attr[xes.DEFAULT_TRACEID_KEY] = trace_attr[case_glue]

It seems that you assume that all case attributes have some kind of prefix? Could this limitation be dropped? In many cases attributes have a domain-specific meaning, and it would be weird to have to rename them. It also prevents me from converting an R event log to PM4PY and back without changes.

Edit: Just to add, for my use case of converting from bupaR to PM4PY and back, I can work around this issue by simply adding an artificial attribute and not converting trace attributes. So, the comment is more about the general PM4PY design choice. :)
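
For readers following along, the KeyError in the traceback above can be reproduced without pm4py. The sketch below mirrors the quoted snippet in plain Python under two assumptions that are not confirmed in this thread: the case attribute prefix is taken to be `case:`, and the grouping column (`case_glue`) is the unprefixed name `patient`. The helper name `collect_trace_attributes` and the sample event are hypothetical.

```python
CASE_ATTRIBUTE_PREFIX = "case:"  # assumed prefix, for illustration only
DEFAULT_TRACEID_KEY = "concept:name"


def collect_trace_attributes(event, case_glue, prefix=CASE_ATTRIBUTE_PREFIX):
    """Mimic the quoted snippet: copy prefixed event attributes to the trace."""
    trace_attr = {}
    for k in event.keys():
        if k.startswith(prefix):
            trace_attr[k.replace(prefix, "")] = event[k]
    if DEFAULT_TRACEID_KEY not in trace_attr:
        # The glue *name* is looked up in trace_attr, which only holds
        # prefix-stripped keys, so an unprefixed glue column is never found.
        trace_attr[DEFAULT_TRACEID_KEY] = trace_attr[case_glue]
    return trace_attr


# Hypothetical event whose case identifier column has no prefix.
event = {"patient": "p-001", "concept:name": "Registration"}

try:
    collect_trace_attributes(event, case_glue="patient")
    error = None
except KeyError as exc:
    error = exc.args[0]

print(error)
```

The lookup fails because `trace_attr` is built only from prefixed keys, so a glue column without the prefix never appears in it.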

Javert899 commented 5 years ago

Hi,

Sorry about that. The bug you are seeing is an error I introduced in the last release. Indeed, it should be trace_attr[xes.DEFAULT_TRACEID_KEY] = glue

I've pushed this hotfix into release 1.0.20
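
A minimal sketch of the corrected behaviour, in plain Python rather than pm4py's actual implementation: events are grouped by the value of the glue column, and that value (`glue`) becomes the trace identifier when no `concept:name` case attribute exists. The function name `group_events_by_case`, the `case:` prefix, and the sample events are assumptions made for illustration.

```python
CASE_ATTRIBUTE_PREFIX = "case:"  # assumed prefix, for illustration only
DEFAULT_TRACEID_KEY = "concept:name"


def group_events_by_case(events, case_glue, prefix=CASE_ATTRIBUTE_PREFIX):
    """Group events into traces keyed by the value of the glue column."""
    traces = {}
    for event in events:
        glue = event[case_glue]  # the *value* identifying the case
        if glue not in traces:
            trace_attr = {}
            for k in event.keys():
                if k.startswith(prefix):
                    trace_attr[k[len(prefix):]] = event[k]
            if DEFAULT_TRACEID_KEY not in trace_attr:
                # Use the glue value itself, not a lookup by glue name.
                trace_attr[DEFAULT_TRACEID_KEY] = glue
            traces[glue] = {"attributes": trace_attr, "events": []}
        traces[glue]["events"].append(event)
    return list(traces.values())


# Hypothetical events with an unprefixed case identifier column.
events = [
    {"patient": "p-001", "concept:name": "Registration"},
    {"patient": "p-002", "concept:name": "Registration"},
    {"patient": "p-001", "concept:name": "Triage"},
]
traces = group_events_by_case(events, case_glue="patient")
for trace in traces:
    print(trace["attributes"][DEFAULT_TRACEID_KEY], len(trace["events"]))
```

With this change an unprefixed glue column such as `patient` no longer raises a KeyError, although the resulting trace still does not record which column was used as the glue.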

For the remainder of your observations, there could be two options: