tuura / process-mining

A library for process mining
Other
8 stars 2 forks source link

Mining circuit models from simulations #1

Open snowleopard opened 8 years ago

snowleopard commented 8 years ago

Consider the following trace obtained from a simulation run of a digital circuit:

a+ -> (b+ || c+) -> a- -> b- -> c- -> a+ -> b+ -> c+ -> a- -> b- -> c-

Note, some of the events are known to be concurrent, because the time difference between them is too small for a proper causality, so essentially this is a labelled partial order.

In general there may be more than one simulation run, possibly activating different behavioural scenarios of the circuit under observation.

Can we use existing process mining tools (e.g., pgminer, genet, ProM or POD) to derive a compact model for the circuit behaviour? If not, what needs to be done?

P.S.: Why is this problem interesting? One possible application is circuit optimisation: start with an existing circuit implementing a particular specification, perform accurate analogue simulations and then build a simplified model of the circuit, hopefully leading to a simpler, more optimal implementation.

The simplification comes from the fact that analogue simulation reveals certain timing relations that exist in the physical system but are not captured in the specification. Such timing relations can often significantly reduce the reachable state space of the circuit.

hernanponcedeleon commented 8 years ago

I was taking a look to what POD would generate for this example (I assume a+ and a- are different labels).

In principle we could not generate this LPO since if b+ and c+ are marked as independent, then the second occurrence of them should also be independent. However one could "hack" POD and just start from this LPO (as Cesar mentioned, one should guarantee certain properties over the LPOs, i.e. they should generate an ES, but this is trivially true for this case).

POD would generate an occurrence net having exactly this configuration and would try to merge conditions and events in order the obtain the final net. There are three modes for merging: sequence-preserving; LPO-preserving and removal aware (to avoid introducing traces marked as forbidden).

Already for this case we see some of the limitations of the tool based on its theory and the fact that it uses an unconditional independence. In order to preserve sequences, we have proven that whenever two events are merged, their preset should also be merged. In this example this means that we obtain a net where a+12 (this is the name I gave to the transition representing events a+1 and a+2) produces to places (representing the first occurrences of b+ and c+ which are concurrent) and the postset of b+12 is merged with the present of c+12 (since there is a condition C between b+2 and c+2 and C must be in the preset of c+12 since we merged c+1 and c+2). Thus, the obtained net is non-safe. Merging the postset of b+12 with the preset of c+12 could have been avoid in this case because not merging them would have not destroy the preservation of the original traces (sequences); again this is a difference due to the fact that in the theory of POD we assume unconditional independence.

If the above was clear, you can see that POD may introduces new behaviors, for example now it is possible to fire twice c+.

My question now is: what is the compact net that we would like to obtain for this case? If we want to specify that the first occurrence should be concurrent while the second ones ordered, then nothing can be merged. If we accept to add more behaviors, we can have the pattern

a+ -> (b+ || c+) -> a- -> b- -> c-

whit a loop.

snowleopard commented 8 years ago

Thanks @hernanponcedeleon!

I think we are exactly after the pattern you give in this case: we have concurrency between b+ and c+, and sequence between b- and c- (not due to causality, but due to relative timing).

Could you please explain how to get this model using POD? (Or is it not currently possible?)

hernanponcedeleon commented 8 years ago

If you are ok with the following:

Each time you mark two activities as concurrent because their time stamps are almost the same, you should mark every occurrences of them as concurrent event if other occurrences are timing causal

Then POD should work. You can instruct the tool to do discovery as follows:

./pod.py discover ./circuit.xes ./circuit.dep --out result.pnml --eq=sp-smt --smt-min-places=7 The last two parameters are used to force preserving traces and a minimal number of places in the final net. Since the SMT solver is too lazy, if you don't use the second parameter it will always return a "flower net", i.e. one place an all transitions consuming/producing it.

Even if this example has a net that would preserve all the independence (the one I mentioned yesterday and I attached), for some reason POD cannot find this net if I use the --eq=ip-smt parameter which is stronger than --eq=sp-smt and forces not only to preserve traces, but also independence. However, in my experience, using --eq=sp-smt and playing with the number of final places, you can normally obtain the net you are looking for (not the best solution, I know).

I attach the trace and the dependence relation I used for this (I had to change their name since guthub forces .txt files) together with the output net (since POD generates pnml files, I used ProM to visualize it).

circuit_dep.txt circuit_xes.txt screen shot 2016-02-10 at 09 56 01

snowleopard commented 8 years ago

@hernanponcedeleon This looks great, we'll give it a try!

Each time you mark two activities as concurrent because their time stamps are almost the same, you should mark every occurrences of them as concurrent event if other occurrences are timing causal

I think this will not work in general, but in this particular example this seems to be fair approach.

In general we probably want conditional dependencies: two events may be concurrent in certain behavioural scenarios, but causally linked in others. This will lead to the problems with theory you mentioned, but we can look at these not as problems, but research opportunities!

hernanponcedeleon commented 8 years ago

As Cesar already mentioned, there might be some cases where still the theory is sound (we are yet not sure how to characterized those cases, but I think is the case where there is a one to one correspondence between the partial orders you start with and the maximal configurations of the ES). For such cases, you could still use POD, however some modifications are needed: either in order to take the partial orders as an input instead of the sequences+independence; or defining how to obtain partial orders from sequences+conditional independence.

snowleopard commented 8 years ago

I see.

At the moment we are working on reliably extracting events from analogue simulation traces, which is a non trivial task (in other words, mining events themselves). Once we have it implemented we'll be able to run several tests to see how well POD handles typical cases.

@hernanponcedeleon We'll keep you all in the loop! Thanks again.