Improve inter-algorithm reproducibility in automata library

Heizmann commented 7 years ago

Motivation of the problem Let's assume we have two algorithms that both produce automaton as a result. The automata are conceptually identical but their data structure store elements in different orders. The first algorithm added the transition (q_45, a, q _23) and then the transition (q_45, a, q _42), the second algorithm added both transitions in the different order. Now, subsequent algorithms may behave completely on the first automaton than on the second automaton (e.g., the order in which the state space is explored might be completely different).

Some solution Let the type parameter LETTER and STATE implement the Comparable interface. Use TreeMaps and TreeSets (instead of HashMaps and HashSets) whenever we iterate over Maps/Sets.

Problems

Less flexibility
TreeMaps/TreeSets are more expensive
Difficult to check for developers at which positions you have to use TreeMaps/TreeSets
Difficult to enforce that developers use the right kind of Map.
All our advanced Map/Set data structures (e.g., NestedMaps, HashRelation, ScopedHashMep) need a Tree version
Complicated if only one component of the NestedMap needs to a TreeSet

I do not want to implement that soon. I cannot overlook all consequences. I just wanted to start a discussion to get more feedback.

guluchen commented 7 years ago

TreeSets can be fast when the elements are sparse and one wants to compare subset relation between them.

danieldietsch commented 7 years ago

Questions:

Why is it less flexible?
The logarithmic cost of get/put of TreeMap is an issue. Why do we need natural ordering, is insertion order not enough (LinkedHashMap, LinkedHashSet)?

The other problems might not be too severe.

But if we would say we go ahead and change all the data structures to ordered versions, perhaps we should consider using a specialized collections library in all of Ultimate. Especially considering memory efficiency, many libraries outperform the Java implementations by a factor of 3 or more.

Heizmann commented 7 years ago

Answer to question 1: It will be then sometimes required to use a Map that implement the SortedMap interface (e.g., TreeMaps instead of HashMaps) and this requirement will be propagate itself transitively throughout all datastructures that we use (e.g., NestedMaps)

Answer to question 2: No, is not enough. See the motivation paragraph above: two different algorithms might not use the same insertion order.

danieldietsch commented 7 years ago

Ok, now I understand, its about comparing different algorithms.

I nevertheless think the performance overhead will be too drastic. One way of getting both (being able to compare algorithms vs. having a fast tool) would be to have some data-structure factory that is parameterized with the current "mode".

Obviously we do not want to drag such a factory around as parameter, so it has to be static -- but parametrizable, perhaps with some command line switch.

guluchen commented 6 years ago

After I read it again, I started to afraid that this might not completely solve the non-deterministic issue.

Suppose that we want to compare two Buchi complementation algorithms, both of them produce automata of the same language, but may with different structure, e.g.:

Based on Algorithm 1 we get the following difference automaton: s0-->a-->s1, s0-->a-->s2, s1--->c--->s3, s2--->b--->s4 Based on Algorithm 2 we get the following difference automaton: s0-->a-->s1, s1--->c--->s3, s1--->b--->s4

If we use lexicographical order for LETTER and STATE and performs a DFS, then for the automaton produced by Algorithm 1, "ac" might be encountered before "ab" (because s1 has higher priority than s2). However, for the one produced by Algorithm 2, "ab" is enchanted before "ac" (because "b" has higher priority than "c").

Is my understanding correct?

schillic commented 6 years ago

both of them produce automata of the same language, but may with different structure

This is a different assumption from @Heizmann's above.

For structurally different outputs you are right. In the DFS example we currently collect states in a queue, where a state's key (in the queue) is the state itself, and would run into the problem that you described.

My idea would be to store pairs (w, q) of a state q and the smallest (in terms of lexicographical ordering) word w to reach that state. A pair's key (in the queue) is the lexicographical ordering of the pair, where the word w comes first.

To my impression you then have the guarantee that you visit (lexicographically) smaller words first.

Note that for nondeterministic automata you can reach different states with the same word, and then you would choose the (lexicographically) smaller state. But at the moment I do not see a problem with that.

schillic commented 6 years ago

Note that the above only fixes the emptiness check. For other algorithms you would need to implement a similar fix.

To apply the fix on the level of the automaton itself seems more complicated and very expensive because you de facto have to apply an exhaustive emptiness check to visit all states.

guluchen commented 6 years ago

I think one issue is that it is difficult to give a meaningful order to the states in the automata. The naming can be changed in different automata algorithm so also the search order.

BTW, @schillic for queue I assume you mean priority queue?

schillic commented 6 years ago

for queue I assume you mean priority queue?

Yes.

Heizmann commented 6 years ago

@guluchen I agree that what I propose does not fix the problem for structurally different automata. However I presume that even for structurally different automata it mitigates the problem in many cases.

guluchen commented 6 years ago

@Heizmann I think it might be a problem when we compare Lazy NCSB with the original one, because the former usually generates the structure of the Algorithm 2 style (the guessing is delayed) in the previous post, and the latter usually generates the structure of the Algorithm 1 style.

Heizmann commented 6 years ago

@guluchen Then my suggestion is indeed not helpful for this application (comparison of termination analysis with Lazy NCSB and NCSB).

ultimate-pa / ultimate

Improve inter-algorithm reproducibility in automata library #217