Open Heizmann opened 7 years ago
TreeSets can be fast when the elements are sparse and one wants to compare subset relation between them.
Questions:
TreeMap
is an issue. Why do we need natural ordering, is insertion order not enough (LinkedHashMap
, LinkedHashSet
)? The other problems might not be too severe.
But if we would say we go ahead and change all the data structures to ordered versions, perhaps we should consider using a specialized collections library in all of Ultimate. Especially considering memory efficiency, many libraries outperform the Java implementations by a factor of 3 or more.
Answer to question 1: It will be then sometimes required to use a Map that implement the SortedMap interface (e.g., TreeMaps instead of HashMaps) and this requirement will be propagate itself transitively throughout all datastructures that we use (e.g., NestedMaps)
Answer to question 2: No, is not enough. See the motivation paragraph above: two different algorithms might not use the same insertion order.
Ok, now I understand, its about comparing different algorithms.
I nevertheless think the performance overhead will be too drastic. One way of getting both (being able to compare algorithms vs. having a fast tool) would be to have some data-structure factory that is parameterized with the current "mode".
Obviously we do not want to drag such a factory around as parameter, so it has to be static -- but parametrizable, perhaps with some command line switch.
After I read it again, I started to afraid that this might not completely solve the non-deterministic issue.
Suppose that we want to compare two Buchi complementation algorithms, both of them produce automata of the same language, but may with different structure, e.g.:
Based on Algorithm 1 we get the following difference automaton: s0-->a-->s1, s0-->a-->s2, s1--->c--->s3, s2--->b--->s4 Based on Algorithm 2 we get the following difference automaton: s0-->a-->s1, s1--->c--->s3, s1--->b--->s4
If we use lexicographical order for LETTER and STATE and performs a DFS, then for the automaton produced by Algorithm 1, "ac" might be encountered before "ab" (because s1 has higher priority than s2). However, for the one produced by Algorithm 2, "ab" is enchanted before "ac" (because "b" has higher priority than "c").
Is my understanding correct?
both of them produce automata of the same language, but may with different structure
This is a different assumption from @Heizmann's above.
For structurally different outputs you are right. In the DFS example we currently collect states in a queue, where a state's key (in the queue) is the state itself, and would run into the problem that you described.
My idea would be to store pairs (w, q) of a state q and the smallest (in terms of lexicographical ordering) word w to reach that state. A pair's key (in the queue) is the lexicographical ordering of the pair, where the word w comes first.
To my impression you then have the guarantee that you visit (lexicographically) smaller words first.
Note that for nondeterministic automata you can reach different states with the same word, and then you would choose the (lexicographically) smaller state. But at the moment I do not see a problem with that.
Note that the above only fixes the emptiness check. For other algorithms you would need to implement a similar fix.
To apply the fix on the level of the automaton itself seems more complicated and very expensive because you de facto have to apply an exhaustive emptiness check to visit all states.
I think one issue is that it is difficult to give a meaningful order to the states in the automata. The naming can be changed in different automata algorithm so also the search order.
BTW, @schillic for queue I assume you mean priority queue?
for queue I assume you mean priority queue?
Yes.
@guluchen I agree that what I propose does not fix the problem for structurally different automata. However I presume that even for structurally different automata it mitigates the problem in many cases.
@Heizmann I think it might be a problem when we compare Lazy NCSB with the original one, because the former usually generates the structure of the Algorithm 2 style (the guessing is delayed) in the previous post, and the latter usually generates the structure of the Algorithm 1 style.
@guluchen Then my suggestion is indeed not helpful for this application (comparison of termination analysis with Lazy NCSB and NCSB).
Motivation of the problem Let's assume we have two algorithms that both produce automaton as a result. The automata are conceptually identical but their data structure store elements in different orders. The first algorithm added the transition (q_45, a, q _23) and then the transition (q_45, a, q _42), the second algorithm added both transitions in the different order. Now, subsequent algorithms may behave completely on the first automaton than on the second automaton (e.g., the order in which the state space is explored might be completely different).
Some solution Let the type parameter LETTER and STATE implement the Comparable interface. Use TreeMaps and TreeSets (instead of HashMaps and HashSets) whenever we iterate over Maps/Sets.
Problems
I do not want to implement that soon. I cannot overlook all consequences. I just wanted to start a discussion to get more feedback.