Open inpefess opened 2 months ago
I can work on it if there are no objections against the very idea.
I'm not against the idea, just wanted to note that Vampire's Set
is closer to std::unordered_set
than std::set
to my knowledge, using hashing rather than red-black trees/heaps/whatnot. Also note that we use open addressing, and STL could be using something more sophisticated (and more efficient). But in general, especially in critical code such as in TermSharing
, we should be careful with the change.
I am very excited to see this kind of change proposed! Thanks for filing this. If you are sufficiently brave to tackle this please go ahead and try it once there are no more comments here, otherwise I will do it at some point.
It may be that either
std::unordered_set
to implement Lib::Set
because of the interface and how it is used.The former can probably be solved by changing how we use sets on a case-by-case basis. The latter I could forgive if the penalty is minimal, as in my experience the straight-line speed rarely translates to many solved problems.
Another thorny aspect may be the Hash
parameter, which is used somewhat inconsistently. This can and should be cleaned up anyway.
When I last tested std::unordered_map
, it was quite a bit slower than Vampire's DHMap
, at least on my machine. I would assume the implementations for sets are similar. (It seems to be a common criticism of std::unordered_map
that it is relatively slow because of some API requirements of the C++ standard. But I don't remember the details.)
Great! Thank you for your comments. I will give it a try.
A solution if std::unordered_map
turns out to be too slow is to rewrite the container as simply as possible. We only need one set/map structure, not two, and I personally doubt double-hashing (the "DH" in DHMap
/DHSet
) is worth the extra complexity.
I see at least one non-standard method, rawFindOrInsert
https://github.com/vprover/vampire/blob/1916f50ed5c5f77761a5b6aa4e127c19899fdbe3/Lib/Set.hpp#L170, giving a user a surprising amount of liberty (to use different hash functions for different items stored, for example). Happily, it appears only in Kernel/Term.cpp
(e.g. https://github.com/vprover/vampire/blob/1916f50ed5c5f77761a5b6aa4e127c19899fdbe3/Kernel/Term.cpp#L850), and I don't see any per-item hash functions there. I assume this issue to be blocked by that. I will propose a possible refactoring in a separate issue, e.g. defining a pertinent hash function at a moment of terms set creation instead of passing hash functions when inserting.
Fine by me - go straight to PR if you like, rawFindOrInsert
is not ideal and I would be gladly rid of it.
This is definitely a nice exercise!
However, please try to approach this locally first (if at all possible - maybe focusing on an expected hot spot), so that you don't spend too much time on it, before we can compare the performance.
I wouldn't be too happy to accept a large refactor which on top of things slows Vampire down a bit, just to be closer to using standard classes.
Why?
And why have original basic container implementations instead of standard ones?
Why not replace Vampire's set with
std::unordered_set
?Many more changes at once. This issue proposes to begin with only one file (
Lib/Set.hpp
).We have Vampire's set rewritten as a wrapper around
std::unordered_set
, so what?We can replace the wrapper's methods calls with standard ones of the wrapped set case by case wherever they appear in the code (some parts might need more effort to refactor than others). Ideally, this process ends with a total replacement of the original implementation by the standard.
We can also encourage new code to use standard sets.
If such an approach works, we can do the same with other containers.
Why start with sets?
We already have unit tests for this class.