vprover / vampire

The Vampire Theorem Prover
https://vprover.github.io/
Other
277 stars 49 forks source link

Reimplement Lib/Set.hpp as a wrapper of std::unordered_set #551

Open inpefess opened 2 months ago

inpefess commented 2 months ago

Why?

And why have original basic container implementations instead of standard ones?

Why not replace Vampire's set with std::unordered_set?

Many more changes at once. This issue proposes to begin with only one file (Lib/Set.hpp).

We have Vampire's set rewritten as a wrapper around std::unordered_set, so what?

We can replace the wrapper's methods calls with standard ones of the wrapped set case by case wherever they appear in the code (some parts might need more effort to refactor than others). Ideally, this process ends with a total replacement of the original implementation by the standard.

We can also encourage new code to use standard sets.

If such an approach works, we can do the same with other containers.

Why start with sets?

We already have unit tests for this class.

inpefess commented 2 months ago

I can work on it if there are no objections against the very idea.

mezpusz commented 2 months ago

I'm not against the idea, just wanted to note that Vampire's Set is closer to std::unordered_set than std::set to my knowledge, using hashing rather than red-black trees/heaps/whatnot. Also note that we use open addressing, and STL could be using something more sophisticated (and more efficient). But in general, especially in critical code such as in TermSharing, we should be careful with the change.

MichaelRawson commented 2 months ago

I am very excited to see this kind of change proposed! Thanks for filing this. If you are sufficiently brave to tackle this please go ahead and try it once there are no more comments here, otherwise I will do it at some point.

It may be that either

  1. It is not possible to use std::unordered_set to implement Lib::Set because of the interface and how it is used.
  2. We lose performance this way.

The former can probably be solved by changing how we use sets on a case-by-case basis. The latter I could forgive if the penalty is minimal, as in my experience the straight-line speed rarely translates to many solved problems.

Another thorny aspect may be the Hash parameter, which is used somewhat inconsistently. This can and should be cleaned up anyway.

JakobR commented 2 months ago

When I last tested std::unordered_map, it was quite a bit slower than Vampire's DHMap, at least on my machine. I would assume the implementations for sets are similar. (It seems to be a common criticism of std::unordered_map that it is relatively slow because of some API requirements of the C++ standard. But I don't remember the details.)

inpefess commented 2 months ago

Great! Thank you for your comments. I will give it a try.

MichaelRawson commented 2 months ago

A solution if std::unordered_map turns out to be too slow is to rewrite the container as simply as possible. We only need one set/map structure, not two, and I personally doubt double-hashing (the "DH" in DHMap/DHSet) is worth the extra complexity.

inpefess commented 2 months ago

I see at least one non-standard method, rawFindOrInsert https://github.com/vprover/vampire/blob/1916f50ed5c5f77761a5b6aa4e127c19899fdbe3/Lib/Set.hpp#L170, giving a user a surprising amount of liberty (to use different hash functions for different items stored, for example). Happily, it appears only in Kernel/Term.cpp (e.g. https://github.com/vprover/vampire/blob/1916f50ed5c5f77761a5b6aa4e127c19899fdbe3/Kernel/Term.cpp#L850), and I don't see any per-item hash functions there. I assume this issue to be blocked by that. I will propose a possible refactoring in a separate issue, e.g. defining a pertinent hash function at a moment of terms set creation instead of passing hash functions when inserting.

MichaelRawson commented 2 months ago

Fine by me - go straight to PR if you like, rawFindOrInsert is not ideal and I would be gladly rid of it.

quickbeam123 commented 2 months ago

This is definitely a nice exercise!

However, please try to approach this locally first (if at all possible - maybe focusing on an expected hot spot), so that you don't spend too much time on it, before we can compare the performance.

I wouldn't be too happy to accept a large refactor which on top of things slows Vampire down a bit, just to be closer to using standard classes.