Closed blishko closed 7 months ago
I would like to ask a few questions first.

I do not know the `Map` structure from Minisat, and it seems that it requires some implementation knowledge to understand the API. Since we use the structure in our project, I suppose it should be better documented.

Anyway, I understood that it behaves as a hash table. Is there a comparison, or a reason why `std::unordered_map` is not used? It also contains `reserve`, which may be useful if there is a good average estimate of the final size.

Generally, I consider using a hash table a good option in this case.
> I do not know the `Map` structure from Minisat and it seems that it requires some implementation knowledge to understand the API. Since we use the structure in our project, I suppose that it should be better documented.

The API should be reasonably intuitive, but I agree that some features should be documented; for example, that `operator[]` requires that the key is already in the `Map`.
> Anyway, I understood that it behaves as a hash table. Is there a comparison and a reason why `std::unordered_map` is not used? It also contains `reserve`, which may be useful if there is some good average estimate on the final size.
There is only some anecdotal evidence. The data structures `Map` and `Vec` were built for MiniSAT because the authors had evidence that they were faster than the standard library. But that was 20 years ago. Maybe @aehyvari can say more on this.
At some point, I was experimenting with `Map` vs `std::unordered_set/map` in the linear arithmetic solver, and indeed using `Map` from MiniSAT was faster.
One difference is that, AFAIK, `std::unordered_set` uses a linked list for the buckets, so each entry means a dynamic allocation of a node there. `Map` uses `Vec` for the buckets.
About the history, that's what I know as well. I also remember that I, or somebody else, benchmarked this, and the conclusion back then was to stick to the MiniSAT implementation. So not much to add :)
I also did a quick test on the benchmark in QF_LIA with the largest number of assertions, `QF_LIA/2019-cmodelsdiff/labyrinth/0125-labyrinth-23-0.smt2`. It contains 1 836 257 assertions, and with the change in this PR the time spent before the `check-sat` command is executed dropped from around 66 seconds to under 5.5 seconds on my laptop.
At the moment, we check for ITEs in every inserted formula (every `assert` command). Due to the large number of terms created in this benchmark, OpenSMT spent most of its time allocating, memsetting, and deallocating this huge vector on every `assert` command.
@Tomaqa, I have updated the code according to your suggestions and our discussion. Let me know if this is OK.
LGTM. I am still a bit concerned that we quite widely use `Map`, with its not very intuitive (and sometimes also potentially inefficient) interface. I think that every new person not familiar with Minisat internals could be confused. But we have more important things to do... I guess we should add this as an issue? Even if we stick to `Map` (in case it is indeed still faster than the STL maps), I think it would be beneficial if the interface were changed.
True, but we do not get that many new people, and there are indeed much more important things to work on.
While in single-query solving, allocating a vector with size equal to the number of all known terms is not a problem, in a setting where we are solving many small queries and creating a huge number of terms, this was observed to actually impact performance.
Specifically, the backtranslation of proofs in Golem invokes OpenSMT many times when it recomputes missing steps (when predicates are eliminated during preprocessing). In one case, this change reduced approximately 10 seconds of proof-production overhead to almost zero.