usi-verification-and-security / opensmt

The opensmt solver

ITEs: Avoid allocating vector with size equal to number of all terms #698

Closed. blishko closed this 7 months ago.

blishko commented 7 months ago

While in single-query solving, allocating a vector with size equal to the number of all known terms is not a problem, in a setting where we are solving many small queries and creating a huge number of terms, this was observed to noticeably impact performance.

Specifically, the backtranslation of proofs in Golem invokes OpenSMT many times when recomputing missing steps (when predicates are eliminated during preprocessing). In one case, this change reduced approximately 10 seconds of proof-production overhead to almost zero.
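
To make the pattern concrete, here is a minimal before/after sketch, not the actual OpenSMT code: the `TermTable`, its methods, and the `TermId` type are hypothetical stand-ins, and the sparse version uses `std::unordered_set` only to keep the sketch self-contained (the PR itself replaces the vector with a hash table, per the discussion below).

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical term handle and term table, purely for illustration.
using TermId = uint32_t;
struct TermTable {
    std::size_t termCount() const { return numTerms; }           // number of all known terms
    std::vector<TermId> childrenOf(TermId) const { return {}; }  // sub-terms (dummy body)
    bool isIte(TermId) const { return false; }                   // is this an ite term? (dummy body)
    std::size_t numTerms = 1;                                    // dummy count
};

// Before: the visited flags are sized by the number of ALL known terms, so every
// call pays O(#terms) just for allocating and zeroing the vector, even when the
// asserted formula itself is tiny.
bool containsIteDense(TermTable const & terms, TermId root) {
    std::vector<bool> seen(terms.termCount(), false);
    std::vector<TermId> todo{root};
    while (!todo.empty()) {
        TermId t = todo.back(); todo.pop_back();
        if (seen[t]) continue;
        seen[t] = true;
        if (terms.isIte(t)) return true;
        for (TermId c : terms.childrenOf(t)) todo.push_back(c);
    }
    return false;
}

// After: track only the terms actually reachable from the asserted formula, so the
// cost is proportional to the size of the formula, not to all terms ever created.
bool containsIteSparse(TermTable const & terms, TermId root) {
    std::unordered_set<TermId> seen;
    std::vector<TermId> todo{root};
    while (!todo.empty()) {
        TermId t = todo.back(); todo.pop_back();
        if (!seen.insert(t).second) continue;
        if (terms.isIte(t)) return true;
        for (TermId c : terms.childrenOf(t)) todo.push_back(c);
    }
    return false;
}
```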

Tomaqa commented 7 months ago

I would like to ask a few questions first.

I do not know the Map structure from MiniSat, and it seems that it requires some implementation knowledge to understand the API. Since we use the structure in our project, I suppose it should be better documented.

Anyway, I understood that it behaves as a hash table. Is there a comparison, or a reason why std::unordered_map is not used? It also provides reserve, which may be useful if there is a good estimate of the final size.
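
For reference, reserve on the standard container pre-sizes the bucket array so that inserting up to the reserved number of elements does not trigger rehashing; a generic illustration, not code from this PR:

```cpp
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<int, std::string> m;
    // Request enough buckets for ~1024 elements up front, so no rehash
    // happens while the map is filled up to that size.
    m.reserve(1024);
    for (int i = 0; i < 1024; ++i)
        m.emplace(i, "value");
}
```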

Generally, I consider using a hash table a good option in this case.

blishko commented 7 months ago

> I do not know the Map structure from MiniSat, and it seems that it requires some implementation knowledge to understand the API. Since we use the structure in our project, I suppose it should be better documented.

The API should be reasonably intuitive, but I agree that some features should be documented, for example that operator[] requires the key to already be in the Map.
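
A small illustration of that precondition, assuming the MiniSat-style Map interface with has/insert/operator[] (the include path and the exact signatures in OpenSMT's copy may differ):

```cpp
#include "mtl/Map.h"   // illustrative include path; adjust to the project layout

// Unlike std::unordered_map::operator[], which default-constructs a missing
// value, MiniSat-style Map::operator[] requires the key to already be present
// (otherwise the lookup trips an assertion / is undefined).
void bumpCounter(Map<int, int> & counts, int key) {
    if (!counts.has(key))
        counts.insert(key, 0);  // make sure the key exists first
    counts[key] += 1;           // now operator[] is safe to use
}
```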

> Anyway, I understood that it behaves as a hash table. Is there a comparison, or a reason why std::unordered_map is not used? It also provides reserve, which may be useful if there is a good estimate of the final size.

There is only some anecdotal evidence. The data structures Map and Vec were built for MiniSat because the authors had evidence that they were faster than the standard library. But that was 20 years ago. Maybe @aehyvari can say more on this. At some point, I was experimenting with Map vs std::unordered_set/map in the linear arithmetic solver, and indeed Map from MiniSat was faster. One difference is that, AFAIK, std::unordered_set uses a linked list for the buckets, so each entry means a dynamic allocation of a node there. Map uses Vec for the buckets.
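
To illustrate the structural point (a simplified sketch, not the actual mtl/Map.h code): a MiniSat-style table keeps each bucket as a contiguous growable array of key/value pairs, whereas a node-based table typically allocates one node per inserted entry.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Simplified vec-of-buckets hash map in the spirit of MiniSat's Map.
// Insertions usually append into an existing bucket allocation instead of
// allocating a fresh list node per entry. (No rehashing; kept minimal.)
template<class K, class D>
class VecBucketMap {
    struct Pair { K key; D data; };
    std::vector<std::vector<Pair>> table;

    std::size_t index(K const & k) const { return std::hash<K>{}(k) % table.size(); }

public:
    explicit VecBucketMap(std::size_t nBuckets = 64) : table(nBuckets) {}

    void insert(K const & k, D const & d) { table[index(k)].push_back({k, d}); }

    bool has(K const & k) const {
        for (auto const & p : table[index(k)]) if (p.key == k) return true;
        return false;
    }
};
```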

aehyvari commented 7 months ago

> Maybe @aehyvari can say more on this.

About the history, that's what I know as well. I also remember myself or somebody else benchmarking this, and the conclusion back then was to stick with the MiniSat implementation. So not much to add :)

blishko commented 7 months ago

I also did a quick test on the QF_LIA benchmark with the largest number of assertions, QF_LIA/2019-cmodelsdiff/labyrinth/0125-labyrinth-23-0.smt2. It contains 1 836 257 assertions, and with the change in this PR the time spent before the check-sat command is executed dropped from around 66 seconds to under 5.5 seconds on my laptop.

At the moment, we check for ITEs in every inserted formula (every assert command). Due to the large number of terms created in this benchmark, OpenSMT was spending most of that time allocating, memsetting, and deallocating this huge vector on every assert command.
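
A rough back-of-the-envelope estimate (my own, not from the PR) of why this dominates: if the $i$-th assert command sees $N_i$ terms already created and contributes a formula of size $f_i$, the old check costs roughly

$$\sum_{i=1}^{A} \bigl(O(N_i) + O(f_i)\bigr),$$

which can grow as large as $O(A \cdot N)$ for $A$ assert commands and $N$ total terms, whereas the hash-table version costs only $\sum_i O(f_i)$, i.e. proportional to the total size of the input.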

blishko commented 7 months ago

@Tomaqa, I have updated the code according to your suggestions and our discussion. Let me know if this is OK.

blishko commented 7 months ago

> LGTM. I am still a bit concerned that we use Map quite widely despite its not very intuitive (and sometimes also potentially inefficient) interface. I think that every new person not familiar with MiniSat internals could be confused. But we have more important things to do ... I guess we should add this as an issue? Even if we stick with Map (in case it is indeed still faster than the STL maps), I think it would be beneficial to change the interface.

True, but we do not get that many new people and there are indeed much more important things to work on.