potassco / clingcon

⛓️ Extension of clingo to handle constraints over integers
https://potassco.org/
MIT License

Evaluation new version and comparison with clingcon 3. #25

Open MaxOstrowski opened 4 years ago

MaxOstrowski commented 4 years ago

results.zip

  1. Times of clingcon-4 and pyclingcon are off by around 2 secs for activating/deactivating the conda env, which is annoying.
  2. Otherwise, clingcon-3.3.0 seems to have better defaults for what to translate? Options without translation seem comparable.
rkaminsk commented 4 years ago

The benchmark is completely dominated by the flow_shop class.

Also, maybe there is something wrong with the estimate function for the number of clauses. If I increase the limit, then the new clingcon also translates. It still is not able to solve the same instances clingcon-3 does (adding just one more thread with -t2 changes the situation).

EDIT: I checked; the estimate function is working for the benchmark. Clingcon-3's estimate is set to 10000 and clingcon-4's to 1000, so this explains the difference.

What might be another difference is that the new version does not add binary clauses to do order propagation initially. Does clingcon-3 do this? It would be very easy to implement. There must be some reason because clingcon-3 solves the instances with very few conflicts. Adding clauses to the binary implication graph can drastically change propagation because such clauses are always propagated first.

MaxOstrowski commented 4 years ago

It has an option to do so, `--explicit-binary-order=<arg> : Create binary order nogoods if possible (default: false)`, but it is disabled by default.

Still I have no idea what the difference could be. We would need to test single instances with full translation maybe ...

rkaminsk commented 4 years ago

It has an option to do so, `--explicit-binary-order=<arg> : Create binary order nogoods if possible (default: false)`, but it is disabled by default.

* It does create 1000 literals per variable by default. This means that no literals have to be created on the fly during propagation, though. For most of the domains the translation should add all literals, no?

* lazy reasons are disabled
  ` --learn-nogoods=<arg>           : Learn nogoods while propagating (default: true)`

* It does propagation incrementally: propagate until some linear constraint derives a clause, add these clauses, redo unit propagation.
  During unit propagation watches are triggered directly (no accumulation of literals that changed...) and order literal propagation is also done directly. Maybe this changes something, but it should not really explain much of the difference in choices/conflicts.

Still I have no idea what the difference could be. We would need to test single instances with full translation maybe ...

It cannot have anything to do with this. As soon as there are no constraints anymore, no reasons are generated and in the end all variables are fully assigned.

MaxOstrowski commented 4 years ago

It cannot have anything to do with this. As soon as there are no constraints anymore, no reasons are generated and in the end all variables are fully assigned.

But the translation is slightly different, right?

MaxOstrowski commented 4 years ago

Please wait.

Too late :) results.zip

Having translation=10000 helps, but it is still off by a bit. It also solves other instances. But remember that we found out that my estimation is also a bit different, right?

rkaminsk commented 4 years ago

Please wait.

Too late :) results.zip

Having translation=10000 helps, but it is still off by a bit. It also solves other instances. But remember that we found out that my estimation is also a bit different, right?

The instances which the old clingcon can solve require 3000 clauses and it translates them. Why there is still such a big gap I don't know. You could try to implement the old translation in the new clingcon. Maybe you did it differently.

EDIT: Since we only have coefficients of 1 and -1, your estimate should be exact.

MaxOstrowski commented 4 years ago

Not that I know :) Did a pretty crappy algorithm.

Could also be the sorting...?

rkaminsk commented 4 years ago

Not that I know :) Did a pretty crappy algorithm.

I do not know what exactly it does. Maybe it is different. Why not just try to reimplement your old version. The data structures in the new clingcon are quite simple:

Could also be the sorting...?

We are talking about difference constraints here, right? Sorting by coefficients should not matter. It should only affect the order of clauses, not the number of generated clauses.

rkaminsk commented 4 years ago

Hi @MaxOstrowski, can you also test configurations using `--order-heuristic=max-chain`? I am thinking of the two combinations

This heuristic has the potential to drastically reduce the choices when solving. The idea is that whenever the solver wants to make an order literal true or false, then the order literal of the lower or upper bound of the same variable is made true or false instead, respectively.

Let's say we have variable x with order literals x<=1, x<=2, x<=3, x<=4, x<=5, a lower bound of 2 and an upper bound of 5. In this case x<=1 is false and x<=5 is true. If the solver's decision heuristic now decides to make x<=4 true, then the above heuristic will instead make literal x<=2 true, which will of course also make x<=4 true by propagation. This heuristic seems to reduce the ratio between conflicts and choices and helps on some instances I have tested, both when translation is enabled and disabled.

MaxOstrowski commented 4 years ago

Rerunning benchmarks with the newest version and new options. I also had a look at the heuristic and it has a great effect in some cases. Still, it does not explain the differences in flowshop. I have not yet managed to reimplement the old translation method, but I can't explain the big differences myself. I also noticed that order literal propagation and initial propagation seem to be much slower in the new clingcon.

clingcon tai50_5_2.lp --stats=2 --translate-clauses=10000 --translate-pb=0 --order-heuristic=max-chain --solve-limit=0

Models       : 0+
Calls        : 1
Time         : 24.062s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time     : 24.060s

Choices      : 0       
Conflicts    : 0        (Analyzed: 0)
Restarts     : 0       
Problems     : 0        (Average Length: 0.00 Splits: 0)
Lemmas       : 0        (Deleted: 0)
  Binary     : 0        (Ratio:   0.00%)
  Ternary    : 0        (Ratio:   0.00%)
  Conflict   : 0        (Average Length:    0.0 Ratio:   0.00%) 
  Loop       : 0        (Average Length:    0.0 Ratio:   0.00%) 
  Other      : 0        (Average Length:    0.0 Ratio:   0.00%) 
Backjumps    : 0        (Average:  0.00 Max:   0 Sum:      0)
  Executed   : 0        (Average:  0.00 Max:   0 Sum:      0 Ratio:   0.00%)
  Bounded    : 0        (Average:  0.00 Max:   0 Sum:      0 Ratio: 100.00%)

Rules        : 618856   (Original: 613956)
  Choice     : 50      
Atoms        : 28306   
Bodies       : 135001   (Original: 132601)
  Count      : 100      (Original: 200)
Equivalences : 39550    (Atom=Atom: 12350 Body=Body: 0 Other: 27200)
Tight        : Yes
Variables    : 919655   (Eliminated:    0 Frozen: 14850)
Constraints  : 39231682 (Binary:   3.8% Ternary:  96.2% Other:   0.0%)

Clingcon
  Init time in seconds
    Total    : 22.585
    Simplify : 0.00455508
    Translate: 12.8813
  Problem
    Constraints: 0
    Variables: 250
    Clauses  : 3.82603e+07
    Literals : 784755
  Translate
    Constraints removed: 12450
    Constraints added: 0
    Clauses  : 3.82588e+07
    Weight constraints: 0
    Literals : 784755
  [Thread 0]
    Time in seconds
      Total  : 1.858e-06
      Propagation: 0
      Check  : 1.858e-06
      Undo   : 0
    Refined reason: 0
    Introduced reason: 0
    Literals introduced: 0

While clingcon-3.3.0

clingcon version 3.3.0
Reading from ...marks/instances/flow_shop/tai50_5_2.lp ...
Solving...
UNKNOWN

Models       : 0+
Calls        : 1
Time         : 17.224s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time     : 17.216s

Choices      : 0       
Conflicts    : 0        (Analyzed: 0)
Restarts     : 0       
Problems     : 0        (Average Length: 0.00 Splits: 0)
Lemmas       : 0        (Deleted: 0)
  Binary     : 0        (Ratio:   0.00%)
  Ternary    : 0        (Ratio:   0.00%)
  Conflict   : 0        (Average Length:    0.0 Ratio:   0.00%) 
  Loop       : 0        (Average Length:    0.0 Ratio:   0.00%) 
  Other      : 0        (Average Length:    0.0 Ratio:   0.00%) 
Backjumps    : 0        (Average:  0.00 Max:   0 Sum:      0)
  Executed   : 0        (Average:  0.00 Max:   0 Sum:      0 Ratio:   0.00%)
  Bounded    : 0        (Average:  0.00 Max:   0 Sum:      0 Ratio: 100.00%)

Rules        : 643356   (Original: 638456)
  Choice     : 50      
Atoms        : 28306   
Bodies       : 159501   (Original: 157101)
  Count      : 100      (Original: 200)
Equivalences : 2800     (Atom=Atom: 100 Body=Body: 0 Other: 2700)
Tight        : Yes
Variables    : 931905   (Eliminated:    0 Frozen: 799505)
Constraints  : 39256182 (Binary:   3.9% Ternary:  96.1% Other:   0.0%)
rkaminsk commented 4 years ago

I also noticed that order literal propagation and initial propagation seem to be much slower in the new clingcon.

The VarState class uses a std::map for order literals, and the map from literals to order literals is a std::unordered_multimap. Both could be replaced by vectors.

Replacing the former with an ordered std::vector would result in much slower order literal insertion but faster lookups. For the difference constraints one could even go one step further and map array indices directly to literals, making everything constant time (even iteration, in the sense of advancing to the next element, because each value in the domain needs a literal there anyway). Given that propagation, which is only about lookups, is the dominating factor, it might really be worth replacing it with a std::vector.

The std::unordered_multimap could be replaced by a std::vector or std::unordered_map if we make sure that each order literal has a unique solver literal. This is currently not the case because of the Solver::add_simple method. It could easily be refactored to introduce a new literal, though.

Another issue is that we have to store all the clauses added during translation and commit them later. I am not sure if the old clingcon had to do it like this, too? But we cannot get rid of this anyway. The last thing that comes to my mind is that the clingo API might also cause more overhead. We should run a profiler to get an idea where most time is spent!

MaxOstrowski commented 4 years ago

I'm not completely sure if I got everything right:

* For variable value -> order_literal I used a vector after insertion of a certain amount of literals, s.t. the map does not get too big.
* For Clasp::Var -> orderVariable I used an unordered_map to a vector of Variable/Value.
* Additionally I used the data blob that can be added to each watch to store: a) 1 bit, whether it is an orderLiteral or a constraint (adding 2 watches if it is both); b) in case of a constraint, the index into a vector.

During propagate(...) I directly use clasp.force to change all implied order literals that are not already decided. As reason I use the literal that was watched, so for x<=1 I add the implications

x<=1 -> x<=2
x<=1 -> x<=3
x<=1 -> x<=4
x<=1 -> x<=5

Fixpoint iteration during check is done interleaved with order_literal propagation, as the watch is immediately triggered in the old interface (unlike the list of changes that we have now). This may be beneficial, as expensive constraint propagation is delayed. I propagate one constraint from the todo list, creating a temporary list of clauses. These clauses are then added and propagated (including order literal propagation) before a new constraint creates its clauses.

MaxOstrowski commented 4 years ago

we have to store all the clauses added during translation and commit them later

Sorry, I misread this part. We create clauses one by one and directly add them to clasp.

MaxOstrowski commented 4 years ago
Fixpoint iteration during check is done interleaved with order_literal propagation, as the watch is immediately triggered in the old interface (unlike the list of changes that we have now).
This may be beneficial, as expensive constraint propagation is delayed.

Analyzing the current code, I'm unsure if we currently do this. The question is whether, once in check, we process the constraints in todo w.r.t. the state of the variables at the beginning of the check call, or whether the state gets updated s.t. we process each constraint only once (although its bounds may have changed several times due to other constraint propagation in check).

rkaminsk commented 4 years ago

I'm not completely sure if I got everything right: For variable value -> order_literal I used a vector after insertion of a certain amount of literals, s.t. the map does not get too big.

A vector will certainly speed up order literal lookup. I do not know if you use an ordered vector of value/literal pairs or use a vector with value-domain_minimum as index.

For Clasp::Var -> orderVariable I used an unordered_map to a vector of Variable/Value

Right now it is an unordered_multimap. It can be replaced with an unordered_map like you are using. Or even with a vector. In the translation case with a lot of order literals, a vector will certainly be better.

Additionally I used the datablob that can be added to each watch to store: a) 1 bit, if it is an orderLiteral or a constraint (adding 2 watches if its both) b) in case of a constraint, the index to a vector

This is not possible with the Clingo-API and also does not matter during translation. There is no way around lookups (but we can at least use fast vector lookups).

During propagate(..) i directly use clasp.force to change all implied order literals that are not already decided. As reason I use the literal that was watched, so for x<=1 I add the implications

x<=1 -> x<=2
x<=1 -> x<=3
x<=1 -> x<=4
x<=1 -> x<=5

This is available as an option. The default is to add the clauses x<=1 -> x<=2, x<=2 -> x<=3, x<=3 -> x<=4, x<=4 -> x<=5, to avoid saturating the binary implication graph. This does not matter during translation.

Fixpoint iteration during check is done interleaved with order_literal propagation, as the watch is immediately triggered in the old interface (unlike the list of changes that we have now). This may be beneficial, as expensive constraint propagation is delayed. I propagate one constraint from the todo list, creating a temporary list of clauses. These clauses are then added and propagated (including order literal propagation) before a new constraint creates its clauses.

Again, this does not matter during translation and also has no effect when constraints are translated. Otherwise, it is easily possible to implement this too. I could add an option for it.

MaxOstrowski commented 4 years ago

A vector will certainly speed up order literal lookup. I do not know if you use an ordered vector of value/literal pairs or use a vector with value-domain_minimum as index.

Something like the value-domain_minimum as index. (I had to handle domains with holes, but basically it boils down to this.)

Right now it is an unordered_multimap. It can be replaced with an unordered_map like you are using.

Should not make much of a difference imho.

Or even with a vector. In the translation case with a lot of order literals, a vector will certainly be better.

Sure, but this would then be something that old clingcon did not have (just to mention).

This is not possible with the Clingo-API and also does not matter during translation. There is no way around lookups (but we can at least use fast vector lookups).

I know, just wanted to communicate every difference.

Again, this does not matter during translation and also has no effect when constraints are translated.

Sorry, I misread that it is about translation, and already answered above that we do not have to store all clauses before committing. Could you give me a leg up and tell me why again we had to store them all before committing?

rkaminsk commented 4 years ago

Right now it is an unordered_multimap. It can be replaced with an unordered_map like you are using.

Should not make much of a difference imho.

Even though we are just talking about (amortized) constant factors here, I think it can make quite a difference. Especially when switching to a vector.

Again, this does not matter during translation and also has no effect when constraints are translated.

Sorry, I misread that it is about translation, and already answered above that we do not have to store all clauses before committing. Could you give me a leg up and tell me why again we had to store them all before committing?

Because interleaving adding clauses and literals is quadratic with clasp. Adding the literals and then the clauses is linear.

MaxOstrowski commented 4 years ago

Because interleaving adding clauses and literals is quadratic with clasp. Adding the literals and then the clauses is linear.

clingcon-3.3.0 computes an estimate of variables that are needed for the translation and creates all these literals at once, faking createNewLiteral() afterwards.

rkaminsk commented 4 years ago

clingcon-3.3.0 computes an estimate of variables that are needed for the translation and creates all these literals at once, faking createNewLiteral() afterwards.

We could also follow such a strategy but I would not implement it like this. The time overhead of storing the clauses should be small. It increases the peak memory of course, which might be trouble for large programs. It can be reduced by running ClauseCreator::commit sporadically. I profiled the initialization process. The bulk of the time (80%) is spent in Solver::get_literal and ClauseCreator::commit. We cannot speed up ClauseCreator::commit because everything there happens within clasp. It might be possible to make Solver::get_literal faster, though, by using a std::vector instead of a std::map.

See the attached profile.pdf. For the meaning of the values check https://gperftools.github.io/gperftools/cpuprofile.html.


MaxOstrowski commented 4 years ago

Definitely nice to see where the time is spent (and kind of unexpected that adding clauses takes that much time). But it looks like a vector could help a lot and speed up the lookup by nearly 40%.

MaxOstrowski commented 4 years ago

results.zip

haven't looked at it in detail, but looks much nicer...

rkaminsk commented 4 years ago

results.zip

haven't looked at it in detail, but looks much nicer...

The times look very similar. The only class where it is really different is the flow_shop class, where I still do not understand the difference.

I also profiled solving (without translation) on a larger instance. The good news is that the solver spends over 70% of its time doing unit propagation. This is how it should be. Any data structure optimization can improve speed by at most around 30%.

We might still be able to find critical instances where the situation is not like this.

MaxOstrowski commented 4 years ago

results.zip

Big improvements on propagation heavy instances (like doubling the speed). Thanks a lot.

rkaminsk commented 4 years ago

I am done with this issue. Do you still want to investigate the translation?

MaxOstrowski commented 4 years ago

Unlikely that I find the time anytime soon. Closing for now?

rkaminsk commented 4 years ago

Unlikely that I find the time anytime soon. Closing for now?

As you like. We could also keep it open so we don't forget about it, in case you want to look at it again. :wink: Your call.

MaxOstrowski commented 4 years ago

Some evaluation on the minizinc competition 2019 benchmarks (testing some configurations) and an (unfair) comparison to chuffed (chuffed uses more global constraints) can be found here: https://github.com/potassco/flatzingo/issues/11

In short: our base speed seems to be quite good and probably better than chuffed's, but we are lacking a lot of constraints to be comparable (some of them seem to make sense, some seem rather pointless).

mbalduccini commented 1 month ago

Do you have any more recent performance comparisons with CP solvers, especially using clingcon 5?

MaxOstrowski commented 1 month ago

Unfortunately not really. You could look at the minizinc competition results in 2021, but take it with a grain of salt. Quite some performance is left on the table due to double grounding (minizinc flattening and regrounding using clingcon).

But the techniques used should still be similar to state of the art solvers.