addition example is slow to fail

jeshan commented 8 months ago

In this Sicstus example,

pred(A, B, C) :-
    Max = 10000,
    [A, B] ins 0..Max,
    B #= C+A-Max,
    B #= C+1.

The predicate correctly fails since C+1 does not fit B's Max supremum (but C+0 would fit). However, it is very slow to terminate (due to Max being big). What can be done to fasten this seemingly simple kind of code? (In practice, these constraints come from more complex predicates that are hard to relate)

UWN commented 8 months ago

Also SICStus is slow here but still faster by a factor of 10^5. It shows with larger Max. If you can, subtract the two equations from each other yielding a redundant new one without B and C. Thus 0 #= A-Max-1.

triska commented 8 months ago

Thank you a lot for this example!

I have made it a bit faster with the change filed in https://github.com/mthom/scryer-prolog/pull/2258, could you please try it out?

jeshan commented 8 months ago

Thanks, it's faster and looks to be on the right track! But can we find some other similar scenarios (e.g subtraction and multiplication) and fix them altogether? A couple of more examples:

% takes 3.2 seconds to fail
example2(A, B, C, D) :-
    Max is 20*1000,
    [A, B, C, D] ins 0..Max,
    D #= C-2,                   % change -2 to -1 to succeed
    D + A #= C,
    A + Max-1 #= B.

I didn't know how to simplify that example further. An additional one that happens for me often in practice:

% takes 3.1 seconds to fail
example3(A, B, C, D, E) :-
    Max is 25000,
    [A, B, C, D, E] ins 0..Max,
    %
    Tmp0 #= 32+C,
    Tmp0 = D,
    %
    Tmp1 #= E+1,
    Tmp1 = C,
    Tmp2 #= E+A,
    Tmp2 = D,
    %
    Tmp3 #= A+Max-31,
    Tmp3 = B.                   % comment this to succeed

I left the unifications as-is as I don't know if unification order is relevant to your attributed variables

I also found that changing the constants affect speed a lot.

Since I don't know which fixes will be distinct, I'll be happy to report more examples after the above are fixed :)

triska commented 8 months ago

But can we find some other similar scenarios (e.g subtraction and multiplication) and fix them altogether?

From just these examples, it is hard to tell what exactly is needed here. For instance, let's take example2/4 which you posted:

example2(A, B, C, D) :-
    Max is 20*1000,
    [A, B, C, D] ins 0..Max,
    D #= C-2,                   % change -2 to -1 to succeed
    D + A #= C,
    A + Max-1 #= B.

We note that D #= C - 2, and we can use this information by syntactically replacing D by C - 2 where it occurs. For instance, we can rewrite the clause to:

example2(A, B, C, D) :-
    Max is 20*1000,
    [A, B, C, D] ins 0..Max,
    D #= C-2,                   % change -2 to -1 to succeed
    C - 2 + A #= C,
    A + Max-1 #= B.

And now we get:

?- time(example2(A,B,C,D)).
   % CPU time: 0.002s, 2_199 inferences
   false.

So, a simple textual substitution has made the example thousands of times faster. Is there any general point we can deduce from this?

One idea that you could try is to slightly change the code so that the posted constraints are available and can be reasoned about as Prolog terms. For instance, consider changing :- to --> to turn the clause into a DCG:

example2(A, B, C, D) -->
    Max is 20*1000,
    [A, B, C, D] ins 0..Max,
    D #= C-2,
    C - 2 + A #= C,
    A + Max-1 #= B.

With the following additional definitions:

X is Expr --> [X is Expr].
A #= B    --> [A #= B].
Xs ins D  --> [Xs ins D].

we get:

?- phrase(example2(A,B,C,D), Gs).
   Gs = [_A is 20*1000,[A,B,C,D]ins 0.._A,D#=C-2,C-2+A#=C,A+_A-1#=B].

You can now analyze these goals and perform such substitutions automatically.

jeshan commented 8 months ago

it is hard to tell what exactly is needed here.

I was thinking that this is a great opportunity to fix some performance issues in this area.

we can use this information by syntactically replacing D by C - 2 where it occurs

Yes, but these are examples that I provided so that we can focus the investigation on simpler scenarios. In practice, these constraints will be posted by different predicates that may not use terms that are easily substituted, making replacement by unification a lot more difficult.

You can now analyze these goals

understood: I will need some time to think about this but shouldn't it be a last resort approach?

triska commented 8 months ago

Thank you a lot, I greatly appreciate your help with this!

The key task would be to figure out what exactly is needed here in the sense that:

It improves the run time of your examples and others like it, and
it does not significantly slow down other examples.

There will always be an inherent trade-off between (1) and (2): Just writing code to detect whether an instance of a CSP has a certain shape already costs computation time, and therefore negatively impacts all examples that are not of this shape.

From an ideal constraint solver, we expect that all semantically equivalent formulations of a problem are solved with the same (optimal) performance. This seems not attainable in practice, and we already have semantically equivalent constraints with different performance in the same situations, such as all_different/1 and all_distinct/1. Applications can choose which of these they want to use, and such application-specific constraint considerations already occur in practice.

For example3/5, I hoped that just varying the order in which the constraints are posted can improve performance, and I wrote:

X is Expr --> [X is Expr].
A #= B    --> [A #= B].
Xs ins D  --> [Xs ins D].
X = Y     --> [X = Y].

goal_priority(Goal, P) :-
        functor(Goal, F, _),
        functor_priority(F, P).

functor_priority(=   , 0).
functor_priority(is  , 0).
functor_priority(ins , 3).
functor_priority(#=  , 2).

This is intended to post the constraints in such a way that propagation is delayed as long as possible, so that all constraints can be taken into account when the domains of all variables become restricted to finite sets:

?- phrase(example3(A,B,C,D,E), Gs0),
   maplist(goal_priority, Gs0, Ps0),
   pairs_keys_values(Pairs0, Ps0, Gs0),
   keysort(Pairs0, Pairs),
   pairs_values(Pairs, Gs),
   time(maplist(call, Gs)).

Unfortunately, it did not help in this case. Maybe a different order yields better results?

UWN commented 8 months ago

(Not 100% sure, just thinking) Another possibility could be to have an extra execution mode where constraints are only posted without any propagation. Then, there is a phase were some adding of other constraints and the like happens. And only then, finally they are actually activated. In this manner one does not have to reformulate the problem, but it adds some kind of state.

jeshan commented 8 months ago

Thank you a lot, I greatly appreciate your help with this!

Glad to help!

There will always be an inherent trade-off

Yes. Observing how quickly sicstus clpfd runs these examples made me hope to achieve the same for clpz! :smile: . Maybe Carlsson's have the insights buried somewhere in his papers!

Maybe a different order yields better results

Maybe by a small percentage. I didn't pay close attention there because I thought that the ("thrashing") problem was where I should focus.

Another possibility could be to have an extra execution mode where constraints are only posted without any propagation

Yes, I'd love to see that as an option! But maybe it's not easy for you to make it happen.

If you don't mind a couple of more examples, these 2 "obvious" ones are quick to fail with clpfd but takes 3.1 seconds with clpz:

example4(A, B) :-
    A in 0..20000,
    B #>= A, % direct contradiction with next goal
    B #< A.

example5(A, B) :-
    A in 0..20000,
    B #>= A,
    B+1 #= A. % here too, B will be less than A

These were found after long hours of investigation in my application :sweat: so it's currently impractical for me to process the goals (but I'll do it if I don't have a choice!)

Question for you: Let's say that I need some "special" considerations for my use cases, i.e I care about the currently slow examples more than the ones that clpz currently optimises for. Can you guide us what change we could make so that my examples run fast?

I see that a fix has already been merged for Scryer. I'll let you decide on which examples fit the scope of this issue. Then either of us can submit a PR here.

UWN commented 8 months ago

For one, there is fd_batch/1 in SICStus. But the limitations there are quite problematic. In fact, since version 4, SICStus has introduced many such anti-nondeterministic 'improvements' which hurt declarative properties a lot.

jeshan commented 8 months ago

I think the pull request https://github.com/mthom/scryer-prolog/pull/2258 accidentally had some regressions, e.g 2*X #= Y-0 used to hold but no longer does. (I'm relying on my manual patch since clpz code in scryer has deviated) Can @triska confirm on a recent scryer build? I see another pull request https://github.com/mthom/scryer-prolog/pull/2277 which led me to believe that there are more cases to consider.

UWN commented 8 months ago


ulrich@gupu:/opt/gupu/scryer-prolog$ target/release/scryer-prolog -v
v0.9.3-180-g58cd0d16
ulrich@gupu:/opt/gupu/scryer-prolog$ target/release/scryer-prolog
?- 2*X #= Y-0.
   clpz:(2*X#=Y), clpz:(2*_A#=Y).
?- 2*X #= Y-0,Y=0.
   X = 0, Y = 0.

jeshan commented 8 months ago

@UWN Is this the build with the current master? there were more merged commits in the meantime so I don't know for certain which commit caused what. Someone needs to confirm with sicstus

UWN commented 8 months ago

This is the current build with master of Scryer.

And SICStus 4.9.0 is fine too:

| ?- use_module(clpz).  
% compiling /opt/gupu/clpz/clpz.pl...
...
% compiled /opt/gupu/clpz/clpz.pl in module clpz, 658 msec 2683760 bytes
yes
| ?-    2*X #= Y-0.
clpz:(2*_A#=Y),
clpz:(2*X#=Y) ? ;
no
| ?- 2*X #= Y-0,Y=0.
X = 0,
Y = 0 ? ;
no

(in general it helps to post separate issues for separate problems. But at least here, I cannot see any)

jeshan commented 8 months ago

My bad: It seems like I made a mistake with my manual patch! :facepalm:

in general it helps to post separate issues

Yes, but I was convinced it was related to the patch in this discussion. The newly linked PR should resolve this issue.

triska / clpz

addition example is slow to fail #26