trealla-prolog / trealla

A compact, efficient Prolog interpreter written in plain-old C.
MIT License
268 stars 13 forks source link

non-linear slowdown in A* search #544

Closed Jean-Luc-Picard-2021 closed 1 month ago

Jean-Luc-Picard-2021 commented 4 months ago

Now this here was a surprise. I am testing some A* search, here is what SWI-Prolog gives:

?- start(X), time(astar(X, [], [], [], L)), length(L, N).
% 13,959,315 inferences, 0.938 CPU in 0.932 seconds (101% CPU, 14889936 Lips)
X = [6, 1, 3, 4, -1, 5, 7, 2, 0],
L = [left, left, up, right, right, down, left, left, up|...],
N = 26 .

?- X = [-1, 7, 6, 5, 4, 3, 2, 1, 0], time(astar(X, [], [], [], L)), length(L, N).
% 231,210,078 inferences, 15.844 CPU in 15.843 seconds (100% CPU, 14593141 Lips)
X = [-1, 7, 6, 5, 4, 3, 2, 1, 0],
L = [up, left, left, down, right, up, right, down, left|...],
N = 30 .

If I measure Trealla Prolog I see a non-linear slow down. The first example is ca. 5-times slower. But the second example is ca. 10-times slower.

  Trealla  
slago 5221 560%
own 157267 993%

Any idea how this happens? Some built-in like length/2 or member/2 or append/3 that gets dominant? Or garbage collection which chokes on large lists?

Source code:

puzzle.p.log

infradig commented 4 months ago

From the multi-gigabytes of memory being consumed i'd say that's a likely culprit.

infradig commented 4 months ago

Btw, you can use memberchk/2 here instead of member/2 for a speed boost.

Jean-Luc-Picard-2021 commented 4 months ago

It runs fine with1 GB in SWI-Prolog:

?- current_prolog_flag(stack_limit, X).
X = 1073741824.

The Prolog code just creates a new state, but the old state can practically immediately be reclaimed. But a garbage collector might delay the reclamation a little bit.

I am currently testing variants of puzzle.p that use keysort/2 and/or library(heaps). I am testing across Prolog systems. Have the feeling the non-linearity goes away with

generational garbage collection.

infradig commented 2 months ago

It seems insert/4 is the bottleneck here:

% insert(+Agenda, +Key, +Value, -Agenda)
insert([], K, V, [K-V]).
insert([J-W|L], K, V, [K-V,J-W|L]) :- K @=< J, !.
insert([P|L], K, V, [P|R]) :-
   insert(L, K, V, R).

as it's not TCO without some smarts, which obviously SWI has.

Jean-Luc-Picard-2021 commented 2 months ago

Some say its TCO (Tail Call Optimization), usually identified as a recursive call into the same predicate again, other call it LCO (Last Call Optimization), need not be the same predicate.

I think SWI-Prolog has specialized on TCO. But some Prolog systems simply do something for LCO. LCO is basically the idea of the "depart" instruction in this paper:

image

https://www.softwarepreservation.org/projects/prolog/lisbon/lpw83/p74-Bowen.pdf

I wasn't able to do LCO properly in formerly Jekejeke Prolog, since I had the idea of maintaining a call stack. In Dogelog Player there is no more call stack, so LCO is better, slightly behind TCO:

  SWI 9.3.8 Doge 1.2.1 Scryer 0.9.4 TPL 2.53.50
slago 934 2686 3383 4277
own 15550 50444 66264 123535
         
  SWI 9.3.8 Doge 1.2.1 Scryer 0.9.4 TPL 2.53.50
slago 100% 288% 362% 458%
own 100% 324% 426% 794%
difference   37% 64% 337%

I was redoing the benchmark to keep up with various changes. As can be seen, Scryer Prolog has only a minor non-linearity problem, compared to SWI-Prolog. Since the "depart" trick

already existed in WAM, and Scryer Prolog is WAM, I guess they get it from there? Not sure.