Closed Jean-Luc-Picard-2021 closed 1 month ago
From the multi-gigabytes of memory being consumed i'd say that's a likely culprit.
Btw, you can use memberchk/2 here instead of member/2 for a speed boost.
It runs fine with1 GB in SWI-Prolog:
?- current_prolog_flag(stack_limit, X).
X = 1073741824.
The Prolog code just creates a new state, but the old state can practically immediately be reclaimed. But a garbage collector might delay the reclamation a little bit.
I am currently testing variants of puzzle.p that use keysort/2 and/or library(heaps). I am testing across Prolog systems. Have the feeling the non-linearity goes away with
generational garbage collection.
It seems insert/4 is the bottleneck here:
% insert(+Agenda, +Key, +Value, -Agenda)
insert([], K, V, [K-V]).
insert([J-W|L], K, V, [K-V,J-W|L]) :- K @=< J, !.
insert([P|L], K, V, [P|R]) :-
insert(L, K, V, R).
as it's not TCO without some smarts, which obviously SWI has.
Some say its TCO (Tail Call Optimization), usually identified as a recursive call into the same predicate again, other call it LCO (Last Call Optimization), need not be the same predicate.
I think SWI-Prolog has specialized on TCO. But some Prolog systems simply do something for LCO. LCO is basically the idea of the "depart" instruction in this paper:
https://www.softwarepreservation.org/projects/prolog/lisbon/lpw83/p74-Bowen.pdf
I wasn't able to do LCO properly in formerly Jekejeke Prolog, since I had the idea of maintaining a call stack. In Dogelog Player there is no more call stack, so LCO is better, slightly behind TCO:
SWI 9.3.8 | Doge 1.2.1 | Scryer 0.9.4 | TPL 2.53.50 | |
---|---|---|---|---|
slago | 934 | 2686 | 3383 | 4277 |
own | 15550 | 50444 | 66264 | 123535 |
SWI 9.3.8 | Doge 1.2.1 | Scryer 0.9.4 | TPL 2.53.50 | |
slago | 100% | 288% | 362% | 458% |
own | 100% | 324% | 426% | 794% |
difference | 37% | 64% | 337% |
I was redoing the benchmark to keep up with various changes. As can be seen, Scryer Prolog has only a minor non-linearity problem, compared to SWI-Prolog. Since the "depart" trick
already existed in WAM, and Scryer Prolog is WAM, I guess they get it from there? Not sure.
Now this here was a surprise. I am testing some A* search, here is what SWI-Prolog gives:
If I measure Trealla Prolog I see a non-linear slow down. The first example is ca. 5-times slower. But the second example is ca. 10-times slower.
Any idea how this happens? Some built-in like length/2 or member/2 or append/3 that gets dominant? Or garbage collection which chokes on large lists?
Source code:
puzzle.p.log