mthom / scryer-prolog

A modern Prolog implementation written mostly in Rust.
BSD 3-Clause "New" or "Revised" License
2.05k stars 121 forks source link

rebis-dev: '$skip_max_list'/4 speed comparison #1176

Closed triska closed 2 years ago

triska commented 2 years ago

In order to help track down potential causes for the speed difference between Scryer Prolog and other systems (see #1138), I have constructed the following benchmark for 'skip_max_list'/4:

:- use_module(library(time)).
:- use_module(library(lists)).
:- use_module(library(clpz)).
:- use_module(library(format)).

run :-
    length(_, L),
    E #= 2^L,
    portray_clause(L),
    length(Ls, E),
    time(s(N, -1, Ls, _)),
    portray_clause(N),
    false.

s(N, Max, Ls0, Ls) :-
    '$skip_max_list'(N, Max, Ls0, Ls).

yielding:

$ scryer-prolog skip.pl
Warning: overwriting number/1
?- run.
...
19.
   % CPU time: 0.006s
524288.
20.
   % CPU time: 0.010s
1048576.
21.
   % CPU time: 0.018s
2097152.
22.
   % CPU time: 0.038s
4194304.
23.
   % CPU time: 0.079s
8388608.
24.
   % CPU time: 0.144s
16777216.
25.
   % CPU time: 0.385s
33554432.
26.
   % CPU time: 0.714s
67108864.
27.
memory allocation of 4294967296 bytes failed
Aborted

The nice thing about '$skip_max_list'/4 is that it is in a sense "self-contained", letting us ignore for the moment speed differences that arise due to different machine architectures. It tests a low-level facility of the system itself, such as efficiently categorizing list elements and moving around in memory.

For comparison, here is an adapted program for SWI-Prolog:

run :-
    length(_, L),
    E is 2^L,
    portray_clause(L),
    length(Ls, E),
    time(s(N, _, Ls, _)),
    portray_clause(N),
    false.

s(N, Max, Ls0, Ls) :-
    '$skip_list'(N, Ls0, Ls).

(Note that SWI-Prolog does not support '$skip_max_list'/4, but only a more limited form called '$skip_list'/3 that I am using here.)

Yielding:

$ swipl -O --stack_limit=3GB skip.pl
Warning: /home/mt/trealla/skip.pl:11:
Warning:    Singleton variables: [Max]
Welcome to SWI-Prolog (threaded, 64 bits, version 8.1.19-36-g794745bb1)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

?- run.
...
19.
% 1 inferences, 0.002 CPU in 0.002 seconds (100% CPU, 510 Lips)
524288.
20.
% 1 inferences, 0.004 CPU in 0.004 seconds (100% CPU, 240 Lips)
1048576.
21.
% 1 inferences, 0.009 CPU in 0.009 seconds (99% CPU, 116 Lips)
2097152.
22.
% 1 inferences, 0.017 CPU in 0.017 seconds (100% CPU, 59 Lips)
4194304.
23.
% 1 inferences, 0.033 CPU in 0.033 seconds (100% CPU, 30 Lips)
8388608.
24.
% 1 inferences, 0.067 CPU in 0.067 seconds (100% CPU, 15 Lips)
16777216.
25.
% 1 inferences, 0.187 CPU in 0.187 seconds (100% CPU, 5 Lips)
33554432.
26.
% 1 inferences, 0.270 CPU in 0.270 seconds (100% CPU, 4 Lips)
67108864.
27.
ERROR: Stack limit (3.0Gb) exceeded
ERROR:   Stack sizes: local: 1Kb, global: 34Kb, trail: 3Kb

Note especially the difference for N=26, where we get for Scryer:

26.
   % CPU time: 0.714s

And for SWI:

26.
% 1 inferences, 0.270 CPU in 0.270 seconds (100% CPU, 4 Lips)

This shows that a further at least 2-fold improvement may be obtainable in the rebis-dev branch by accessing memory more efficiently, maybe?

triska commented 2 years ago

For comparison, a version for Trealla Prolog:

:- use_module(library(format)).

run :-
    length(_, L),
    E is 2^L,
    portray_clause(L),
    length(Ls, E),
    time(s(N, _, Ls, _)),
    portray_clause(N),
    false.

s(N, Max, Ls0, Ls) :-
    '$skip_max_list'(N, Max, Ls0, Ls).

(Trealla Prolog has currently the best interface for '$skip_max_list'/4 (see #1023), where the second argument can be a variable.)

Yielding:

$ ./tpl skip_trealla.pl
Trealla Prolog (c) Infradig 2020-2021, v1.18.47
?- run.
...
19.
Time elapsed 0.00327 secs
524288.
20.
Time elapsed 0.00668 secs
1048576.
21.
Time elapsed 0.0135 secs
2097152.
22.
Time elapsed 0.0277 secs
4194304.
23.
Time elapsed 0.0544 secs
8388608.
24.
Aborted
UWN commented 2 years ago

Just for the record, Trealla does too much and thereby reduces the applicability of the built-in. It identifies the precise point where the loop starts. That's some extra overhead which helps nobody and it even prevents to use '$skip_max_list'/4 to estimate the size of the loop.

?- phrase("abcabc",L0,L1), phrase("abc",L1,L1), '$skip_max_list'(Skipped, Max, L0,L).
   L0 = [_0|_3], L1 = [_0|_3], Skipped = 6, L = L1. % Skipped too small
?- phrase("abcabc",L0,L1), phrase("abcabc",L1,L1), '$skip_max_list'(Skipped, Max, L0,L).
   L0 = [_0|_3], L1 = [_0|_3], Skipped = 6, L = L1. % Skipped too small
?- phrase("abc",L0,L1), phrase("abcabc",L1,L1), '$skip_max_list'(Skipped, Max, L0,L).
   L0 = [_0|_3], L1 = [_0|_3], Skipped = 3, L = L1.

To see this, consider:

Trealla Prolog (c) Infradig 2020-2021, v1.19.1-2-gcd9b5
?- phrase("abcabc",L0,L1), phrase("abc",L1,L1), '$skip_max_list'(Skipped, Max, L0,L),
   '$skip_max_list'(Skipped2, _, L,_).
   L0 = [_0|_3], L1 = [_0|_3], Skipped = 6, L = L1, Skipped2 = 0. % unexpected

The second use of '$skip_max_list'/4 should give us an upper bound of the size of the loop that is a value a bit larger than 3. Instead, 0 is given.

mthom commented 2 years ago

I'm happy to report that on my machine, the changes of the rebis-dev_improved-length branch cause Markus' benchmark program to run slighter faster on Scryer than SWI.

UWN commented 2 years ago

Here are current benchmarks without referral to internals:

?- length(_,E),E>20,setof(t,L^(N is 2^E,time(length(L,N)),time(length(L,M))),_).

Scryer v0.9.0-238-g53cd863

 E = 21 0.017s 0.009s
 E = 22 0.028s 0.018s
 E = 23 0.046s 0.019s
 E = 24 0.060s 0.038s
 E = 25 0.152s 0.074s
 E = 26 0.236s 0.144s
 E = 27 0.587s 0.301s
 E = 28 1.262s 0.569s

SICStus 4.8.0beta2

 E = 21 0.002s 0.004s
 E = 22 0.008s 0.009s
 E = 23 0.014s 0.019s
 E = 24 0.024s 0.035s
 E = 25 0.044s 0.054s
 E = 26 0.077s 0.103s
 E = 27 0.153s 0.213s
 E = 28 0.303s 0.439s

SICStus 4.7.1

 E = 21 0.033s 0.003s
 E = 22 0.045s 0.006s
 E = 23 0.055s 0.012s
 E = 24 0.147s 0.025s
 E = 25 0.183s 0.053s
 E = 26 0.546s 0.098s
 E = 27 0.745s 0.215s
 E = 28 2.052s 0.422s

So I'd say this issue here can be closed. Scryer is close enough.

triska commented 2 years ago

Perfect, thank you for these comparisons!