mthom / scryer-prolog

A modern Prolog implementation written mostly in Rust.
BSD 3-Clause "New" or "Revised" License
2.04k stars 119 forks source link

Unbounded %MEM growth #2628

Open dcnorris opened 2 days ago

dcnorris commented 2 days ago

Below is a self-contained repro for my problem. The unbounded %MEM growth in top apparently happens where I try to use d_n_qs_int/4 'backwards' to convert the integer 4th argument to the list-of-quotients 3rd arg. (I would really like not to have to implement separate 'forward' and 'backward' predicates! :clown_face: )

I'm very glad to pare this repro down further, if the problem is not immediately apparent.

% Repro for unbounded mem growth on large inputs
:- use_module(library(lists)).
:- use_module(library(clpz)).
:- use_module(library(lambda)).
:- use_module(library(error)).
:- use_module(library(debug)).

clpz:monotonic.

q_r(T/N, T:U) :- 0 #=< #T, 0 #=< #U, #N #= T + U.

qs_Ts_Us(Qs, ΣTs, ΣUs) :-
    maplist(\Q^T^U^(q_r(Q, T:U)), Qs, Ts, Us),
    intlist_partsums(Ts, ΣTs),
    intlist_partsums(Us, ΣUs).

intlist_partsums([X|Xs], [X|Ss]) :-
    intlist_partsums_acc(Xs, Ss, X).

intlist_partsums_acc([], [], _).
intlist_partsums_acc([X|Xs], [S|Ss], A) :-
    #S #= #X + #A,
    intlist_partsums_acc(Xs, Ss, S).

qs_d_nmax(Qs, D, Nmax) :-
    length(Qs, D),
    maplist(\Q^T^N^(Q = T/N), Qs, Ts, Ns),
    Ns ins 0..Nmax, label(Ns),
    maplist(\T^N^(T in 0..N), Ts, Ns), label(Ts).

% d_n_qs_int(+D, +N, ?Qs, ?K)
d_n_qs_int(D, N, Qs, K) :-
    #B #= #D * #N + 1, % K is a base-(DN+1) number
    #M #= #B ^ #D, % M-1 is maximum D-digit, base-B number
    length(Qs, D),
    qs_Ts_Us(Qs, Ts, Us),
    base_digits_int(B, Ts, TK),
    base_digits_int(B, Us, UK),
    %   Top D digits +  Low D digits
    #K #= (#M * #TK) + (#M - 1 - #UK). % a (2*D)-digit number

horner(X, A, P0, P) :- #P #= #A + #X * #P0. % https://en.wikipedia.org/wiki/Horner%27s_method

base_digits_int(B, Digs, K) :-
    #Bminus1 #= #B-1,
    Digs ins 0..Bminus1,
    foldl(horner(B), Digs, 0, K).

/*
?- D=3, Nmax=6,
   L+\(findall(Q, qs_d_nmax(Q, D, Nmax), Qs),
   maplist(d_n_qs_int(D,Nmax), Qs, Ks),
   sort(Ks, SKs),
   same_length(SQs, Qs),
   maplist(same_length, SQs, Qs),
   maplist(d_n_qs_int(D,Nmax), SQs, SKs),
   length(SKs, L)).
%@    error('$interrupt_thrown',repl/0). %MEM in 'top' grows w/o limit
%@    D = 2, Nmax = 6, L = 784. % D=2 case completes in mere seconds
*/

% For context, the rationale for above is the following predicate,
% which accepts a _partially ordered_ +Qs and returns -SQs sorted
% according to a *complete* order that contains the partial order.
% In my application, this enables certain calculations to be done
% in a single pass through a large list such as the one generated
% by findall/3 above.
qs_sorted(Qs, SQs) :-
    N = 6, % TODO: Generalize
    maplist(d_n_qs_int(D,N), Qs, Ks),
    sort(Ks, SKs),
    same_length(SQs, Qs),
    maplist(same_length, SQs, Qs),
    maplist(d_n_qs_int(D,N), SQs, SKs).
triska commented 1 day ago

One general note, after a brief look at this example: Some computations inherently take a lot of space, this does not necessarily mean that the memory growth is unbounded.

In this concrete example, there seems to be a huge internal "branching" of terms. For instance, if we consider just the following fragment:

?- D=3, Nmax=6,
   L+\(findall(Q, qs_d_nmax(Q, D, Nmax), Qs),
       length(Qs, L)).
   D = 3, Nmax = 6, L = 21952.

The step from D = 2 to D = 3 causes a huge increase in this length, from 784 to more than 20_000 elements.

This list with more than 20_000 elements is represented in memory, and subsequent goals seem to fan this out further.

I once had a similar situation with a huge Binary Decision Diagram (more than 40 GB) I wanted to represent in memory. For such purposes, you can rent a server with 256 GB of RAM for less than 200 USD per month:

https://www.hetzner.com/de/dedicated-rootserver/matrix-ex/

dcnorris commented 1 day ago

My little laptop has 64GB ram, I think. I know I will soon want to refine the generation of test cases to something less profligate, and no doubt I will take advantage of CLP(ℤ) for this! But my sense is that it's the "fanning-out" of subsequent goals you reference that really causes the problem. (I conjecture that a more 'eager' clpz might just eat the goals as they are posted, but don't know how to express that with any sophistication. I'm going to try rewriting the encoding predicate in the hope of promoting such eagerness.)

%?- D=2, Nmax=6, time(L+\(findall(Q, qs_d_nmax(Q, D, Nmax), Qs), length(Qs, L))).
%@    % CPU time: 0.068s, 249_976 inferences
%@    D = 2, Nmax = 6, L = 784.

%?- D=3, Nmax=6, time(L+\(findall(Q, qs_d_nmax(Q, D, Nmax), Qs), length(Qs, L))).
%@    % CPU time: 1.519s, 6_660_470 inferences
%@    D = 3, Nmax = 6, L = 21952.

%?- D=4, Nmax=6, time(L+\(findall(Q, qs_d_nmax(Q, D, Nmax), Qs), length(Qs, L))).
%@    % CPU time: 41.612s, 182_781_624 inferences
%@    D = 4, Nmax = 6, L = 614656.
triska commented 1 day ago

Also, Scryer Prolog does not yet have a garbage collector and therefore currently reclaims memory only on backtracking.

You can use this to reclaim unneeded memory, by wrapping a computation in findall/3 (which uses a failure-driven loop internally to find all solutions), especially if you need only a small number of terms as the result of the computation. For instance, with findall(N, computation(..., N, ...), [N]), you would get only the integer N (as presumed result of a computation), while the memory that was allocated during that computation is efficiently reclaimed. That's not elegant, but may work as a last resort and should of course ideally be kept as localized as possible.

Another way out could be store what you need more compactly in memory. For instance, you can use the fundamental theorem of arithmetic to store a sequence of positive integers [A,B,C,...] as a single integer by regarding the numbers as powers of different primes and computing a single composite number where the sequence can be read off from the prime decomposition when you need it, i.e., 2A×3B×5C×···. If you can represent a sequence of, say 6 integers, as a single integer that fits in a 64-bit WAM cell, then this will reduce the memory overhead by a factor of 18.

Another approach may be to introduce program-specific "abbreviations" of sequences or other data structures that occur frequently in your program, such as compressed(1), compressed(2), compressed(3) as variants of (say) verbatim([1,2,3]), verbatim(4,3,0) etc.

Yet another way could be to make use of Scryer Prolog's compact internal string representation, and store a sequence of integers as a list of characters with these code points. For instance, we would expect the following to create such a compact string from a list of codes, and also automatically reclaim data that was created during the conversion:

:- use_module(library(lists)).
:- use_module(library(iso_ext)).

codes_to_chars(Codes, Chars) :-
        findall(Cs, (maplist(char_code, Cs0, Codes),
                     partial_string(Cs0, Cs, [])),
                [Chars]).

Example:

?- codes_to_chars([1,2,3,4], Chars).
   Chars = "\x1\\x2\\x3\\x4\".

Chars can be represented extremely compactly in memory, especially with the rebis-dev development branch, which is in fact also needed to reclaim such strings on backtracking.