trealla-prolog / trealla

A compact, efficient Prolog interpreter written in plain-old C.
MIT License
266 stars 14 forks source link

GSL: Bugs #573

Open flexoron opened 1 month ago

flexoron commented 1 month ago
This bundle is mathematically error free:

matrix_calloc
permutation_alloc
LU_decomp
LU_det

Exercisers:

$ cat x.c
#include <stdio.h>
#include <gsl/gsl_version.h>
#include <gsl/gsl_linalg.h>

int main (int argc, char *argv[]) {
  int sig, stat;
  double det;
  int rows = atoi(argv[1]);
  int cols = atoi(argv[2]);
  printf("v%s %dx%d\n",GSL_VERSION,rows,cols);
  gsl_matrix *mat = gsl_matrix_calloc(rows, cols);
  gsl_permutation *perm = gsl_permutation_alloc(rows);
  stat = gsl_linalg_LU_decomp(mat, perm, &sig);
   det = gsl_linalg_LU_det(mat, sig);
  printf("%lf\n",det);
  gsl_permutation_free(perm);
  gsl_matrix_free(mat);
  return 0;
}

$ cat x.pl
:- use_module(library(gsl)).
tst(Rows,Cols)  :-
              gsl_matrix_calloc(Rows, Cols, M),
              gsl_permutation_alloc(Rows, P),
              gsl_linalg_LU_decomp(M, P, Sig, Stat),
              gsl_linalg_LU_det(M, Sig, Det),
              write(Det),nl,
              gsl_permutation_free(P),
              gsl_matrix_free(M).
RESULTS:

GSL 2.7.1
$ ./a.out 127 127
v2.7.1 127x127
-nan  #  Libgsl: known bug in v2.7
$ 
$ tpl x.pl
?- tst(127,127).
-.0nan   % Trealla morphs -nan into -.0nan
   true.
?- halt.
$

GSL 2.8
$ ./a.out 127 127
v2.8 127x127
: 
# printf in function gsl_linalg_LU_decomp() of gsl-git/linalg/lu.c
# convert ipiv array to permutation
# unsigned int pivi = gsl_vector_uint_get(ipiv, i);
:  
FF pivi 13 i 13  
FF pivi 14 i 14
FF pivi 15 i 15
FF pivi 0 i 16  #  Trealla Crash here
FF pivi 0 i 17
FF pivi 0 i 18
: 
FF pivi 0 i 124
FF pivi 0 i 125
FF pivi 0 i 126
-0.000000   # expected, v2.7 bug '-nan' has been fixed
$

$ tpl x.pl
?- tst(127,127).  % First Call
: 
FF pivi 13 i 13
FF pivi 14 i 14
FF pivi 15 i 15
FF pivi 210299952 i 16  % looks like Trealla puts an address somewhere in an array
Segmentation fault (core dumped)
$
infradig commented 1 month ago

Fixed the NAN printing issue. I'll look into the other one... it requires GSL v2.8 right?

flexoron commented 1 month ago

Yes, you don't have to install it, just make:

https://www.gnu.org/software/gsl/
$ tar xf gsl-release-2-8.tar.gz 
$ cd gsl-release-2-8
$ ./autogen.sh
$ ./configure
$ make
$ export LD_LIBRARY_PATH=$HOME/tools/gsl/gsl-release-2-8/.libs:$HOME/tools/gsl/gsl-release-2-8/cblas/.libs
$
$ gcc -I$HOME/tools/gsl/gsl-release-2-8 x.c -L$HOME/tools/gsl/gsl-release-2-8/.libs -L$HOME/tools/gsl/gsl-release-2-8/cblas/.libs -lgsl -lgslcblas -lm
$ ldd a.out # check libs
$ tpl x.pl 
?- 

# check libs
$ ps ax|grep tpl
72213 ....
$ lsof -p 72213 | grep gsl  
infradig commented 1 month ago

Thanks, fixed my problems there.

On Fri, Jul 26, 2024 at 8:54 AM flexoron @.***> wrote:

Yes, you don't have to install it, just make:

https://www.gnu.org/software/gsl/ $ tar xf gsl-release-2-8.tar.gz $ cd gsl-release-2-8 $ ./autogen.sh $ ./configure $ make $ export LD_LIBRARY_PATH=$HOME/tools/gsl/gsl-release-2-8/.libs:$HOME/tools/gsl/gsl-release-2-8/cblas/.libs $ $ $ gcc -I$HOME/tools/gsl/gsl-release-2-8 x.c -L$HOME/tools/gsl/gsl-release-2-8/.libs -L$HOME/tools/gsl/gsl-release-2-8/cblas/.libs -lgsl -lgslcblas -lm $ ldd a.out # check libs $ tpl x.pl ?-

check libs

$ ps ax|grep tpl 72213 .... $ lsof -p 72213 | grep gsl

— Reply to this email directly, view it on GitHub https://github.com/trealla-prolog/trealla/issues/573#issuecomment-2251531872, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFNKSEV5OEQIKYCGHJSCQTLZOF6Z5AVCNFSM6AAAAABLOIBUKSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJRGUZTCOBXGI . You are receiving this because you commented.Message ID: @.***>

infradig commented 1 month ago

Interesting, if in lu.c / gsl_linalg_LU_decomp() I change:

      gsl_vector_uint * ipiv = gsl_vector_uint_alloc(minMN);

to

      gsl_vector_uint * ipiv = gsl_vector_uint_calloc(minMN);

it works. Seems a bug in GSL 2.8 to me, depending on uninitialized values.

flexoron commented 1 month ago
Not sure, the C-program does not crash and for example:

$ tpl x.pl
?- between(2,128,S), tst(S,S), fail. % does not crash
?- halt.

$ tpl x.pl
?- between(124,128,S), tst(S,S), fail. % crash
FF pivi 14 i 14
FF pivi 15 i 15
FF pivi 368989664 i 16
Segmentation fault (core dumped)
infradig commented 1 month ago

The C program is simple and is allocating virgin zeroed memory from the OS. In Trealla a lot mallocs & frees have been done.

On Fri, Jul 26, 2024 at 11:31 AM flexoron @.***> wrote:

Not sure, the C-program does not crash and for example:

$ tpl x.pl ?- between(2,128,S), tst(S,S), fail. % does not crash ?- halt.

$ tpl x.pl ?- between(124,128,S), tst(S,S), fail. % crash FF pivi 14 i 14 FF pivi 15 i 15 FF pivi 368989664 i 16 Segmentation fault (core dumped)

— Reply to this email directly, view it on GitHub https://github.com/trealla-prolog/trealla/issues/573#issuecomment-2251786939, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFNKSEXYKFQPSLFSKUWAAY3ZOGRI3AVCNFSM6AAAAABLOIBUKSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJRG44DMOJTHE . You are receiving this because you commented.Message ID: @.***>

flexoron commented 1 month ago

tpl with GSL v2.7.1 does not crash, too: v2.7.1: gsl_vector_uint * ipiv = gsl_vector_uint_alloc(minMN);

infradig commented 1 month ago

You don't know what else changed that accesses that memory, maybe nothing but ...

On Fri, Jul 26, 2024 at 11:37 AM flexoron @.***> wrote:

tpl with GSL v2.7.1 does not crash, too: v2.7.1: gsl_vector_uint * ipiv = gsl_vector_uint_alloc(minMN);

— Reply to this email directly, view it on GitHub https://github.com/trealla-prolog/trealla/issues/573#issuecomment-2251791593, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFNKSEQS5TOG34EODYPAKJDZOGR4HAVCNFSM6AAAAABLOIBUKSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJRG44TCNJZGM . You are receiving this because you commented.Message ID: @.***>

infradig commented 1 month ago

Added this to that function

      gsl_vector_uint * ipiv = gsl_vector_uint_alloc(minMN);
+      for (unsigned i = 0; i < ipiv->size; i++)
+       printf("*** [%u] %u\n", (unsigned)i, (unsigned)ipiv->data[i]);

and it shows all zero elements for x.c & some non-zero ones for x.pl

GSL 2.8 is buggy. It was only released a few months ago.

flexoron commented 1 month ago

Yes but this way: non-zero,zero,no-zero,zero,non-zero,zero, or 32,0,64,0,55,0,and so on Hmm what now. v2.7.1 is buggy too: result '-nan' is not correct. gsl_permutation_calloc/2 does not change it, too

flexoron commented 1 month ago

printf("FF pivi %u i %lu\n",pivi,i); if (p->data[pivi] != p->data[i]) it is the index value of pivi data[pivi] pivi 456910304 = data[456910304]

infradig commented 1 month ago

LU_decomp_L3() & apply_pivots() can't work if the vector ipiv wasn't zeroed properly, so pivi can be incorrect.

infradig commented 1 month ago

I would suggest submitting a bug report. I did one for GCC about 20 years ago, took about 6 months to get acted on.

flexoron commented 1 month ago

: gsl_matrix_calloc(Rows, Cols, M), gsl_permutation_alloc(Rows, P), write(M/P),nl, : 968671312 / 969277808 FF pivi 969168480 i 16 Segmentation fault (core dumped)

You see this? The values of M and P and this pivi value are always too close. I don't think that this is by chance.

infradig commented 1 month ago

The memory can contain anything. If you dump the newly alloc'd vector it contains rubbish, which is useless as it's used without ever being initialized.

On Fri, Jul 26, 2024 at 12:14 PM flexoron @.***> wrote:

: gsl_matrix_calloc(Rows, Cols, M), gsl_permutation_alloc(Rows, P), write(M/P),nl, : 968671312 / 969277808 FF pivi 969168480 i 16 Segmentation fault (core dumped)

You see this? The values of M and P and this pivi value are always too close. I don't think that this is by chance.

— Reply to this email directly, view it on GitHub https://github.com/trealla-prolog/trealla/issues/573#issuecomment-2251836514, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFNKSEQ6BIZ73SRTQ37EIPDZOGWHRAVCNFSM6AAAAABLOIBUKSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJRHAZTMNJRGQ . You are receiving this because you commented.Message ID: @.***>

infradig commented 1 month ago

I'm talking about this...

      gsl_vector_uint * ipiv = gsl_vector_uint_alloc(minMN);
+      for (unsigned i = 0; i < ipiv->size; i++)
+       printf("*** [%u] %u\n", (unsigned)i, (unsigned)ipiv->data[i]);

if the values are non-zero (which they are at some points) you are bound to get errors when they are used as an array index.

flexoron commented 1 month ago

?- tst(40,40). 21852240/21852304 FF pivi 22458688 i 24 Segmentation fault (core dumped)

Again pivi is nearby M and P

?- tst(127,127). 362111056/363287184 FF pivi 362608224 i 16 Segmentation fault (core dumped)

infradig commented 1 month ago

I don't know what you want me to do... gsl_vector_uint_alloc returns unzeroed data (by design) when the following code expected zeroed data.

flexoron commented 1 month ago

Conclusion: Both versions are buggy and your finding might be a solution to work with 2.8 gsl_vector_uint * ipiv = gsl_vector_uint_calloc(minMN);

infradig commented 1 month ago

URL: https://savannah.gnu.org/bugs/?66026

             Summary: gsl_linalg_LU_decomp using uninitialized memory
               Group: GNU Scientific Library
           Submitter: infradig
           Submitted: Fri 26 Jul 2024 02:56:44 AM UTC
            Category: Runtime error
            Severity: 3 - Normal
    Operating System: Ubuntu
              Status: None
         Assigned to: None
         Open/Closed: Open
     Discussion Lock: Any
             Release: 2.8
flexoron commented 1 month ago
Here we are

$ cat x.c
: 
  gsl_matrix *mat = gsl_matrix_calloc(rows, cols);
  gsl_permutation *perm = gsl_permutation_alloc(rows);

{
int i, j;
gsl_matrix *m = gsl_matrix_alloc(rows, cols);
for (i = 0; i < rows; i++)
    for (j = 0; j < cols; j++)
        gsl_matrix_set(m, i, j, 2000000000);
gsl_matrix_free(m);
}

  stat = gsl_linalg_LU_decomp(mat, perm, &sig);
   det = gsl_linalg_LU_det(mat, sig);
: 

$ gcc ...
$ ./a.out 127 127
v2.8 127x127
: 
FF pivi 15 i 15
FF pivi 0 i 16
FF pivi 1105055077 i 17
Segmentation fault (core dumped)
$
flexoron commented 1 month ago
fyi
More info for if those gsl guys start a discussion:

1.) ./a.out 128 128 or 256 256 or 512 512 or 1024 1024 does not crash,
    so you don't always get the fault.

2.) The problem might be the status of a singular matrix:

status = LU_decomp_L3 (&AL.matrix, ipiv);

If not singular then status = 0 and all fields of vector ipiv are valid(have been processed).

if singular then status = number of fields which have been processed.

For example: singular 127x127 status = 16
means that this is wrong:
for (i = 0; i < minMN; ++i) {  # the segfault area

Better is this:
size_t damage = minMN;
if (status) damage = status; 
for (i = 0; i < damage; ++i) {
}  

Even better (perhaps)
if (status)  { ; /* singular, do nothing */
} else { 
   for (i = 0; i < minMN; ++i) { 
   : 
}
gsl_vector_uint_free(ipiv);
return status;

Performance back: no calloc and in case of singular dismiss signum as valueless.
Buggy gsl 2.7 does not crash because it handles a singular matrix as if it is non-singular:
status = LU_decomp_L3 (&AL.matrix, ipiv);
status is always 0 (vector fields are 'filled' according to alloc 127 for example).

Buggy gsl 2.8 stops when it is clear that its a singular matrix and returns a value >= 1.
status = LU_decomp_L3 (&AL.matrix, ipiv);
status == 16 (for example), fields(-index) > 16 stay untouched hence the crash(see above).
infradig commented 1 month ago

I'd copy/paste this but maybe you should create an account an submit a comment.

On Sun, 28 July 2024, 01:04 flexoron, @.***> wrote:

fyi More info for if those gsl guys start a discussion:

1.) ./a.out 128 128 or 256 256 or 512 512 or 1024 1024 does not crash, so you don't always get the fault.

2.) The problem might be the status of a singular matrix:

status = LU_decomp_L3 (&AL.matrix, ipiv);

If not singular then status = 0 and all fields of vector ipiv are valid(have been processed).

if singular then status = number of fields which have been processed.

For example: singular 127x127 status = 16 means that this is wrong: for (i = 0; i < minMN; ++i) { # the segfault area

Better is this: if (!status) status = minMN; for (i = 0; i < status; ++i) {

Even better (perhaps) if (status) { ; / singular, do nothing / } else { for (i = 0; i < minMN; ++i) { : } gsl_vector_uint_free(ipiv); return status;

Performance back: no calloc and in case of singular dismiss signum as valueless.

— Reply to this email directly, view it on GitHub https://github.com/trealla-prolog/trealla/issues/573#issuecomment-2254173125, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFNKSEXXYCHJTEBRGCDP7HDZOOZIJAVCNFSM6AAAAABLOIBUKSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJUGE3TGMJSGU . You are receiving this because you modified the open/close state.Message ID: @.***>

flexoron commented 1 month ago
x.pl 

Instead of
gsl_matrix_calloc(Rows, Cols, M),  % Fixed: -.0nan now prints -nan (see Fix float nan-value printing)
do
mat_random(M, Rows, Cols),  % inf-value printing need a fix: -.0inf

?- tst(1200,1200).
-.0inf
?-