semigroups / Semigroups

The GAP package Semigroups
https://semigroups.github.io/Semigroups/
Other
23 stars 36 forks source link

Use the C++ compiler to link semigroups.so #942

Closed fingolfin closed 1 year ago

fingolfin commented 1 year ago

Resolves #928

This is a simpler (?) alternative to PR #937 and thus closes #937 if it gets merged.

@dimpase could you verify if this also resolves the issue (modulo #938 of course) for you?

dimpase commented 1 year ago

Trying to work around #943 to test this atm.

dimpase commented 1 year ago

OK, I'm finally down to testing on OpenBSD. I use an external Eigen library (it's an OpenBSD package), and build/install libsemigroups as follows:

$ ./config.status --config  
'--prefix=/home/dima' '--disable-hpcombi' '--disable-backward' '--with-external-eigen' 

(these --disable-* are needed, otherwise it doesn't build)

For semigroups I set PKG_CONFIG_PATH=/home/dima/lib/pkgconfig - that's the installation prefix of libsemigroups with lib/pkgconfig appended. It's needed for configuring external libsemigroup, using pkg-config.

For running GAP (and even for running ldd on semigroups.so) I need LD_LIBRARY_PATH= /home/dima/lib, in view of #938.

So far I tested the rebased branch on #937, it works (package loads in GAP, TestPackage("semigroups"); runs, with some errors, but test errors are another story). On OpenBSD GAP does not need to be linked with -pthread, it works with and without it.

Testing this branch now.

The semigroups GAP package is configured with

$ ./config.status --config                                                                                                                                                                                                                                                                                                               
'--with-external-libsemigroups' '--disable-hpcombi'  'PKG_CONFIG_PATH=/home/dima/lib/pkgconfig'
dimpase commented 1 year ago

This branch does not play as nice with tests. TestPackage("semigroups"); ends with a coredump.

...
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst:209
# Input is:
T = H;
# Expected output:
true
# But found:
Error, Variable: 'H' must have an assigned value
########
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst:211
# Input is:
Factorization(H, images[2]);
# Expected output:
[ 2 ]
# But found:
Error, Variable: 'H' must have an assigned value
########
libc++abi: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
Abort trap (core dumped)
fingolfin commented 1 year ago

An uncaught exception sounds like a problem in libsemigroups. In any case, I don't see how that could fail in this PR and work in PR #937... Just to double check, did you also run the tests in otherwise identical setup (in particular: same gap and same Semigroups commit) with PR #937?

fingolfin commented 1 year ago

Ah sorry, I now see you wrote in a previous comment that you first tested with PR #937 and it all passed and then did the same test with this PR here and it gave that crash, is that right?

Is the H must have an assigned value also visible with PR #937? It would be useful to have the full test logs for both PRs, perhaps you can attach them to a comment here (note that file attachments in GitHub only work with certain file extensions, such as .txt)

dimpase commented 1 year ago

some tests with #937 fail, with a bad alloc error, too, but GAP stays up. So it looks like it's a slightly different error handling, more robust.

will post the logs

dimpase commented 1 year ago

OK, ran tests number of times, with both this and the old branch. The new branch doesn't really display a different behaviour (crashes don't happen during every run, and anyway it's a memory corruption/leak that finally brings GAP down sometimes).

Leaking/broken test is apparently semigroups/tst/standard/semigroups/semidp.tst and semigroups/tst/standard/semigroups/semiquo.tst, possibly more, only when run with acting methods disabled.

So here what all the logs I made have in common:

...
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/tools/io.tst
    2438 ms (1711 ms GC) and 7.58MB allocated for tools/io.tst
-----------------------------------
total   1072164 ms (115772 ms GC) and 17.0GB allocated
              0 failures in 72 files

ESC[1m#I  Running tests with acting methods disabled
ESC[0mArchitecture: x86_64-unknown-openbsd7.3-default64-kv9

testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/attributes/acting.tst
    2281 ms (1625 ms GC) and 47.6MB allocated for attributes/acting.tst
...
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semicons.ts\
t
   22820 ms (1734 ms GC) and 105MB allocated for semigroups/semicons.tst
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semidp.tst
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semidp.tst:319
# Input is:
Size(D) = Size(T) ^ 3;
# Expected output:
ESC[30;42mtrueESC[0m
# But found:
ESC[30;41mError, std::bad_allocESC[0m
########
   10906 ms (1812 ms GC) and 109MB allocated for semigroups/semidp.tst
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semieunit.t\
st
...
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst:121
# Input is:
Size(S / cong);
# Expected output:
ESC[30;42m120ESC[0m
# But found:
ESC[30;41mError, std::bad_allocESC[0m
########
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst:123
# Input is:
IsomorphismPermGroup(S / cong);
# Expected output:
ESC[30;42m<quotient of <semigroup congruence over <inverse partial perm semigrou\
p 
  of size 720, rank 10 with 14 generators> with congruence pair (6,1)>> -> 
<permutation group of size 120 with 2 generators>ESC[0m
# But found:
ESC[30;41mError, std::bad_allocESC[0m
########
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst:138
# Input is:
Size(S);
# Expected output:
ESC[30;42m336ESC[0m
# But found:
ESC[30;41mError, std::bad_allocESC[0m
########
...

Also,when it comes to running these tests, memory used by GAP shots up by about 1GB - another indication of a leak to me.

dimpase commented 1 year ago

@fingolfin - a typo to be fixed on https://github.com/fingolfin/Semigroups/pull/1

fingolfin commented 1 year ago

@dimpase the typo is not in this PR, though, wouldn't it make more sense to directly submit it to this repository?

dimpase commented 1 year ago

It's a typo made by you in an earlier iteration. Let's not be overtly bureacratic here.

Doing it the other way would likely create a merge conflict to resolve.

james-d-mitchell commented 1 year ago

@dimpase can I have access to a machine where I can reproduce the bug? I agree with @fingolfin that this looks like an issue with the package or libsemigroups rather than the build system

dimpase commented 1 year ago

@dimpase can I have access to a machine where I can reproduce the bug? I agree with @fingolfin that this looks like an issue with the package or libsemigroups rather than the build system

Absolutely, no problem. Email me the public part (.pub) of your ssh key.

dimpase commented 1 year ago

@dimpase can I have access to a machine where I can reproduce the bug? I agree with @fingolfin that this looks like an issue with the package or libsemigroups rather than the build system

Absolutely, no problem. Email me the public part (.pub) of your ssh key.

Hey @james-d-mitchell , you already have an account on this machine - you accessed in in June afaict. :-) try ssh jdm@sagemath.openbsd.amsterdam

fingolfin commented 1 year ago

It's a typo made by you in an earlier iteration. Let's not be overtly bureacratic here.

??? This PR here may never be merged. Making a fix to something not touched by this PR only in here at best delays the fix from becoming generally available and at worst means the fix is never applied.

Doing it the other way would likely create a merge conflict to resolve.

How? My PR does not even touch configure.ac

dimpase commented 1 year ago

How? My PR does not even touch configure.ac

oh, OK, I confused it with Makefile.in

james-d-mitchell commented 1 year ago

Thanks @dimpase i will try to look asap, but probably not until Monday

james-d-mitchell commented 1 year ago

Tests fail, would be better if they passed

dimpase commented 1 year ago

Tests fail, would be better if they passed

Did they work before?

They all (with exception of cygwin, which I can't say much about now) seem to fail due to inability to find libsemigroups at runtime, i.e. either LD_LIBRARY_PATH must be set, or better #938 resolved)

james-d-mitchell commented 1 year ago

Tests fail, would be better if they passed

Did they work before?

They all (with exception of cygwin, which I can't say much about now) seem to fail due to inability to find libsemigroups at runtime, i.e. either LD_LIBRARY_PATH must be set, or better #938 resolved)

Yes they passed before for example in your most recent pr.

dimpase commented 1 year ago

are they all running in miniconda environment with libsemigroups coming from conda?

dimpase commented 1 year ago

By the way, I ran make check on libsemigroup on the OpenBSD box, and got a bad::alloc error in one test:

...
FpSemigroup 041: equal_to . . .                                          2205μs
FpSemigroup 042: cbegin/cend_rules . . .                                  177μs
FpSemigroup 043: semigroup of size 3 . . .                                458μs
FpSemigroup 044: run_for/ulibc++abi: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
ntil . . .                                     2574μs
FpSemigroup 045: constructors . . .                                       264μs
FpSemigroup 046: set_inverses . . .                                       421μs
FpSemigroup 047: smalloverlap . . .                                        19ms
FpSemigroup 048: quaternion group Q8 . . .                                492μs

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_all is a Catch v2.13.7 host application.
Run with -? for options

-------------------------------------------------------------------------------
FpSemigroup 049: symmetric group Coxeter presentation
-------------------------------------------------------------------------------
tests/test-fpsemi.cpp:1040
...............................................................................

tests/test-fpsemi.cpp:1040: FAILED:
due to a fatal error condition:
  SIGABRT - Abort (abnormal termination) signal

FpSemigroup 049: symmetric group Coxeter presentation . . .                    
===============================================================================
test cases:    293 |    292 passed | 1 failed
assertions: 434209 | 434208 passed | 1 failed

Abort trap (core dumped) 
make: *** [Makefile:5367: check] Error 134
james-d-mitchell commented 1 year ago

Seems to fail with the external libsemigroups. Are you sure that the failing tests aren't just because the test machine has limited memory, and the tests are legitimately using large amounts of memory?

dimpase commented 1 year ago

Seems to fail with the external libsemigroups. Are you sure that the failing tests aren't just because the test machine has limited memory, and the tests are legitimately using large amounts of memory?

the machine has 2GB of physical RAM and 4GB of swap; and the default user is limited to processes using 4GB of memory.

Do you really run tests which need gigabytes of RAM?

james-d-mitchell commented 1 year ago

Probably not, just checking!

dimpase commented 1 year ago

@fingolfin - I reproduced these CI errors locally on Debian oldstable linux.

The problem is that these incantations on Linux don't lead to libsemigroups being linked with semigroups.so

dimpase@penguin:~/work/software/gap/pkg/semigroups$ make -j2 V=1
g++  -L/home/dimpase/lib -lsemigroups  gen/src/bipart.o gen/src/cong.o gen/src/conglatt.o gen/src/froidure-pin-base.o gen/src/froidure-pin-bipart.o gen/src/froidure-pin-bmat.o gen/src/froidure-pin-fallback.o gen/src/froidure-pin-matrix.o gen/src/froidure-pin-max-plus-mat.o gen/src/froidure-pin-min-plus-mat.o gen/src/froidure-pin-pbr.o gen/src/froidure-pin-pperm.o gen/src/froidure-pin-transf.o gen/src/pkg.o gen/src/to_gap.o gen/gapbind14/src/gapbind14.o -o bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so   -shared -fPIC
dimpase@penguin:~/work/software/gap/pkg/semigroups$ ldd bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so 
        linux-vdso.so.1 (0x00007fff349b3000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007c11eea9a000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007c11ee956000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007c11ee93c000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007c11ee768000)
        /lib64/ld-linux-x86-64.so.2 (0x00007c11efd38000)

dimpase@penguin:~/work/software/gap/pkg/semigroups$ ls -l /home/dimpase/lib/*semi*
-rw-r--r-- 1 dimpase dimpase 3775574 Sep 10 11:26 /home/dimpase/lib/libsemigroups.a
-rwxr-xr-x 1 dimpase dimpase     987 Sep 10 11:26 /home/dimpase/lib/libsemigroups.la
lrwxrwxrwx 1 dimpase dimpase      22 Sep 10 11:26 /home/dimpase/lib/libsemigroups.so -> libsemigroups.so.2.0.0
lrwxrwxrwx 1 dimpase dimpase      22 Sep 10 11:26 /home/dimpase/lib/libsemigroups.so.2 -> libsemigroups.so.2.0.0
-rwxr-xr-x 1 dimpase dimpase 1766768 Sep 10 11:26 /home/dimpase/lib/libsemigroups.so.2.0.0
dimpase@penguin:~/work/software/gap/pkg/semigroups$ ldd /home/dimpase/lib/libsemigroups.so
        linux-vdso.so.1 (0x00007ffe761c5000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007d2076671000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007d20764a4000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007d2076360000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007d207618c000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007d2076172000)
        /lib64/ld-linux-x86-64.so.2 (0x00007d207682a000)
dimpase commented 1 year ago

Here is what the current main branch is doing instead

$ make V=1
"/home/dimpase/work/software/gap/gac" -d -p "-MQ "bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so" -MMD -MP -MF bin/x86_64-pc-linux-gnu-default64-kv9/.d" -P "-lstdc++   -L/home/dimpase/lib -lsemigroups " gen/src/bipart.o gen/src/cong.o gen/src/conglatt.o gen/src/froidure-pin-base.o gen/src/froidure-pin-bipart.o gen/src/froidure-pin-bmat.o gen/src/froidure-pin-fallback.o gen/src/froidure-pin-matrix.o gen/src/froidure-pin-max-plus-mat.o gen/src/froidure-pin-min-plus-mat.o gen/src/froidure-pin-pbr.o gen/src/froidure-pin-pperm.o gen/src/froidure-pin-transf.o gen/src/pkg.o gen/src/to_gap.o gen/gapbind14/src/gapbind14.o -o bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so
gcc -o bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so gen/src/bipart.o gen/src/cong.o gen/src/conglatt.o gen/src/froidure-pin-base.o gen/src/froidure-pin-bipart.o gen/src/froidure-pin-bmat.o gen/src/froidure-pin-fallback.o gen/src/froidure-pin-matrix.o gen/src/froidure-pin-max-plus-mat.o gen/src/froidure-pin-min-plus-mat.o gen/src/froidure-pin-pbr.o gen/src/froidure-pin-pperm.o gen/src/froidure-pin-transf.o gen/src/pkg.o gen/src/to_gap.o gen/gapbind14/src/gapbind14.o -shared -fPIC -lstdc++ -L/home/dimpase/lib -lsemigroups
rm -f

and this works correctly:

dimpase@penguin:~/work/software/gap/pkg/semigroups$ ldd bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so 
        linux-vdso.so.1 (0x00007ffe7517e000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007eac1b85c000)
        libsemigroups.so.2 => /home/dimpase/lib/libsemigroups.so.2 (0x00007eac1b6df000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007eac1b6c5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007eac1b4f1000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007eac1b3ad000)
        /lib64/ld-linux-x86-64.so.2 (0x00007eac1cafa000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007eac1b38b000)
dimpase commented 1 year ago

here is how to fix it - just to shuffle the arguments of the linking call around. Only checked on Linux.

--- a/Makefile.gappkg
+++ b/Makefile.gappkg
@@ -138,7 +138,7 @@ gen/%.$(GAP_OBJEXT): %.s Makefile
 # build rule for linking all object files together into a kernel extension
 $(KEXT_SO): $(KEXT_OBJS)
        @mkdir -p $(@D)
-       $(QUIET_GAC)$(GAP_CXX) $(KEXT_LDFLAGS) $(KEXT_OBJS) -o $@ $(GAP_LDFLAGS) $(GAC_LDFLAGS)
+       $(QUIET_GAC)$(GAP_CXX) -o $@ $(GAP_LDFLAGS) $(GAC_LDFLAGS)  $(KEXT_OBJS) $(KEXT_LDFLAGS)

 # hook into `make clean`
 clean: clean-kext
dimpase commented 1 year ago

@fingolfin - please see https://github.com/fingolfin/Semigroups/pull/2

on my fork this CI step passes. https://github.com/dimpase/Semigroups/actions/runs/6138576172

I'll check the rest now. Update: the standard tests pass too (on Cygwin there forking errors during execution of some tests, but I guess it's OK). https://github.com/dimpase/Semigroups/actions/runs/6138704922

fingolfin commented 1 year ago

Thanks @dimpase I've updated this PR to fix the link order

james-d-mitchell commented 1 year ago

Thanks very much @fingolfin and @dimpase