Closed fingolfin closed 1 year ago
Trying to work around #943 to test this atm.
OK, I'm finally down to testing on OpenBSD. I use an external Eigen library (it's an OpenBSD package), and build/install libsemigroups
as follows:
$ ./config.status --config
'--prefix=/home/dima' '--disable-hpcombi' '--disable-backward' '--with-external-eigen'
(these --disable-*
are needed, otherwise it doesn't build)
For semigroups
I set PKG_CONFIG_PATH=/home/dima/lib/pkgconfig
- that's the installation prefix of libsemigroups
with lib/pkgconfig
appended. It's needed for configuring external libsemigroup
, using pkg-config
.
For running GAP (and even for running ldd
on semigroups.so
) I need LD_LIBRARY_PATH= /home/dima/lib
, in view of #938.
So far I tested the rebased branch on #937, it works (package loads in GAP, TestPackage("semigroups");
runs, with some errors, but test errors are another story). On OpenBSD GAP does not need to be linked with -pthread
, it works with and without it.
Testing this branch now.
The semigroups GAP package is configured with
$ ./config.status --config
'--with-external-libsemigroups' '--disable-hpcombi' 'PKG_CONFIG_PATH=/home/dima/lib/pkgconfig'
This branch does not play as nice with tests. TestPackage("semigroups");
ends with a coredump.
...
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst:209
# Input is:
T = H;
# Expected output:
true
# But found:
Error, Variable: 'H' must have an assigned value
########
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst:211
# Input is:
Factorization(H, images[2]);
# Expected output:
[ 2 ]
# But found:
Error, Variable: 'H' must have an assigned value
########
libc++abi: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
Abort trap (core dumped)
An uncaught exception sounds like a problem in libsemigroups. In any case, I don't see how that could fail in this PR and work in PR #937... Just to double check, did you also run the tests in otherwise identical setup (in particular: same gap and same Semigroups commit) with PR #937?
Ah sorry, I now see you wrote in a previous comment that you first tested with PR #937 and it all passed and then did the same test with this PR here and it gave that crash, is that right?
Is the H must have an assigned value
also visible with PR #937? It would be useful to have the full test logs for both PRs, perhaps you can attach them to a comment here (note that file attachments in GitHub only work with certain file extensions, such as .txt)
some tests with #937 fail, with a bad alloc error, too, but GAP stays up. So it looks like it's a slightly different error handling, more robust.
will post the logs
OK, ran tests number of times, with both this and the old branch. The new branch doesn't really display a different behaviour (crashes don't happen during every run, and anyway it's a memory corruption/leak that finally brings GAP down sometimes).
Leaking/broken test is apparently semigroups/tst/standard/semigroups/semidp.tst
and semigroups/tst/standard/semigroups/semiquo.tst
, possibly more, only when run with acting methods disabled
.
So here what all the logs I made have in common:
...
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/tools/io.tst
2438 ms (1711 ms GC) and 7.58MB allocated for tools/io.tst
-----------------------------------
total 1072164 ms (115772 ms GC) and 17.0GB allocated
0 failures in 72 files
ESC[1m#I Running tests with acting methods disabled
ESC[0mArchitecture: x86_64-unknown-openbsd7.3-default64-kv9
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/attributes/acting.tst
2281 ms (1625 ms GC) and 47.6MB allocated for attributes/acting.tst
...
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semicons.ts\
t
22820 ms (1734 ms GC) and 105MB allocated for semigroups/semicons.tst
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semidp.tst
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semidp.tst:319
# Input is:
Size(D) = Size(T) ^ 3;
# Expected output:
ESC[30;42mtrueESC[0m
# But found:
ESC[30;41mError, std::bad_allocESC[0m
########
10906 ms (1812 ms GC) and 109MB allocated for semigroups/semidp.tst
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semieunit.t\
st
...
testing: /home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst:121
# Input is:
Size(S / cong);
# Expected output:
ESC[30;42m120ESC[0m
# But found:
ESC[30;41mError, std::bad_allocESC[0m
########
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst:123
# Input is:
IsomorphismPermGroup(S / cong);
# Expected output:
ESC[30;42m<quotient of <semigroup congruence over <inverse partial perm semigrou\
p
of size 720, rank 10 with 14 generators> with congruence pair (6,1)>> ->
<permutation group of size 120 with 2 generators>ESC[0m
# But found:
ESC[30;41mError, std::bad_allocESC[0m
########
######## > Diff in:
/home/dima/tmp/gap/pkg/semigroups/tst/standard/semigroups/semiquo.tst:138
# Input is:
Size(S);
# Expected output:
ESC[30;42m336ESC[0m
# But found:
ESC[30;41mError, std::bad_allocESC[0m
########
...
Also,when it comes to running these tests, memory used by GAP shots up by about 1GB - another indication of a leak to me.
@fingolfin - a typo to be fixed on https://github.com/fingolfin/Semigroups/pull/1
@dimpase the typo is not in this PR, though, wouldn't it make more sense to directly submit it to this repository?
It's a typo made by you in an earlier iteration. Let's not be overtly bureacratic here.
Doing it the other way would likely create a merge conflict to resolve.
@dimpase can I have access to a machine where I can reproduce the bug? I agree with @fingolfin that this looks like an issue with the package or libsemigroups rather than the build system
@dimpase can I have access to a machine where I can reproduce the bug? I agree with @fingolfin that this looks like an issue with the package or libsemigroups rather than the build system
Absolutely, no problem. Email me the public part (.pub) of your ssh key.
@dimpase can I have access to a machine where I can reproduce the bug? I agree with @fingolfin that this looks like an issue with the package or libsemigroups rather than the build system
Absolutely, no problem. Email me the public part (.pub) of your ssh key.
Hey @james-d-mitchell , you already have an account on this machine - you accessed in in June afaict. :-)
try ssh jdm@sagemath.openbsd.amsterdam
It's a typo made by you in an earlier iteration. Let's not be overtly bureacratic here.
??? This PR here may never be merged. Making a fix to something not touched by this PR only in here at best delays the fix from becoming generally available and at worst means the fix is never applied.
Doing it the other way would likely create a merge conflict to resolve.
How? My PR does not even touch configure.ac
How? My PR does not even touch
configure.ac
oh, OK, I confused it with Makefile.in
Thanks @dimpase i will try to look asap, but probably not until Monday
Tests fail, would be better if they passed
Tests fail, would be better if they passed
Did they work before?
They all (with exception of cygwin, which I can't say much about now) seem to fail due to inability to find libsemigroups at runtime, i.e. either LD_LIBRARY_PATH must be set, or better #938 resolved)
Tests fail, would be better if they passed
Did they work before?
They all (with exception of cygwin, which I can't say much about now) seem to fail due to inability to find libsemigroups at runtime, i.e. either LD_LIBRARY_PATH must be set, or better #938 resolved)
Yes they passed before for example in your most recent pr.
are they all running in miniconda environment with libsemigroups coming from conda?
By the way, I ran make check
on libsemigroup on the OpenBSD box, and got a bad::alloc error in one test:
...
FpSemigroup 041: equal_to . . . 2205μs
FpSemigroup 042: cbegin/cend_rules . . . 177μs
FpSemigroup 043: semigroup of size 3 . . . 458μs
FpSemigroup 044: run_for/ulibc++abi: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
ntil . . . 2574μs
FpSemigroup 045: constructors . . . 264μs
FpSemigroup 046: set_inverses . . . 421μs
FpSemigroup 047: smalloverlap . . . 19ms
FpSemigroup 048: quaternion group Q8 . . . 492μs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_all is a Catch v2.13.7 host application.
Run with -? for options
-------------------------------------------------------------------------------
FpSemigroup 049: symmetric group Coxeter presentation
-------------------------------------------------------------------------------
tests/test-fpsemi.cpp:1040
...............................................................................
tests/test-fpsemi.cpp:1040: FAILED:
due to a fatal error condition:
SIGABRT - Abort (abnormal termination) signal
FpSemigroup 049: symmetric group Coxeter presentation . . .
===============================================================================
test cases: 293 | 292 passed | 1 failed
assertions: 434209 | 434208 passed | 1 failed
Abort trap (core dumped)
make: *** [Makefile:5367: check] Error 134
Seems to fail with the external libsemigroups. Are you sure that the failing tests aren't just because the test machine has limited memory, and the tests are legitimately using large amounts of memory?
Seems to fail with the external libsemigroups. Are you sure that the failing tests aren't just because the test machine has limited memory, and the tests are legitimately using large amounts of memory?
the machine has 2GB of physical RAM and 4GB of swap; and the default user is limited to processes using 4GB of memory.
Do you really run tests which need gigabytes of RAM?
Probably not, just checking!
@fingolfin - I reproduced these CI errors locally on Debian oldstable linux.
The problem is that these incantations on Linux don't lead to libsemigroups
being linked with semigroups.so
dimpase@penguin:~/work/software/gap/pkg/semigroups$ make -j2 V=1
g++ -L/home/dimpase/lib -lsemigroups gen/src/bipart.o gen/src/cong.o gen/src/conglatt.o gen/src/froidure-pin-base.o gen/src/froidure-pin-bipart.o gen/src/froidure-pin-bmat.o gen/src/froidure-pin-fallback.o gen/src/froidure-pin-matrix.o gen/src/froidure-pin-max-plus-mat.o gen/src/froidure-pin-min-plus-mat.o gen/src/froidure-pin-pbr.o gen/src/froidure-pin-pperm.o gen/src/froidure-pin-transf.o gen/src/pkg.o gen/src/to_gap.o gen/gapbind14/src/gapbind14.o -o bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so -shared -fPIC
dimpase@penguin:~/work/software/gap/pkg/semigroups$ ldd bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so
linux-vdso.so.1 (0x00007fff349b3000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007c11eea9a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007c11ee956000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007c11ee93c000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007c11ee768000)
/lib64/ld-linux-x86-64.so.2 (0x00007c11efd38000)
dimpase@penguin:~/work/software/gap/pkg/semigroups$ ls -l /home/dimpase/lib/*semi*
-rw-r--r-- 1 dimpase dimpase 3775574 Sep 10 11:26 /home/dimpase/lib/libsemigroups.a
-rwxr-xr-x 1 dimpase dimpase 987 Sep 10 11:26 /home/dimpase/lib/libsemigroups.la
lrwxrwxrwx 1 dimpase dimpase 22 Sep 10 11:26 /home/dimpase/lib/libsemigroups.so -> libsemigroups.so.2.0.0
lrwxrwxrwx 1 dimpase dimpase 22 Sep 10 11:26 /home/dimpase/lib/libsemigroups.so.2 -> libsemigroups.so.2.0.0
-rwxr-xr-x 1 dimpase dimpase 1766768 Sep 10 11:26 /home/dimpase/lib/libsemigroups.so.2.0.0
dimpase@penguin:~/work/software/gap/pkg/semigroups$ ldd /home/dimpase/lib/libsemigroups.so
linux-vdso.so.1 (0x00007ffe761c5000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007d2076671000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007d20764a4000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007d2076360000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007d207618c000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007d2076172000)
/lib64/ld-linux-x86-64.so.2 (0x00007d207682a000)
Here is what the current main branch is doing instead
$ make V=1
"/home/dimpase/work/software/gap/gac" -d -p "-MQ "bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so" -MMD -MP -MF bin/x86_64-pc-linux-gnu-default64-kv9/.d" -P "-lstdc++ -L/home/dimpase/lib -lsemigroups " gen/src/bipart.o gen/src/cong.o gen/src/conglatt.o gen/src/froidure-pin-base.o gen/src/froidure-pin-bipart.o gen/src/froidure-pin-bmat.o gen/src/froidure-pin-fallback.o gen/src/froidure-pin-matrix.o gen/src/froidure-pin-max-plus-mat.o gen/src/froidure-pin-min-plus-mat.o gen/src/froidure-pin-pbr.o gen/src/froidure-pin-pperm.o gen/src/froidure-pin-transf.o gen/src/pkg.o gen/src/to_gap.o gen/gapbind14/src/gapbind14.o -o bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so
gcc -o bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so gen/src/bipart.o gen/src/cong.o gen/src/conglatt.o gen/src/froidure-pin-base.o gen/src/froidure-pin-bipart.o gen/src/froidure-pin-bmat.o gen/src/froidure-pin-fallback.o gen/src/froidure-pin-matrix.o gen/src/froidure-pin-max-plus-mat.o gen/src/froidure-pin-min-plus-mat.o gen/src/froidure-pin-pbr.o gen/src/froidure-pin-pperm.o gen/src/froidure-pin-transf.o gen/src/pkg.o gen/src/to_gap.o gen/gapbind14/src/gapbind14.o -shared -fPIC -lstdc++ -L/home/dimpase/lib -lsemigroups
rm -f
and this works correctly:
dimpase@penguin:~/work/software/gap/pkg/semigroups$ ldd bin/x86_64-pc-linux-gnu-default64-kv9/semigroups.so
linux-vdso.so.1 (0x00007ffe7517e000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007eac1b85c000)
libsemigroups.so.2 => /home/dimpase/lib/libsemigroups.so.2 (0x00007eac1b6df000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007eac1b6c5000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007eac1b4f1000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007eac1b3ad000)
/lib64/ld-linux-x86-64.so.2 (0x00007eac1cafa000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007eac1b38b000)
here is how to fix it - just to shuffle the arguments of the linking call around. Only checked on Linux.
--- a/Makefile.gappkg
+++ b/Makefile.gappkg
@@ -138,7 +138,7 @@ gen/%.$(GAP_OBJEXT): %.s Makefile
# build rule for linking all object files together into a kernel extension
$(KEXT_SO): $(KEXT_OBJS)
@mkdir -p $(@D)
- $(QUIET_GAC)$(GAP_CXX) $(KEXT_LDFLAGS) $(KEXT_OBJS) -o $@ $(GAP_LDFLAGS) $(GAC_LDFLAGS)
+ $(QUIET_GAC)$(GAP_CXX) -o $@ $(GAP_LDFLAGS) $(GAC_LDFLAGS) $(KEXT_OBJS) $(KEXT_LDFLAGS)
# hook into `make clean`
clean: clean-kext
@fingolfin - please see https://github.com/fingolfin/Semigroups/pull/2
on my fork this CI step passes. https://github.com/dimpase/Semigroups/actions/runs/6138576172
I'll check the rest now. Update: the standard tests pass too (on Cygwin there forking errors during execution of some tests, but I guess it's OK). https://github.com/dimpase/Semigroups/actions/runs/6138704922
Thanks @dimpase I've updated this PR to fix the link order
Thanks very much @fingolfin and @dimpase
Resolves #928
This is a simpler (?) alternative to PR #937 and thus closes #937 if it gets merged.
@dimpase could you verify if this also resolves the issue (modulo #938 of course) for you?