sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.45k stars 482 forks source link

fedora-30-standard: Doctests using system brial crash #29490

Closed mkoeppe closed 4 years ago

mkoeppe commented 4 years ago

Follow-up from #29369: I see crashes in running doctests on fedora-30-standard (https://github.com/mkoeppe/sage/runs/572856797), which uses system brial 1.2.5-1

sage -t src/sage/rings/polynomial/multi_polynomial_sequence.py  # Killed due to abort
sage -t src/sage/rings/polynomial/pbori.pyx  # Killed due to abort
sage -t src/sage/rings/polynomial/polynomial_ring_constructor.py  # Killed due to abort
sage -t src/sage/sat/boolean_polynomials.py  # Killed due to abort
sage -t src/sage/sat/solvers/dimacs.py  # Killed due to abort

We remove the spkg-configure for Sage 9.1.

A follow up ticket can restore and repair it for 9.2.

CC: @orlitzky @embray @kiwifb @mkoeppe @dimpase @kliem @SnarkBoojum

Component: porting

Author: Matthias Koeppe

Branch: 8679b65

Reviewer: Dima Pasechnik

Issue created by migration from https://trac.sagemath.org/ticket/29490

mkoeppe commented 4 years ago
comment:3

This ticket needs some attention

mkoeppe commented 4 years ago
comment:4

If this cannot be fixed, we can certainly also disable the brial spkg-configure for the 9.1 release...

kliem commented 4 years ago
comment:5

This error is all the same on fedora 31.

The only thing I can find is that debian patches brial since very long with this

Description: Protect CErrorInfo::text() against invalid array access
 * The test suite tests this in PBoRiErrorTest.cc, but the tests
   failed only on the ia64 and alpha architectures.
Author: Tobias Hansen <thansen@debian.org>

--- a/libbrial/src/CErrorInfo.cc
+++ b/libbrial/src/CErrorInfo.cc
@@ -47,7 +47,7 @@
 CErrorInfo::text(errornum_type num) {

   PBORI_TRACE_FUNC( "CErrorInfo::text(errornum_type) const" );
-  if PBORI_UNLIKELY(num > CTypes::last_error)
+  if PBORI_UNLIKELY(num < 0 || num >= CTypes::last_error)
     return "Unknown error occured.";

   return pErrorText[num];

fedora doesn't patch at all. This patch isn't included into brial until version 1.2.8.

kliem commented 4 years ago
comment:6

Can we check for this at configuration time?

dimpase commented 4 years ago
comment:7

There is no direct way known to trigger this, the best I know is

2020-04-18T21:46:00.4537962Z sage -t src/sage/rings/polynomial/multi_polynomial_sequence.py
2020-04-18T21:46:00.4538239Z     Killed due to abort
2020-04-18T21:46:00.4538512Z **********************************************************************
2020-04-18T21:46:00.4538774Z Tests run before process (pid=31088) failed:
2020-04-18T21:46:00.4539023Z sage: sr = mq.SR(2,1,2,4,gf2=True,polybori=True) ## line 26 ##
2020-04-18T21:46:00.4539280Z sage: sr ## line 27 ##
2020-04-18T21:46:00.4539527Z SR(2,1,2,4)
2020-04-18T21:46:00.4539777Z sage: set_random_seed(1) ## line 33 ##
2020-04-18T21:46:00.4540187Z sage: F,s = sr.polynomial_system() ## line 34 ##
2020-04-18T21:46:00.4541037Z terminate called after throwing an instance of 'std::bad_alloc'
2020-04-18T21:46:00.4541926Z   what():  std::bad_alloc
2020-04-18T21:46:00.4542456Z ------------------------------------------------------------------------
2020-04-18T21:46:00.4542936Z /sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0x9158)[0x7fae23047158]
2020-04-18T21:46:00.4543404Z /sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0x91f9)[0x7fae230471f9]
2020-04-18T21:46:00.4543853Z /sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0xcb6d)[0x7fae2304ab6d]
2020-04-18T21:46:00.4544069Z /lib64/libpthread.so.0(+0x12c70)[0x7fae2479fc70]
2020-04-18T21:46:00.4544278Z /lib64/libc.so.6(gsignal+0x145)[0x7fae244abe35]
2020-04-18T21:46:00.4544478Z /lib64/libc.so.6(abort+0x127)[0x7fae24496895]
2020-04-18T21:46:00.4544683Z /lib64/libstdc++.so.6(+0x9e756)[0x7fae1b20a756]
2020-04-18T21:46:00.4545000Z /lib64/libstdc++.so.6(+0xaa6dc)[0x7fae1b2166dc]
2020-04-18T21:46:00.4545187Z /lib64/libstdc++.so.6(+0xaa747)[0x7fae1b216747]
2020-04-18T21:46:00.4545383Z /lib64/libstdc++.so.6(+0xaa9b9)[0x7fae1b2169b9]
2020-04-18T21:46:00.4545582Z /lib64/libstdc++.so.6(+0x9e3c6)[0x7fae1b20a3c6]
2020-04-18T21:46:00.4546100Z /sage/local/lib/python3.7/site-packages/sage/rings/polynomial/pbori.cpython-37m-x86_64-linux-gnu.so(_ZNSt5dequeIN8polybori14CCuddNavigatorESaIS1_EE17_M_reallocate_mapEmb+0xba)[0x7fade0ab8d0a]
2020-04-18T21:46:00.4546654Z /sage/local/lib/python3.7/site-packages/sage/rings/polynomial/pbori.cpython-37m-x86_64-linux-gnu.so(_ZNSt5dequeIN8polybori14CCuddNavigatorESaIS1_EE24_M_new_elements_at_frontEm+0xc8)[0x7fade0ab8fa8]
2020-04-18T21:46:00.4547288Z /sage/local/lib/python3.7/site-packages/sage/rings/polynomial/pbori.cpython-37m-x86_64-linux-gnu.so(_ZNSt5dequeIN8polybori14CCuddNavigatorESaIS1_EE19_M_range_insert_auxISt15_Deque_iteratorIS1_RKS1_PS6_EEEvS5_IS1_RS1_PS1_ET_SD_St20forward_iterator_tag+0x37f)[0x7fade0aba1ff]
2020-04-18T21:46:00.4547572Z /lib64/libbrial.so.3(_ZN8polybori13CDegTermStackINS_14CCuddNavigatorENS_9valid_tagENS_11invalid_tagENS_18CAbstractStackBaseIS1_EEE8findTermEm+0x51d)[0x7fade099f1bd]
2020-04-18T21:46:00.4547822Z /lib64/libbrial.so.3(_ZN8polybori13CDegTermStackINS_14CCuddNavigatorENS_9valid_tagENS_11invalid_tagENS_18CAbstractStackBaseIS1_EEE9incrementEv+0x412)[0x7fade099f832]
2020-04-18T21:46:00.4548072Z 
...
mkoeppe commented 4 years ago
comment:8

Could you prepare a test based on this crash please?

dimpase commented 4 years ago
comment:9

perhaps a test based on CErrorInfo::text in comment:5 is doable.

kliem commented 4 years ago
comment:10

Sorry. I think I was completely wrong with comment:5.

Fedora claims to use the exact same source files as we do. Maybe some linking didn't work out.

Replying to @dimpase:

perhaps a test based on CErrorInfo::text in comment:5 is doable.

kliem commented 4 years ago
comment:11

A wild guess it that fedora somehow messed up this:

https://src.fedoraproject.org/rpms/brial/blob/f30/f/brial.spec

Maybe here


%build

export CPPFLAGS="-DPBORI_NDEBUG"

%configure --enable-shared --disable-static

# Get rid of undesirable hardcoded rpaths, and workaround libtool reordering

# -Wl,--as-needed after all the libraries.

sed -e 's|^hardcode_libdir_flag_spec=.*|hardcode_libdir_flag_spec=""|g' \

    -e 's|^runpath_var=LD_RUN_PATH|runpath_var=DIE_RPATH_DIE|g' \

    -e 's|CC="\(g..\)"|CC="\1 -Wl,--as-needed"|' \

    -i libtool

But I really have no clue.

dimpase commented 4 years ago
comment:12

Replying to @dimpase:

perhaps a test based on CErrorInfo::text in comment:5 is doable.

no, apparently it is not the case, this patch is there. I tried the following on Fedora 30 (brial 1.2.5)

#include <iostream>
#include <polybori/except/CErrorInfo.h>
int main () {
    polybori::CErrorInfo foo;
    std::cout << polybori::CTypes::last_error << "\n";
    std::cout << foo.text(-1) << "\n";
    return 1;
}

and it prints

clpc171[/tmp]$ g++ bri.cc -lbrial
clpc171[/tmp]$ ./a.out 
12
Unknown error occured.

no crash.

kliem commented 4 years ago
comment:13

I think this is because -1 is interpreted as unsigned. At least my compiler interprets errornum_type as unsigned. This would also explain why this specific patch is needed only on some seldom architectures.

kliem commented 4 years ago
comment:14

And I don't think this patch is present in the current sage either.

mkoeppe commented 4 years ago

Branch: u/mkoeppe/fedora_30_standard__doctests_using_system_brial_crash

mkoeppe commented 4 years ago

Author: Matthias Koeppe

mkoeppe commented 4 years ago

Changed branch from u/mkoeppe/fedora_30_standard__doctests_using_system_brial_crash to none

mkoeppe commented 4 years ago

Branch: u/mkoeppe/fedora_30_standard__doctests_using_system_brial_crash

mkoeppe commented 4 years ago

New commits:

8679b65Back out brial spkg-configure.m4 for 9.1
mkoeppe commented 4 years ago

Description changed:

--- 
+++ 
@@ -8,4 +8,6 @@
 sage -t src/sage/sat/solvers/dimacs.py  # Killed due to abort

+We remove the spkg-configure for Sage 9.1.

+A follow up ticket can restore and repair it for 9.2.

mkoeppe commented 4 years ago

Commit: 8679b65

dimpase commented 4 years ago

Reviewer: Dima Pasechnik

dimpase commented 4 years ago
comment:19

OK, let's skip brial for the time being.

mkoeppe commented 4 years ago
comment:20

Thanks!

vbraun commented 4 years ago

Changed branch from u/mkoeppe/fedora_30_standard__doctests_using_system_brial_crash to 8679b65

dimpase commented 4 years ago

Changed commit from 8679b65 to none

dimpase commented 4 years ago
comment:22

See also comment:19 in #28349

dimpase commented 4 years ago
comment:23

follow, with a better fix, on ##29792