stevengj / nlopt

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

Version 2.8.0: Test fails when building with GCC 13 #563

Open badshah400 opened 1 month ago

badshah400 commented 1 month ago

When building nlopt version 2.8.0 for openSUSE Tumbleweed, where the default C/C++ compiler is GCC 13, we find that running ctest gives the following "buffer overflow" errors:

[   57s] 54/62 Test #54: testopt_algo26_obj1 ..............   Passed    0.00 sec
[   57s] 55/62 Test #55: testopt_algo27_obj0 ..............Subprocess aborted***Exception:   0.00 sec
[   57s] *** buffer overflow detected ***: terminated
[   57s] 
[   57s]       Start 58: testopt_algo28_obj1
[   57s]       Start 59: testopt_algo29_obj0
[   57s] 56/62 Test #56: testopt_algo27_obj1 ..............Subprocess aborted***Exception:   0.00 sec
[   57s] *** buffer overflow detected ***: terminated
[   57s] 
[   57s] 57/62 Test #57: testopt_algo28_obj0 ..............   Passed    0.00 sec
[   57s]       Start 60: testopt_algo29_obj1
[   57s]       Start 61: test_python
[   57s] 58/62 Test #58: testopt_algo28_obj1 ..............   Passed    0.00 sec
[   57s] 59/62 Test #59: testopt_algo29_obj0 ..............   Passed    0.00 sec
[   57s]       Start 62: test_octave
[   57s] 60/62 Test #60: testopt_algo29_obj1 ..............   Passed    0.00 sec
[   57s] 61/62 Test #61: test_python ......................   Passed    0.11 sec
[   57s] 62/62 Test #62: test_octave ......................   Passed    0.15 sec
[   57s] 
[   57s] 97% tests passed, 2 tests failed out of 62
[   57s] 
[   57s] Total Test time (real) =   0.22 sec
[   57s] 
[   57s] The following tests FAILED:
[   57s]     55 - testopt_algo27_obj0 (Subprocess aborted)
[   57s]     56 - testopt_algo27_obj1 (Subprocess aborted)
[   57s] Errors while running CTest
[   57s] error: Bad exit status from /var/tmp/rpm-tmp.SNBeIL (%check)
[   57s] 

We do not see these issues with the previous version of NLopt (2.7.1) using the same compiler, nor indeed when using older GCC (version 7) --- as we do for openSUSE Leap 15 --- to build NLopt 2.8.0.

Thanks.

stevengj commented 1 month ago

That's odd — algo27 should be newuoa.c, which hasn't changed in this release.

Can you run it in a debugger and get a stacktrace?

stevengj commented 1 month ago

I tried running valgrind test/testopt -r 0 -a 27 -o 0 and it ran with no errors in nlopt (using gcc 14).

(valgrind gives some warnings deep into a stacktrace for printf, but that looks like an unrelated libc false positive; it happens independent of the NLopt algorithm choice.)

Can you try test/testopt -r 0 -a 27 -o 0 specifically, to make sure I'm looking at the right thing? If you build with cmake -DCMAKE_BUILD_TYPE=Debug you can also try running this in the debugger.

badshah400 commented 1 month ago

Yes this is it:

~> test/testopt -r 0 -a 27 -o 0
~> -----------------------------------------------------------
~> Optimizing Rosenbrock function (2 dims) using Bound-constrained optimization via NEWUOA-based quadratic models (local, no-derivative) algorithm
~> lower bounds at lb = [ -2 -2]
~> upper bounds at ub = [ 2 2]
~> Starting guess x = [ 0.097627 0.430379]
~> Starting function value = 18.5256
~> *** buffer overflow detected ***: terminated
~> /var/tmp/rpm-tmp.3ExlYR: line 34:  2328 Aborted                 test/testopt -r 0 -a 27 -o 0

I managed to get a backtrace, but I do not know how useful this is:

Starting program: /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/build/test/testopt -r 0 -a 27 -o 0
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.39-9.1.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGABRT, Aborted.
0x00007ffff7c949dc in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install libgcc_s1-debuginfo-14.2.0+git10526-1.1.x86_64 libstdc++6-debuginfo-14.2.0+git10526-1.1.x86_64
#0  0x00007ffff7c949dc in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff7c41176 in raise () from /lib64/libc.so.6
#2  0x00007ffff7c28917 in abort () from /lib64/libc.so.6
#3  0x00007ffff7c297e8 in __libc_message_impl.cold () from /lib64/libc.so.6
#4  0x00007ffff7d20bdb in __fortify_fail () from /lib64/libc.so.6
#5  0x00007ffff7d20506 in __chk_fail () from /lib64/libc.so.6
#6  0x00007ffff7f6dc13 in memset (__len=<optimized out>, __ch=<optimized out>, __dest=<optimized out>, __dest=<optimized out>, __ch=<optimized out>, __len=<optimized out>) at /usr/include/bits/string_fortified.h:59
#7  trsapp_ (ub=<optimized out>, lb=<optimized out>, xbase=<optimized out>, crvmin=<synthetic pointer>, hs=0x5555555726c8, hd=0x5555555726b8, g=0x5555555726a8, d__=0x555555572698, step=<optimized out>, delta=0x7fffffffdd58, pq=<optimized out>, hq=<optimized out>, gq=<optimized out>, xpt=<optimized out>, xopt=0x5555555724a8, npt=0x7fffffffdd38, n=0x7fffffffdd3c) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/src/algs/newuoa/newuoa.c:184
#8  newuob_ (w=0x555555572698, vlag=0x555555572660, d__=<optimized out>, ndim=0x7fffffffdd40, zmat=0x5555555725d8, bmat=0x555555572558, pq=0x555555572568, hq=0x555555572550, gq=0x555555572540, fval=0x555555572518, xpt=0x5555555724a0, xnew=0x5555555724b8, xopt=0x5555555724a8, xbase=<optimized out>, calfun_data=0x555555572320, calfun=0x7ffff7f7b970 <f_noderiv(int, double const*, void*)>, minf=0x7fffffffe048, stop=0x7fffffffdec0, ub=0x555555572440, lb=0x555555572420, rhobeg=<synthetic pointer>, x=0x5555555712a8, npt=0x7fffffffdd38, n=0x7fffffffdd3c) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/src/algs/newuoa/newuoa.c:1858
#9  newuoa (n=<optimized out>, npt=<optimized out>, x=0x5555555712a8, lb=0x555555572420, ub=0x555555572440, rhobeg=1, stop=0x7fffffffdec0, minf=0x7fffffffe048, calfun=0x7ffff7f7b970 <f_noderiv(int, double const*, void*)>, calfun_data=0x555555572320) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/src/algs/newuoa/newuoa.c:2571
#10 0x00007ffff7f881c4 in nlopt_optimize_ (minf=0x7fffffffe048, x=<optimized out>, opt=0x555555572320) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/src/api/optimize.c:718
#11 nlopt_optimize (opt=opt@entry=0x555555572320, x=x@entry=0x5555555712b0, opt_f=opt_f@entry=0x7fffffffe048) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/src/api/optimize.c:890
#12 0x0000555555557077 in test_function (ifunc=<optimized out>) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/test/testopt.c:241
#13 main (argc=7, argv=0x7fffffffe1c8) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/test/testopt.c:362
quit

stevengj commented 1 month ago

Thanks, it's failing at this line: https://github.com/stevengj/nlopt/blob/58995c25b4d918759a107bd52a457122343b9c6d/src/algs/newuoa/newuoa.c#L184

where step is a pointer to an array passed in from &d__[1] on this line: https://github.com/stevengj/nlopt/blob/58995c25b4d918759a107bd52a457122343b9c6d/src/algs/newuoa/newuoa.c#L1859

which is passed in from the &w[id] parameter on this line: https://github.com/stevengj/nlopt/blob/58995c25b4d918759a107bd52a457122343b9c6d/src/algs/newuoa/newuoa.c#L2575

(gotta love these f2c-translated Fortran codes).

I added a printf statement

    printf("DEBUG: iw = %d, n = %d, len = %d\n",
           id, n, ((npt+13)*(npt+n) + 3*(n*(n+3))/2));

right before the newuob_ call, and in the test case above, it prints out

DEBUG: iw = 56, n = 2, len = 141

which indicates that plenty of space has been allocated (we are looking at 2 elements right in the middle of a length-141 array w, so there shouldn't be a buffer overrun). I also tried adding a couple of printf's to make sure that &step[1] is indeed the same as &w[id], and that checks out:

diff --git a/src/algs/newuoa/newuoa.c b/src/algs/newuoa/newuoa.c
index a3428a6..e82be63 100644
--- a/src/algs/newuoa/newuoa.c
+++ b/src/algs/newuoa/newuoa.c
@@ -181,6 +181,7 @@ static nlopt_result trsapp_(int *n, int *npt, double *xopt,
              if (sub[j] < 0) sub[j] = 0;
              xtol[j] = 1e-7 * *delta; /* absolute x tolerance */
         }
+        printf("DEBUG 2: &step[1] = %p, n = %d\n", &step[1], *n);
         memset(&step[1], 0, sizeof(double) * *n);
         opt = nlopt_create(NLOPT_LD_MMA, *n);
         nlopt_set_min_objective(opt, quad_model, &qmd);
@@ -2564,6 +2565,9 @@ nlopt_result newuoa(int n, int npt, double *x,
     if (!w) return NLOPT_OUT_OF_MEMORY;
     --w;

+
+        printf("DEBUG 1: &w[id] = %p, n = %d\n", &w[id], n);
+
 /* The above settings provide a partition of W for subroutine NEWUOB. */
 /* The partition requires the first NPT*(NPT+N)+5*N*(N+3)/2 elements of */
 /* W plus the space that is needed by the last array of NEWUOB. */

As I said, I can't reproduce this problem with gcc 14 and valgrind, so I'm not sure what the problem could be. Can you try with gcc 14?

badshah400 commented 1 month ago

Crashes with GCC 14 too, unfortunately. Perhaps you could try compiling with the following additional GCC flags used by default on openSUSE during compilation to see if you can reproduce the issue:

-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type  -g -ffp-contract=off -O2 -g -DNDEBUG -fPIC -MD -MT

In case it helps, here is a full log for the failing build: _log.zip

badshah400 commented 1 month ago

Managed to narrow it down to the use of -D_FORTIFY_SOURCE=3. If I change this to -D_FORTIFY_SOURCE=2, the tests all pass. However, according to this, the failure does suggest a bug in the code that -D_FORTIFY_SOURCE=2 misses, if I understand correctly.

stevengj commented 1 month ago

It's possible that FORTIFY_SOURCE doesn't like the Fortran-style 1-based indexing that is generated by f2c, which is perfectly safe (if used correctly) but may look odd to the compiler.

In particular, in order to implement 1-based indexing in C code, f2c takes all of the array pointers and decrements them by 1 at the beginning of each function; that's why you'll see lines like --step. Then the first element becomes step[1] (rather than C's usual step[0]), but this may confuse gcc's "fortification" since step itself points to an invalid location (8 bytes before the beginning of the array)?

In which case you should just turn off the FORTIFY_SOURCE option and ignore this. I'm inclined to think that this is a bug in the FORTIFY_SOURCE mode — it's getting confused about valid pointer dereferences due to the weird way that buffers are managed in newuoa.c.

(If valgrind, which is much more rigorous, passes, and FORTIFY_SOURCE fails, then it seems likely that's a bug in FORTIFY_SOURCE. On my machine, however, it succeeds even with the -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 flags.)

badshah400 commented 1 month ago

All right, I built the package using 2 instead of 3 for -D_FORTIFY_SOURCE and submitted it. Many thanks for the discussion, your suggestions and advice. Feel free to close this issue at your convenience.

bkmgit commented 6 days ago

Similar error on Fedora 42

60/64 Test #55: testopt_algo27_obj0 ..............Subprocess aborted***Exception:   0.15 sec
*** buffer overflow detected ***: terminated
61/64 Test #56: testopt_algo27_obj1 ..............Subprocess aborted***Exception:   0.15 sec
*** buffer overflow detected ***: terminated

BuildLog

bkmgit commented 6 days ago

There are a few warnings for the Nelder-Mead algorithm

nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/src/api -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -DNDEBUG -std=gnu++11 -fPIC -MD -MT CMakeFiles/nlopt.dir/src/algs/ags/solver.cc.o -MF CMakeFiles/nlopt.dir/src/algs/ags/solver.cc.o.d -o CMakeFiles/nlopt.dir/src/algs/ags/solver.cc.o -c /builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/src/algs/ags/solver.cc
/builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/redhat-linux-build/nlopt.hpp:121: Warning 401: Nothing known about base class 'std::runtime_error'. Ignored.
/builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/redhat-linux-build/nlopt.hpp:125: Warning 401: Nothing known about base class 'std::runtime_error'. Ignored.
In file included from /usr/include/string.h:548,
                 from /builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/src/algs/neldermead/nldrmd.c:25:
In function ‘memset’,
    inlined from ‘nldrmd_minimize_’ at /builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/src/algs/neldermead/nldrmd.c:205:4:
/usr/include/bits/string_fortified.h:59:10: warning: ‘memset’ specified bound between 18446744056529682432 and 18446744073709551608 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
   59 |   return __builtin___memset_chk (__dest, __ch, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   60 |                                  __glibc_objsize0 (__dest));

and

In function ‘memset’,
    inlined from ‘nldrmd_minimize_’ at /builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/src/algs/neldermead/nldrmd.c:205:4:
/usr/include/bits/string_fortified.h:59:10: warning: ‘__builtin_memset’ specified bound between 18446744056529682432 and 18446744073709551608 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
   59 |   return __builtin___memset_chk (__dest, __ch, __len,
      |          ^