ralna / spral

Sparse Parallel Robust Algorithms Library
https://ralna.github.io/spral/

make check test suite ssids seg fault #84

Closed johnmatt3 closed 2 years ago

johnmatt3 commented 2 years ago

I am trying to follow the instructions at https://gist.github.com/tasseff/ee61ef6c15d3c54e0a6b3e488f2a65be to get IPOPT running with the GPU version of SPRAL. The CPU version works fine, but the GPU version segfaults on the test problem suggested in the instructions. Digging further, it appears that SPRAL's SSIDS test also fails (i.e. make check fails). I believe I have included everything needed to reproduce my situation below.

Is it possible that this is a hardware compatibility issue? I tested this on a 3080 Founders Edition, not a professional card. Is there a set of known-working hardware that I could buy instead?

My system specs:
CPU: i5-10400F
GPU: NVIDIA 3080 Founders Edition

from nvidia-smi: Driver Version: 515.43.04 CUDA Version: 11.7

I installed Ubuntu 18.04 from this ISO: https://releases.ubuntu.com/18.04/ubuntu-18.04.6-desktop-amd64.iso

After installation I elected not to install any Ubuntu updates. (I believe that on previous attempts at this entire process I accepted all updates, without upgrading to Ubuntu 20, with the same result.) I also attempted these instructions on Ubuntu 20 and did not get it to work, although most of my efforts have been on Ubuntu 18.

setup:

sudo apt-get install -y git build-essential gfortran
sudo apt-get install -y libopenblas-dev
sudo apt-get install -y autoconf
mkdir -p ${HOME}/Software

compile METIS

cd ${HOME}/Software
git clone https://github.com/coin-or-tools/ThirdParty-Metis.git
cd ThirdParty-Metis && ./get.Metis
mkdir build
cd build
../configure --prefix=${PWD}
make && make install
export METISDIR=${PWD}

get CUDA on ubuntu 18, from: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=18.04&target_type=deb_local

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-ubuntu1804-11-7-local_11.7.0-515.43.04-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-7-local_11.7.0-515.43.04-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu1804-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

export CUDA_HOME="/usr/local/cuda"
export PATH="${PATH}:${CUDA_HOME}/bin"
export LIBRARY_PATH="${LIBRARY_PATH}:${CUDA_HOME}/lib64"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${CUDA_HOME}/lib64"
export C_INCLUDE_PATH="${C_INCLUDE_PATH}:${CUDA_HOME}/include"
export CPLUS_INCLUDE_PATH="${CPLUS_INCLUDE_PATH}:${CUDA_HOME}/include"
export NVCC_INCLUDE_FLAGS="${NVCC_INCLUDE_FLAGS}:-I${CUDA_HOME}/include"
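
The exports above join the old value and the new directory with a colon unconditionally, so a variable that starts out empty (LD_LIBRARY_PATH, for example) ends up with a leading colon, and an empty entry in these search paths is typically treated as the current directory. A minimal sketch of a helper that avoids this, assuming a POSIX shell; `append_path` is a hypothetical name, not part of CUDA or the linked instructions:

```shell
# Hypothetical helper: append a directory to a colon-separated variable
# without producing an empty entry when the variable is unset or empty.
append_path() {
    # $1 = variable name, $2 = directory to append
    eval "current=\${$1:-}"
    if [ -n "$current" ]; then
        eval "export $1=\"\$current:\$2\""
    else
        eval "export $1=\"\$2\""
    fi
}

# Assumed default install location, as in the exports above.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
export CUDA_HOME
append_path PATH "${CUDA_HOME}/bin"
append_path LIBRARY_PATH "${CUDA_HOME}/lib64"
append_path LD_LIBRARY_PATH "${CUDA_HOME}/lib64"
append_path C_INCLUDE_PATH "${CUDA_HOME}/include"
append_path CPLUS_INCLUDE_PATH "${CUDA_HOME}/include"
```

This is unlikely to be related to the segfault; it only keeps the search paths free of accidental empty entries.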

Get hwloc. The instructions suggest this:

#sudo apt-get install -y hwloc libhwloc-dev

but I was advised to try compiling it from source and specifying the CUDA version, as in https://www.open-mpi.org/projects/hwloc/doc/v2.7.1/a00373.php#faq_cuda_build

cd $HOME/Software
git clone https://github.com/open-mpi/hwloc.git
cd hwloc
./autogen.sh
./configure --with-cuda-version=11.7
make 
sudo make install

The segfault was the same with both the packaged hwloc and the one built from source.

Get SPRAL:

cd ${HOME}/Software
#git clone https://github.com/lanl-ansi/spral.git
git clone https://github.com/ralna/spral.git
cd spral
mkdir build
./autogen.sh # If compiling from scratch.

Build and install SPRAL. I changed the flags from the instructions to add debugging info:

CFLAGS="-fPIC -g" CPPFLAGS="-fPIC -g" CXXFLAGS="-fPIC -g" FFLAGS="-fPIC -g" \
   FCFLAGS="-fPIC -g" NVCCFLAGS="-shared -Xcompiler -fPIC -g" \
   ./configure --prefix=${PWD}/build \
   --with-blas="-lopenblas" --with-lapack="-llapack" \
   --with-metis="-L${METISDIR}/lib -lcoinmetis" \
   --with-metis-inc-dir="${METISDIR}/include/coin-or/metis"
make && make install

export required variables

cd ${HOME}/Software/spral
export SPRALDIR=${PWD}/build
export OMP_CANCELLATION=TRUE
export OMP_NESTED=TRUE
export OMP_PROC_BIND=TRUE
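
Because these exports only apply to the shell they were typed in, a quick check before launching a run can rule out a missing variable. A minimal sketch, assuming a POSIX shell; `check_spral_env` is a hypothetical helper, and it compares strictly against the TRUE spelling used above even though OpenMP runtimes should also accept other capitalizations:

```shell
# Hypothetical helper: warn if any of the OpenMP variables exported above
# is missing or not set to TRUE in the current shell.
check_spral_env() {
    missing=0
    for name in OMP_CANCELLATION OMP_NESTED OMP_PROC_BIND; do
        eval "value=\${$name:-}"
        if [ "$value" != "TRUE" ]; then
            echo "warning: $name is '$value', expected TRUE" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_spral_env && ./solve_problem MBndryCntrl1 180
```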

Now install Ipopt. I added the --enable-debug flag to configure for debugging:

cd ${HOME}/Software
git clone https://github.com/coin-or/Ipopt.git
#git clone https://github.com/lanl-ansi/Ipopt.git --branch devel # original ipopt designed to work with spral, using it didn't seem to fix problem
cd ${HOME}/Software/Ipopt
mkdir build
cd build
../configure --enable-debug --prefix=${PWD} --with-spral-lflags="-L${SPRALDIR}/lib -L${METISDIR}/lib \
    -lspral -lgfortran -lhwloc -lm -lcoinmetis -lopenblas -lstdc++ -fopenmp \
    -lcudadevrt -lcudart -lcuda -lcublas" --with-spral-cflags="-I${SPRALDIR}/include" \
    --with-lapack-lflags="-llapack -lopenblas"
make && make install

I have also previously tried an alternative METIS/SPRAL installation, as in https://github.com/ralna/spral/issues/31. This also didn't solve the problem.

Test:

cd ${HOME}/Software/Ipopt/build
cd examples/ScalableProblems && make
rm ipopt.opt
touch ipopt.opt
echo "linear_solver spral" >> ipopt.opt
echo "spral_use_gpu no" >> ipopt.opt
time ./solve_problem MBndryCntrl1 180 # this was made smaller for comparison so the gpu version below would fit on my 3080

output looks as expected:

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.9, running with linear solver spral.

Number of nonzeros in equality constraint Jacobian...:   162000
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:    33120

Total number of variables............................:    33120
                     variables with only lower bounds:        0
                variables with lower and upper bounds:      720
                     variables with only upper bounds:    32400
Total number of equality constraints.................:    32400
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  4.9723757e-01 4.00e+00 9.91e-01  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  2.8128756e-01 1.93e+00 9.70e-01  -1.0 3.06e+00    -  3.46e-01 5.19e-01h  1
   2  2.1878089e-01 3.82e-15 5.64e+00  -1.0 2.04e+00    -  7.01e-01 1.00e+00f  1
   3  2.8337223e-01 3.82e-15 3.83e+01  -1.0 7.75e-01    -  2.76e-01 1.00e+00f  1
   4  5.7815995e-01 3.37e-15 1.46e+00  -1.0 6.96e-01    -  3.73e-01 1.00e+00f  1
   5  1.7950106e+00 2.93e-15 1.36e-01  -1.0 1.15e+00    -  5.79e-01 1.00e+00f  1
   6  2.3513997e+00 1.73e-15 8.20e-03  -1.7 3.94e-01    -  9.31e-01 1.00e+00f  1
   7  2.4601751e+00 1.82e-15 2.83e-08  -2.5 2.55e-01    -  1.00e+00 1.00e+00f  1
   8  2.4502118e+00 1.60e-15 6.17e-04  -3.8 9.03e-02    -  9.90e-01 1.00e+00f  1
   9  1.0413120e+00 2.84e-15 1.56e-03  -5.7 2.47e+00    -  1.20e-01 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  4.8766282e-01 3.37e-15 2.49e-04  -5.7 7.60e-01    -  5.39e-01 1.00e+00f  1
  11  2.8149602e-01 3.82e-15 6.23e-06  -5.7 4.75e-01    -  9.11e-01 1.00e+00f  1
  12  2.0674577e-01 3.73e-15 1.84e-11  -5.7 2.38e-01    -  1.00e+00 1.00e+00f  1
  13  2.0067797e-01 3.82e-15 1.10e-04  -8.6 1.07e-01    -  9.51e-01 2.40e-01f  1
  14  2.0030802e-01 3.73e-15 1.74e-03  -8.6 6.77e-02    -  9.72e-01 1.00e+00f  1
  15  2.0030474e-01 3.73e-15 8.15e-07  -8.6 9.51e-04    -  1.00e+00 9.76e-01h  1
  16  2.0030475e-01 3.73e-15 2.51e-14  -8.6 2.37e-05    -  1.00e+00 1.00e+00f  1

Number of Iterations....: 16

                                   (scaled)                 (unscaled)
Objective...............:   2.0030474938899570e-01    2.0030474938899570e-01
Dual infeasibility......:   2.5149011291438622e-14    2.5149011291438622e-14
Constraint violation....:   3.7308480957398693e-15    3.7308480957398693e-15
Variable bound violation:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   2.5059912877204219e-09    2.5059912877204219e-09
Overall NLP error.......:   2.5059912877204219e-09    2.5059912877204219e-09

Number of objective function evaluations             = 17
Number of objective gradient evaluations             = 17
Number of equality constraint evaluations            = 17
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 17
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 16
Total seconds in IPOPT                               = 6.303

EXIT: Optimal Solution Found.

real 0m6.367s
user 0m16.915s
sys 0m0.473s

test gpu version:

rm ipopt.opt
touch ipopt.opt
echo "linear_solver spral" >> ipopt.opt
echo "spral_use_gpu yes" >> ipopt.opt
echo "spral_min_gpu_work 0.1" >> ipopt.opt
echo "spral_gpu_perf_coeff 10.0" >> ipopt.opt
time ./solve_problem MBndryCntrl1 180
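
The CPU and GPU runs differ only in the contents of ipopt.opt, so both option files can be generated by one small function. A sketch, assuming a POSIX shell; `write_ipopt_opt` is a hypothetical helper, and the option names and values are copied from the commands above:

```shell
# Hypothetical helper: write an ipopt.opt for a CPU or GPU run with SPRAL.
# $1 = "yes" or "no" (value for spral_use_gpu)
# $2 = output file (defaults to ipopt.opt in the current directory)
write_ipopt_opt() {
    use_gpu="$1"
    out="${2:-ipopt.opt}"
    {
        echo "linear_solver spral"
        echo "spral_use_gpu ${use_gpu}"
        if [ "$use_gpu" = "yes" ]; then
            # GPU tuning values taken from the GPU test above.
            echo "spral_min_gpu_work 0.1"
            echo "spral_gpu_perf_coeff 10.0"
        fi
    } > "$out"
}

# Example: write_ipopt_opt yes && time ./solve_problem MBndryCntrl1 180
```

The redirection truncates the file on every call, so the separate rm and touch steps are not needed.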

results in segfault:

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.9, running with linear solver spral.

Number of nonzeros in equality constraint Jacobian...:   162000
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:    33120

Total number of variables............................:    33120
                     variables with only lower bounds:        0
                variables with lower and upper bounds:      720
                     variables with only upper bounds:    32400
Total number of equality constraints.................:    32400
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  4.9723757e-01 4.00e+00 1.00e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
Segmentation fault (core dumped)

real 0m58.613s
user 4m32.657s
sys 0m2.620s

then debugging:

john@ubuntu18:~/Software/Ipopt/build/examples/ScalableProblems$ gdb --args ./solve_problem MBndryCntrl1 180
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./solve_problem...done.
(gdb) r
Starting program: /home/john/Software/Ipopt/build/examples/ScalableProblems/solve_problem MBndryCntrl1 180
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.9, running with linear solver spral.

Number of nonzeros in equality constraint Jacobian...:   162000
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:    33120

[New Thread 0x7fffc2381700 (LWP 10120)]
[New Thread 0x7fffd2492700 (LWP 10121)]
[New Thread 0x7fffd1c91700 (LWP 10122)]
[New Thread 0x7fffd1290700 (LWP 10123)]
[New Thread 0x7fffd0a8f700 (LWP 10124)]
[New Thread 0x7fffd028e700 (LWP 10125)]
[New Thread 0x7fffc1b80700 (LWP 10126)]
[New Thread 0x7fffc137f700 (LWP 10127)]
[New Thread 0x7fffc0b7e700 (LWP 10128)]
[Thread 0x7fffc0b7e700 (LWP 10128) exited]
[Thread 0x7fffc137f700 (LWP 10127) exited]
[Thread 0x7fffc1b80700 (LWP 10126) exited]
[Thread 0x7fffd028e700 (LWP 10125) exited]
[Thread 0x7fffd0a8f700 (LWP 10124) exited]
[New Thread 0x7fffc1b80700 (LWP 10129)]
[New Thread 0x7fffd0a8f700 (LWP 10130)]
[New Thread 0x7fffc137f700 (LWP 10131)]
[New Thread 0x7fffc0b7e700 (LWP 10132)]
Total number of variables............................:    33120
                     variables with only lower bounds:        0
                variables with lower and upper bounds:      720
                     variables with only upper bounds:    32400
Total number of equality constraints.................:    32400
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  4.9723757e-01 4.00e+00 1.00e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
[New Thread 0x7fffd028e700 (LWP 10135)]
[New Thread 0x7fff59fff700 (LWP 10136)]
[Thread 0x7fffc0b7e700 (LWP 10132) exited]
[Thread 0x7fffc137f700 (LWP 10131) exited]
[Thread 0x7fffd0a8f700 (LWP 10130) exited]
[Thread 0x7fffc1b80700 (LWP 10129) exited]
[New Thread 0x7fffd0a8f700 (LWP 10137)]
[New Thread 0x7fffc137f700 (LWP 10138)]
[New Thread 0x7fffc0b7e700 (LWP 10139)]
[New Thread 0x7fffc1b80700 (LWP 10140)]
[Thread 0x7fffd1290700 (LWP 10123) exited]
[Thread 0x7fffc1b80700 (LWP 10140) exited]
[Thread 0x7fffc0b7e700 (LWP 10139) exited]
[Thread 0x7fffc137f700 (LWP 10138) exited]
[Thread 0x7fffd0a8f700 (LWP 10137) exited]
[Thread 0x7fff59fff700 (LWP 10136) exited]

Thread 15 "solve_problem" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd028e700 (LWP 10135)]
0x00007ffff7977601 in spral_ssids_gpu_dense_factor::node_ldlt (stream=0x7fff60033150, nrows=2, ncols=1, gpu_l=0x7ffefe7ef000, gpu_ld=0x7ffefe7ef200, ldl=2, gpu_d=0x7ffefe7ef010, gpu_b=0x7ffefe7ee500,
    gpu_ind=0x7ffefe7ee400, delta=0.01, eps=0, block_size=8, perm=..., ind=..., done=0, gwork=..., cublas_handle=0x7fff60441600, cuda_error=0, cublas_error=0) at src/ssids/gpu/dense_factor.f90:287
287             ind(ncols + j) = perm(done + i)

The backtrace suggests the crash is inside SSIDS (src/ssids/gpu/dense_factor.f90:287):

(gdb) bt
#0  0x00007ffff7977601 in spral_ssids_gpu_dense_factor::node_ldlt (stream=0x7fff60033150, nrows=2, ncols=1, gpu_l=0x7ffefe7ef000, gpu_ld=0x7ffefe7ef200, ldl=2, gpu_d=0x7ffefe7ef010, gpu_b=0x7ffefe7ee500,
    gpu_ind=0x7ffefe7ee400, delta=0.01, eps=0, block_size=8, perm=..., ind=..., done=0, gwork=..., cublas_handle=0x7fff60441600, cuda_error=0, cublas_error=0) at src/ssids/gpu/dense_factor.f90:287
#1  0x00007ffff7956d51 in spral_ssids_gpu_factor::factor_indef (stream=0x7fff60033150, lev=1, lvlptr=..., nnodes=1, nodes=..., lvllist=..., sparent=..., sptr=..., rptr=..., level_height=2, level_width=1,
    delta=0.01, eps=0, gpu_ldcol=..., gwork=..., cublas_handle=0x7fff60441600, options=..., stats=..., gpu_custats=0x7ffefe7a3800) at src/ssids/gpu/factor.f90:1420
#2  0x00007ffff795cc60 in spral_ssids_gpu_factor::subtree_factor_gpu (stream=0x7fff60033150, pos_def=.FALSE., child_ptr=..., child_list=..., n=65520, nptr=..., gpu_nlist=0x7ffdf25f9800,
    ptr_val=0x7fff9fc9e000, nnodes=1, nodes=..., sptr=..., sparent=..., rptr=..., rlist_direct=..., gpu_rlist=0x7ffdf25f9a00, gpu_rlist_direct=0x7ffdf25f9c00, gpu_contribs=..., gpu_ldlt=0x7ffefe7ee200,
    gpu=..., alloc=..., options=..., stats=..., ptr_scale=0x7ffdf2732400) at src/ssids/gpu/factor.f90:555
#3  0x00007ffff795e610 in spral_ssids_gpu_factor::parfactor (pos_def=.FALSE., child_ptr=..., child_list=..., n=65520, nptr=..., gpu_nlist=0x7ffdf25f9800, ptr_val=0x7fff9fc9e000, nnodes=1,
    nodes=<error reading variable: value requires 2643152384 bytes, which is more than max-value-size>, sptr=..., sparent=..., rptr=..., rlist=..., rlist_direct=..., gpu_rlist=0x7ffdf25f9a00,
    gpu_rlist_direct=0x7ffdf25f9c00, gpu_contribs=..., stream_handle=0x7fff60033150, stream_data=..., gpu_rlist_with_delays=0x0, gpu_rlist_direct_with_delays=0x0, gpu_clists=0x0, gpu_clists_direct=0x0,
    gpu_clen=0x0, contrib=..., contrib_wait=0x7fff600232e0, alloc=..., options=..., stats=..., ptr_scale=0x7ffdf2732400) at src/ssids/gpu/factor.f90:109
#4  0x00007ffff791f87e in spral_ssids_gpu_subtree::factor (this=..., posdef=.FALSE., aval=..., child_contrib=..., options=..., inform=..., scaling=...) at src/ssids/gpu/subtree.f90:407
#5  0x00007ffff7909e11 in __spral_ssids_fkeep_MOD_inner_factor_cpu._omp_fn.2 () at src/ssids/fkeep.F90:150
#6  0x00007fffe96ab8e1 in GOMP_task () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#7  0x00007ffff79095e3 in __spral_ssids_fkeep_MOD_inner_factor_cpu._omp_fn.1 () at src/ssids/fkeep.F90:170
#8  0x00007fffe96a6edf in GOMP_parallel () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#9  0x00007ffff790937d in __spral_ssids_fkeep_MOD_inner_factor_cpu._omp_fn.0 () at src/ssids/fkeep.F90:134
#10 0x00007fffe96af96e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#11 0x00007fffe8e2c6db in start_thread (arg=0x7fffd028e700) at pthread_create.c:463
#12 0x00007ffff699d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

In other runs I have instead seen the program abort (SIGABRT) during BuddyAllocator cleanup, after SPRAL reports info.flag = -50:

john@ubuntu18:~/Software/Ipopt/build/examples/ScalableProblems$ gdb --args ./solve_problem MBndryCntrl1 180
Reading symbols from ./solve_problem...done.
(gdb) r
Starting program: /home/john/Software/Ipopt/build/examples/ScalableProblems/solve_problem MBndryCntrl1 180
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.9, running with linear solver spral.

Number of nonzeros in equality constraint Jacobian...:   162000
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:    33120

[New Thread 0x7fffc2381700 (LWP 5480)]
[New Thread 0x7fffd2492700 (LWP 5481)]
[New Thread 0x7fffd1c91700 (LWP 5482)]
[New Thread 0x7fffd1290700 (LWP 5483)]
[New Thread 0x7fffd0a8f700 (LWP 5484)]
[New Thread 0x7fffd028e700 (LWP 5485)]
[New Thread 0x7fffc1b80700 (LWP 5486)]
[New Thread 0x7fffc137f700 (LWP 5487)]
[New Thread 0x7fffc0b7e700 (LWP 5488)]
[Thread 0x7fffc0b7e700 (LWP 5488) exited]
[Thread 0x7fffc137f700 (LWP 5487) exited]
[Thread 0x7fffc1b80700 (LWP 5486) exited]
[Thread 0x7fffd028e700 (LWP 5485) exited]
[Thread 0x7fffd0a8f700 (LWP 5484) exited]
[New Thread 0x7fffc0b7e700 (LWP 5489)]
[New Thread 0x7fffc137f700 (LWP 5490)]
[New Thread 0x7fffd0a8f700 (LWP 5491)]
[New Thread 0x7fffc1b80700 (LWP 5492)]

Total number of variables............................:    33120
                     variables with only lower bounds:        0
                variables with lower and upper bounds:      720
                     variables with only upper bounds:    32400
Total number of equality constraints.................:    32400
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  4.9723757e-01 4.00e+00 1.00e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
[New Thread 0x7fffd028e700 (LWP 5532)]
[New Thread 0x7fff65fff700 (LWP 5533)]
[Thread 0x7fffc1b80700 (LWP 5492) exited]
[New Thread 0x7fffc1b80700 (LWP 5534)]
[New Thread 0x7fff657fe700 (LWP 5535)]
[Thread 0x7fffd0a8f700 (LWP 5491) exited]
[Thread 0x7fffc137f700 (LWP 5490) exited]
[Thread 0x7fffc0b7e700 (LWP 5489) exited]
[New Thread 0x7fffc137f700 (LWP 5536)]
[New Thread 0x7fffd0a8f700 (LWP 5537)]
[Thread 0x7fffd1290700 (LWP 5483) exited]
[Thread 0x7fffd0a8f700 (LWP 5537) exited]
[Thread 0x7fffc137f700 (LWP 5536) exited]
[Thread 0x7fff657fe700 (LWP 5535) exited]
[Thread 0x7fffc1b80700 (LWP 5534) exited]
[Thread 0x7fff65fff700 (LWP 5533) exited]
In SpralSolverInterface::Factorization: Unhandled error. info.flag = -50.
Set spral_print_level=0 to see more details.
WARNING: Problem in step computation; switching to emergency mode.
   1r 4.9723757e-01 4.00e+00 9.99e+02   0.6 0.00e+00    -  0.00e+00 0.00e+00R  1
terminate called after throwing an instance of 'std::runtime_error'
  what():  outstanding allocations on cleanup

Thread 1 "solve_problem" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff68bc7f1 in __GI_abort () at abort.c:79
#2  0x00007ffff72af957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff72b5ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff72b4b49 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff72b54b8 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff6c7d573 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007ffff6c7ddf5 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00007ffff7932514 in spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> >::~Page (this=0x7fff6bace230, __in_chrg=<optimized out>) at ./src/ssids/cpu/BuddyAllocator.hxx:120
#9  0x00007ffff79322d9 in std::_Destroy<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> > > (__pointer=0x7fff6bace230) at /usr/include/c++/7/bits/stl_construct.h:98
#10 0x00007ffff7931962 in std::_Destroy_aux<false>::__destroy<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> >*> (__first=0x7fff6bace230, __last=0x7fff6bace310)
    at /usr/include/c++/7/bits/stl_construct.h:108
#11 0x00007ffff7930f38 in std::_Destroy<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> >*> (__first=0x7fff6bace1c0, __last=0x7fff6bace310) at /usr/include/c++/7/bits/stl_construct.h:137
#12 0x00007ffff79300a7 in std::_Destroy<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> >*, spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> > > (__first=0x7fff6bace1c0,
    __last=0x7fff6bace310) at /usr/include/c++/7/bits/stl_construct.h:206
#13 0x00007ffff792e7d7 in std::vector<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> >, std::allocator<spral::ssids::cpu::buddy_alloc_internal::Page<std::allocator<char> > > >::~vector (
    this=0x7fff68f749a0, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/stl_vector.h:434
#14 0x00007ffff7931258 in spral::ssids::cpu::buddy_alloc_internal::Table<std::allocator<char> >::~Table (this=0x7fff68f74990, __in_chrg=<optimized out>) at ./src/ssids/cpu/BuddyAllocator.hxx:286
#15 0x00007ffff7932aec in std::_Sp_counted_ptr<spral::ssids::cpu::buddy_alloc_internal::Table<std::allocator<char> >*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7fff692c7530)
    at /usr/include/c++/7/bits/shared_ptr_base.h:376
#16 0x00007ffff790f632 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff692c7530) at /usr/include/c++/7/bits/shared_ptr_base.h:154
#17 0x00007ffff790eeef in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff68000b80, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:684
#18 0x00007ffff7924818 in std::__shared_ptr<spral::ssids::cpu::buddy_alloc_internal::Table<std::allocator<char> >, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff68000b78, __in_chrg=<optimized out>)
    at /usr/include/c++/7/bits/shared_ptr_base.h:1123
#19 0x00007ffff7924834 in std::shared_ptr<spral::ssids::cpu::buddy_alloc_internal::Table<std::allocator<char> > >::~shared_ptr (this=0x7fff68000b78, __in_chrg=<optimized out>)
    at /usr/include/c++/7/bits/shared_ptr.h:93
#20 0x00007ffff7924850 in spral::ssids::cpu::BuddyAllocator<double, std::allocator<double> >::~BuddyAllocator (this=0x7fff68000b78, __in_chrg=<optimized out>) at ./src/ssids/cpu/BuddyAllocator.hxx:382
#21 0x00007ffff79262bd in spral::ssids::cpu::NumericSubtree<false, double, 8388608ul, spral::ssids::cpu::AppendAlloc<double> >::~NumericSubtree (this=0x7fff68000b60, __in_chrg=<optimized out>)
    at ./src/ssids/cpu/NumericSubtree.hxx:278
#22 0x00007ffff7922c64 in spral_ssids_cpu_destroy_num_subtree_dbl (posdef=false, target=0x7fff68000b60) at src/ssids/cpu/NumericSubtree.cxx:70
#23 0x00007ffff790d3de in spral_ssids_cpu_subtree::numeric_cleanup (this=...) at src/ssids/cpu/subtree.f90:298
#24 0x00007ffff78c78bd in spral_ssids::ssids_factor_ptr64_double (posdef=.FALSE., val=..., akeep=<error reading variable: value requires 3640576 bytes, which is more than max-value-size>,
    fkeep=<error reading variable: value requires 524160 bytes, which is more than max-value-size>, options=..., inform=...,
    scale=<error reading variable: value requires 524160 bytes, which is more than max-value-size>, ptr=..., row=...) at src/ssids/ssids.f90:1033
#25 0x00007ffff78c8700 in spral_ssids::ssids_factor_ptr32_double (posdef=.FALSE., val=..., akeep=<error reading variable: value requires 3640576 bytes, which is more than max-value-size>,
    fkeep=<error reading variable: value requires 524160 bytes, which is more than max-value-size>, options=..., inform=...,
    scale=<error reading variable: value requires 524160 bytes, which is more than max-value-size>, ptr=..., row=...) at src/ssids/ssids.f90:757
#26 0x00007ffff78c175e in spral_ssids_factor_ptr32 (cposdef=.FALSE., cptr=<error reading variable: Attempt to dereference a generic pointer.>,
    crow=<error reading variable: Attempt to dereference a generic pointer.>, val=..., cscale=<error reading variable: Attempt to dereference a generic pointer.>,
    cakeep=<error reading variable: Attempt to dereference a generic pointer.>, cfkeep=0x555556294e10, coptions=..., cinform=...) at interfaces/C/ssids.f90:572
#27 0x00007ffff78b99bb in Ipopt::SpralSolverInterface::MultiSolve (this=0x555555a3be20, new_matrix=true, ia=0x5555562a34b0, ja=0x5555565dee30, nrhs=1, rhs_vals=0x55555ac15a00, check_NegEVals=true,
    numberOfNegEVals=32400) at ../../src/Algorithm/LinearSolvers/IpSpralSolverInterface.cpp:578
#28 0x00007ffff78492fe in Ipopt::TSymLinearSolver::MultiSolve (this=0x555555a685a0, sym_A=..., rhsV=std::vector of length 1, capacity 1 = {...}, solV=std::vector of length 1, capacity 1 = {...},
    check_NegEVals=true, numberOfNegEVals=32400) at ../../src/Algorithm/LinearSolvers/IpTSymLinearSolver.cpp:262
#29 0x00007ffff7838991 in Ipopt::StdAugSystemSolver::MultiSolve (this=0x55555598bb10, W=0x555557965420, W_factor=1, D_x=0x555558be2240, delta_x=0, D_s=0x555556778b20, delta_s=0, J_c=0x555557966300,
    D_c=0x555558be2490, delta_c=0, J_d=0x555555ad0020, D_d=0x555558be26d0, delta_d=0, rhs_xV=std::vector of length 1, capacity 1 = {...}, rhs_sV=std::vector of length 1, capacity 1 = {...},
    rhs_cV=std::vector of length 1, capacity 1 = {...}, rhs_dV=std::vector of length 1, capacity 1 = {...}, sol_xV=std::vector of length 1, capacity 1 = {...},
    sol_sV=std::vector of length 1, capacity 1 = {...}, sol_cV=std::vector of length 1, capacity 1 = {...}, sol_dV=std::vector of length 1, capacity 1 = {...}, check_NegEVals=true, numberOfNegEVals=32400)
    at ../../src/Algorithm/IpStdAugSystemSolver.cpp:210
#30 0x00007ffff777cc5f in Ipopt::AugSystemSolver::Solve (this=0x55555598bb10, W=0x555557965420, W_factor=1, D_x=0x555558be2240, delta_x=0, D_s=0x555556778b20, delta_s=0, J_c=0x555557966300,
    D_c=0x555558be2490, delta_c=0, J_d=0x555555ad0020, D_d=0x555558be26d0, delta_d=0, rhs_x=..., rhs_s=..., rhs_c=..., rhs_d=..., sol_x=..., sol_s=..., sol_c=..., sol_d=..., check_NegEVals=true,
    numberOfNegEVals=32400) at ../../src/Algorithm/IpAugSystemSolver.hpp:105
#31 0x00007ffff7779a0e in Ipopt::AugRestoSystemSolver::Solve (this=0x555555a35690, W=0x555557950010, W_factor=1, D_x=0x555556778260, delta_x=0, D_s=0x5555567789e0, delta_s=0, J_c=0x555556832b10, D_c=0x0,
    delta_c=0, J_d=0x555556292f80, D_d=0x0, delta_d=0, rhs_x=..., rhs_s=..., rhs_c=..., rhs_d=..., sol_x=..., sol_s=..., sol_c=..., sol_d=..., check_NegEVals=true, numberOfNegEVals=32400)
    at ../../src/Algorithm/IpAugRestoSystemSolver.cpp:262
#32 0x00007ffff7809fd3 in Ipopt::PDFullSpaceSolver::SolveOnce (this=0x555555a688e0, resolve_with_better_quality=false, pretend_singular=false, W=..., J_c=..., J_d=..., Px_L=..., Px_U=..., Pd_L=...,
    Pd_U=..., z_L=..., z_U=..., v_L=..., v_U=..., slack_x_L=..., slack_x_U=..., slack_s_L=..., slack_s_U=..., sigma_x=..., sigma_s=..., alpha=1, beta=0, rhs=..., res=...)
    at ../../src/Algorithm/IpPDFullSpaceSolver.cpp:520
#33 0x00007ffff7807ec4 in Ipopt::PDFullSpaceSolver::Solve (this=0x555555a688e0, alpha=-1, beta=0, rhs=..., res=..., allow_inexact=false, improve_solution=false)
    at ../../src/Algorithm/IpPDFullSpaceSolver.cpp:214
#34 0x00007ffff7812b5e in Ipopt::PDSearchDirCalculator::ComputeSearchDirection (this=0x555555a68be0) at ../../src/Algorithm/IpPDSearchDirCalc.cpp:132
#35 0x00007ffff77a5b65 in Ipopt::IpoptAlgorithm::ComputeSearchDirection (this=0x555555a68c30) at ../../src/Algorithm/IpIpoptAlg.cpp:618
#36 0x00007ffff77a4b1d in Ipopt::IpoptAlgorithm::Optimize (this=0x555555a68c30, isResto=true) at ../../src/Algorithm/IpIpoptAlg.cpp:373
#37 0x00007ffff7832890 in Ipopt::MinC_1NrmRestorationPhase::PerformRestoration (this=0x555555a68d00) at ../../src/Algorithm/IpRestoMinC_1Nrm.cpp:192
#38 0x00007ffff7783982 in Ipopt::BacktrackingLineSearch::FindAcceptableTrialPoint (this=0x555555a68ee0) at ../../src/Algorithm/IpBacktrackingLineSearch.cpp:600
#39 0x00007ffff77a5e4c in Ipopt::IpoptAlgorithm::ComputeAcceptableTrialPoint (this=0x555555a69270) at ../../src/Algorithm/IpIpoptAlg.cpp:643
#40 0x00007ffff77a4cee in Ipopt::IpoptAlgorithm::Optimize (this=0x555555a69270, isResto=false) at ../../src/Algorithm/IpIpoptAlg.cpp:397
#41 0x00007ffff78647fb in Ipopt::IpoptApplication::call_optimize (this=0x555555a2a6d0) at ../../src/Interfaces/IpIpoptApplication.cpp:651
#42 0x00007ffff78637a6 in Ipopt::IpoptApplication::OptimizeNLP (this=0x555555a2a6d0, nlp=..., alg_builder=...) at ../../src/Interfaces/IpIpoptApplication.cpp:530
#43 0x00007ffff7863460 in Ipopt::IpoptApplication::OptimizeNLP (this=0x555555a2a6d0, nlp=...) at ../../src/Interfaces/IpIpoptApplication.cpp:486
#44 0x00007ffff7863059 in Ipopt::IpoptApplication::OptimizeTNLP (this=0x555555a2a6d0, tnlp=...) at ../../src/Interfaces/IpIpoptApplication.cpp:466
#45 0x0000555555560f93 in main (argv=3, argc=0x7fffffffde08) at ../../../examples/ScalableProblems/solve_problem.cpp:205
(gdb)

I also ran nvidia-smi repeatedly in another console while the GPU version of the problem was running. (The machine name has changed because I did a full reinstall following the instructions listed above, to make sure I hadn't corrupted anything; the behavior was the same.)

john@ubuntuSpral:~$ nvidia-smi
Wed Jul 27 17:43:05 2022      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   42C    P8    19W / 320W |   5796MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     15502      C   ./solve_problem                  5793MiB |
+-----------------------------------------------------------------------------+
john@ubuntuSpral:~$ nvidia-smi
Wed Jul 27 17:43:06 2022      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   45C    P2    40W / 320W |   4562MiB / 10240MiB |     76%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     15502      C   ./solve_problem                  4559MiB |
+-----------------------------------------------------------------------------+
john@ubuntuSpral:~$ nvidia-smi
Wed Jul 27 17:43:07 2022      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   45C    P2   108W / 320W |   4562MiB / 10240MiB |     23%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
john@ubuntuSpral:~$ nvidia-smi
Wed Jul 27 17:43:08 2022      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   45C    P0   109W / 320W |      1MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Digging further, I ran SPRAL's built-in check:

john@ubuntuSpral:~/Software/spral$ make check
make  lsmr_test examples/Fortran/lsmr examples/C/lsmr rutherford_boeing_test examples/Fortran/rutherford_boeing/rb_read examples/Fortran/rutherford_boeing/rb_write examples/C/rutherford_boeing/rb_read examples/C/rutherford_boeing/rb_write scaling_test examples/Fortran/scaling/auction_sym examples/Fortran/scaling/auction_unsym examples/Fortran/scaling/equilib_sym examples/Fortran/scaling/equilib_unsym examples/Fortran/scaling/hungarian_sym examples/Fortran/scaling/hungarian_unsym examples/C/scaling/auction_sym examples/C/scaling/auction_unsym examples/C/scaling/equilib_sym examples/C/scaling/equilib_unsym examples/C/scaling/hungarian_sym examples/C/scaling/hungarian_unsym random_test examples/Fortran/random examples/C/random random_matrix_test examples/Fortran/random_matrix examples/C/random_matrix ssids_test ssids_kernel_test examples/Fortran/ssids examples/C/ssids ssmfe_test ssmfe_ciface_test examples/C/ssmfe/hermitian examples/C/ssmfe/precond_core examples/C/ssmfe/precond_expert examples/C/ssmfe/precond_ssmfe examples/C/ssmfe/shift_invert examples/Fortran/ssmfe/hermitian examples/Fortran/ssmfe/precond_core examples/Fortran/ssmfe/precond_expert examples/Fortran/ssmfe/precond_ssmfe examples/Fortran/ssmfe/shift_invert
make[1]: Entering directory '/home/john/Software/spral'
gfortran -fopenmp  -fPIC -g -fno-optimize-sibling-calls -c -o tests/lsmr.o tests/lsmr.f90
...
...
... lots of gcc, gfortran, and compilation warning malarkey
...
...
gfortran -fopenmp  -fPIC -g -fno-optimize-sibling-calls -c -o examples/Fortran/ssmfe/shift_invert.o examples/Fortran/ssmfe/shift_invert.f90
gfortran -fopenmp  -fPIC -g -fno-optimize-sibling-calls   -o examples/Fortran/ssmfe/shift_invert examples/Fortran/ssmfe/shift_invert.o examples/Fortran/ssmfe/laplace2d.o examples/Fortran/ssmfe/ldltf.o -L. -lspral -L/home/john/Software/ThirdParty-Metis/build/lib -lcoinmetis -llapack -lopenblas  -L/usr/local/cuda/lib64 -lhwloc -lm -lnuma -lltdl -lpthread -ldl -L/usr/local/cuda/lib64/../lib -L/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L. -L/usr/local/cuda/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../.. -lgfortran -lm -lquadmath
make[1]: Leaving directory '/home/john/Software/spral'
make  check-TESTS
make[1]: Entering directory '/home/john/Software/spral'
make[2]: Entering directory '/home/john/Software/spral'
PASS: lsmr_test
PASS: rutherford_boeing_test
PASS: scaling_test
PASS: random_test
PASS: random_matrix_test
./test-driver: line 107: 16591 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: ssids_test
PASS: ssids_kernel_test
PASS: ssmfe_test
PASS: ssmfe_ciface_test
============================================================================
Testsuite summary for spral 2021.09.01-dev
============================================================================
# TOTAL: 9
# PASS:  8
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See ./test-suite.log
Please report to hsl@stfc.ac.uk
============================================================================
Makefile:2564: recipe for target 'test-suite.log' failed
make[2]: *** [test-suite.log] Error 1
make[2]: Leaving directory '/home/john/Software/spral'
Makefile:2670: recipe for target 'check-TESTS' failed
make[1]: *** [check-TESTS] Error 2
make[1]: Leaving directory '/home/john/Software/spral'
Makefile:2932: recipe for target 'check-am' failed
make: *** [check-am] Error 2

The contents of test-suite.log:

more ./test-suite.log
============================================
   spral 2021.09.01-dev: ./test-suite.log
============================================

# TOTAL: 9
# PASS:  8
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: ssids_test
================

FAIL ssids_test (exit status: 139)
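
For reference, an exit status above 128 conventionally means the process was terminated by a signal, with the signal number being the status minus 128. So 139 corresponds to signal 11, which matches the "Segmentation fault" line from test-driver. A minimal sketch of that arithmetic follows; `describe_exit` is a hypothetical helper name, not part of SPRAL or automake:

```shell
# Hypothetical helper: interpret a shell exit status.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
describe_exit() {
  status=$1
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    # kill -l translates a signal number into its name (without the SIG prefix)
    printf 'killed by signal %d (SIG%s)\n' "$sig" "$(kill -l "$sig")"
  else
    printf 'exited normally with status %d\n' "$status"
  fi
}

describe_exit 139
```

This only confirms that ssids_test crashed rather than reporting a wrong result; it says nothing about where the crash occurs.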

jfowkes commented 2 years ago

Thank you very much for your comprehensive bug report.

It is likely that this is a hardware compatibility issue as you have a relatively new GPU. One thing I would definitely try is to edit Makefile.am to use the correct gencode for your GPU. You have an RTX 3080 and you're using CUDA 11.7 so you should have something like:

AM_NVCC_FLAGS += -gencode arch=compute_80,code=sm_80
AM_NVCC_FLAGS += -gencode arch=compute_86,code=sm_86
AM_NVCC_FLAGS += -gencode arch=compute_87,code=sm_87

Could you please remove any other gencodes and try to recompile and re-run the tests?

You can read more about gencodes here (essentially GPU architecture flags): https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
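
As a rough illustration of how a gencode flag relates to a card's compute capability: the RTX 3080 has compute capability 8.6, which corresponds to `compute_86` / `sm_86`. The sketch below builds the flag from a capability string; `gencode_flag` is a hypothetical helper, and the `nvidia-smi` query mentioned in the comment is an assumption about what your driver version supports, so it is not executed here:

```shell
# Hypothetical helper: turn a compute capability such as "8.6"
# into the matching nvcc -gencode flag.
gencode_flag() {
  cap=$(printf '%s' "$1" | tr -d '.')   # "8.6" -> "86"
  printf -- '-gencode arch=compute_%s,code=sm_%s\n' "$cap" "$cap"
}

# Assumption: recent drivers can report the capability directly, e.g.
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# Here the value for an RTX 3080 is passed in by hand.
gencode_flag 8.6
```

The resulting string is what would follow `AM_NVCC_FLAGS +=` in Makefile.am.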

The GPU version of SPRAL was developed and tested on an old Nvidia Tesla K40c GPU. While I wouldn't recommend using such old hardware, it did pass all the make check tests when I tested it last year, although that was a regular build that did not use IPOPT. Unfortunately we have since lost access to this GPU (scrapped for being too old), so I am unable to verify whether an IPOPT build would also pass all the make check tests on it.