mdolab / adflow

ADflow is a finite volume RANS solver tailored for gradient-based aerodynamic design optimization.

Complex regression tests fail when ADflow built with new intel compilers #357

Open A-CGray opened 6 months ago

A-CGray commented 6 months ago

Description

A handful of the complex-step ADflow regression tests fail on the latest Docker PR, which uses the new Intel ifx and mpiifx compilers. Most likely the tests need to be retrained.

Current behavior

 /home/***/repos/adflow/tests/reg_tests/test_adjoint.py:TestCmplxStep_2_laminar_tut_wing.cmplx_test_aero_dvs  ... FAIL (00:00:54.57, 1171 MB)
Traceback (most recent call last):
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 392, in multi_proc_exception_check
    yield
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 199, in root_add_dict
    self._add_dict(name, d, name, **kwargs)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 365, in _add_dict
    self._add_dict(key, d[key], full_name, rtol=rtol, atol=atol, db=db[dict_name])
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 367, in _add_dict
    self._add_values(key, d[key], rtol=rtol, atol=atol, db=db[dict_name], full_name=full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 321, in _add_values
    self.assert_allclose(values, db[name], name, rtol, atol, full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 270, in assert_allclose
    np.testing.assert_allclose(actual, reference, rtol=rtol, atol=atol, err_msg=msg)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-08, atol=5e-10
Failed value for: Eval Functions Sens:: mdo_tutorial_cd: mdo_tutorial_cl: mdo_tutorial_cmz: mach_mdo_tutorial
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 1.13985595e-09
Max relative difference: 1.96705549e-08
 x: array(0.057947)
 y: array(0.057947)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/***/repos/adflow/tests/reg_tests/test_adjoint.py", line 372, in cmplx_test_aero_dvs
    self.handler.root_add_dict("Eval Functions Sens:", funcsSens, rtol=rtol, atol=atol)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 197, in root_add_dict
    with multi_proc_exception_check(self.comm):
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 409, in multi_proc_exception_check
    raise exc[0](msg).with_traceback(exc[2])
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 392, in multi_proc_exception_check
    yield
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 199, in root_add_dict
    self._add_dict(name, d, name, **kwargs)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 365, in _add_dict
    self._add_dict(key, d[key], full_name, rtol=rtol, atol=atol, db=db[dict_name])
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 367, in _add_dict
    self._add_values(key, d[key], rtol=rtol, atol=atol, db=db[dict_name], full_name=full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 321, in _add_values
    self.assert_allclose(values, db[name], name, rtol, atol, full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 270, in assert_allclose
    np.testing.assert_allclose(actual, reference, rtol=rtol, atol=atol, err_msg=msg)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError: Exception raised on rank 0: 
Not equal to tolerance rtol=1e-08, atol=5e-10
Failed value for: Eval Functions Sens:: mdo_tutorial_cd: mdo_tutorial_cl: mdo_tutorial_cmz: mach_mdo_tutorial
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 1.13985595e-09
Max relative difference: 1.96705549e-08
 x: array(0.057947)
 y: array(0.057947)
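The printed values look identical because numpy rounds them to six significant digits, but the check still fails by a small margin. np.testing.assert_allclose passes only if |actual - desired| <= atol + rtol * |desired|. The sketch below redoes that arithmetic with the numbers from the failure message. The reference value is reconstructed from the reported absolute and relative differences, and the sign of the difference is an assumption.

```python
import numpy as np

# Tolerances and differences copied from the failure message above.
rtol, atol = 1e-8, 5e-10
abs_diff = 1.13985595e-09
rel_diff = 1.96705549e-08

# numpy reports rel_diff = |actual - desired| / |desired|, so the reference
# value can be recovered (approximately) from the two reported differences.
desired = abs_diff / rel_diff  # about 0.0579473, printed as 0.057947
actual = desired + abs_diff    # assumption: actual is above the reference

# assert_allclose (and np.isclose) accept the value only if
# |actual - desired| <= atol + rtol * |desired|
allowed = atol + rtol * abs(desired)

print(f"reference value: {desired:.7f}")
print(f"allowed diff:    {allowed:.3e}")
print(f"observed diff:   {abs_diff:.3e}")
print("within tolerance:", bool(np.isclose(actual, desired, rtol=rtol, atol=atol)))
```

The allowed difference works out to roughly 1.08e-9, just below the observed 1.14e-9. This is consistent with a compiler-induced change in the last few digits rather than a real regression, which is why retraining is the likely fix.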

Expected behavior

The complex-step regression tests should pass.

Code versions

A-CGray commented 1 month ago

When I run the ADflow tests on my machine using the public:u22-intel-impi-latest-amd64 image from https://github.com/mdolab/docker/pull/266, many of the tests fail with the following errors. Any idea what's going on here, @eirikurj?

(mpi) ./tests/reg_tests/test_functionals.py:TestFunctionals_2_euler_matrix_jst_tut_wing.test_forces_and_tractions  ... FAIL (00:00:0.00, 0 MB)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 3826 RUNNING AT 9dee14de4df5
=   KILLED BY SIGNAL: 7 (Bus error)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 3827 RUNNING AT 9dee14de4df5
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

(mpi) ./tests/reg_tests/test_functionals.py:TestFunctionals_2_euler_matrix_jst_tut_wing.test_functions  ... FAIL (00:00:0.00, 0 MB)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 3837 RUNNING AT 9dee14de4df5
=   KILLED BY SIGNAL: 7 (Bus error)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 3838 RUNNING AT 9dee14de4df5
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================
eirikurj commented 1 month ago

You probably need to increase the shared memory size. You can set it with a flag when starting the container: docker run --shm-size=XX. The default is 64MB, but you can increase it significantly; e.g., for 2GB, add --shm-size=2G. That is probably more than you need in general (something on the order of 100MB, e.g., 256MB, is likely sufficient), but it should be fine since we have plenty of RAM and not many containers running. Feel free to experiment. If you don't want to bother with per-container settings, you can add the following to /etc/docker/daemon.json:

{
    "default-shm-size": "2G"
}

Since this setting applies to every container, you may want to use a smaller value there. See if this resolves your immediate problem.
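A plausible explanation for the crash pattern above: Intel MPI, like many MPI implementations, uses files under /dev/shm for communication between ranks on the same node. When that tmpfs runs out of space, touching a mapped page typically raises SIGBUS (signal 7, the "Bus error" on rank 0), and the launcher then kills the remaining rank (signal 9). The sketch below checks the /dev/shm size from inside the container before running the tests. The helper names parse_size and shm_is_large_enough are hypothetical. The 256M threshold comes from the suggestion above. Treating size suffixes as 1024-based units is an assumption of this sketch.

```python
import os
import shutil

# Hypothetical helper: assumes binary units (1K = 1024 bytes), single-letter
# suffixes as in "--shm-size=2G", and a bare number meaning bytes.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string such as '2G', '256m' or '1024' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)


def shm_is_large_enough(required: str = "256M", path: str = "/dev/shm") -> bool:
    """Return True if the filesystem mounted at `path` is at least `required`."""
    return shutil.disk_usage(path).total >= parse_size(required)


if __name__ == "__main__":
    if not os.path.isdir("/dev/shm"):
        print("/dev/shm not found (not a Linux container?)")
    else:
        total_mb = shutil.disk_usage("/dev/shm").total / 1024**2
        print(f"/dev/shm size: {total_mb:.0f} MB")
        if not shm_is_large_enough("256M"):
            print("Shared memory looks small; restart the container with e.g. --shm-size=256M")
```

Running a check like this at the start of a test job makes a too-small /dev/shm show up as an explicit message rather than as an opaque bus error from mpirun.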