When running a coupled aerostructural derivative check with DAFoam and MELD through MPhys, we noticed that identical runs of the same script and case led to different analytic derivatives (ignore the FD mismatch; that comes from generally poor, but consistent, convergence in DAFoam):
Run 1:
Full Model: 'maneuver.struct_post.eval_funcs.ks_vmfailure' wrt 'dvs.sweep_wing'
Reverse Magnitude: 1.176269e-03
Fd Magnitude: 6.548034e-04 (fd:forward)
Run 2:
Full Model: 'maneuver.struct_post.eval_funcs.ks_vmfailure' wrt 'dvs.sweep_wing'
Reverse Magnitude: 1.527612e-03
Fd Magnitude: 6.548034e-04 (fd:forward)
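For context, this is a standard OpenMDAO total-derivative check on the coupled MPhys model. A minimal sketch of the kind of call that produces output like the above (not the actual run script; the model construction is omitted and only the names from the output are reused):

import openmdao.api as om

prob = om.Problem()
# ... build the MPhys multipoint model (DAFoam + MELD + TACS) and add the
# sweep design variable here ...
prob.setup(mode="rev")
prob.run_model()

# Compare the reverse-mode analytic totals against a forward finite difference
prob.check_totals(
    of=["maneuver.struct_post.eval_funcs.ks_vmfailure"],
    wrt=["dvs.sweep_wing"],
    method="fd",
    form="forward",
)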
We were not sure where this came from, so we checked the aero solver, the load and displacement transfers, and the structural solver. In the primal solve, we found slight differences between the results only after the first pass of the structural solution. To narrow this down, we started printing the norm of the flattened values passing through MELD. The first pass through the load transfer is consistent between runs:
Run 1:
MELD Primal Load Transfer x_struct0 24702.433470988755
MELD Primal Load Transfer x_aero0 23658.198184194116
MELD Primal Load Transfer u_struct 0.0
MELD Primal Load Transfer f_aero 8100.573476032763
MELD Primal Load Transfer f_struct 1700.082296927264
Run 2:
MELD Primal Load Transfer x_struct0 24702.433470988755
MELD Primal Load Transfer x_aero0 23658.198184194116
MELD Primal Load Transfer u_struct 0.0
MELD Primal Load Transfer f_aero 8100.573476032763
MELD Primal Load Transfer f_struct 1700.082296927264
However, the pass back through the displacement transfer receives a different vector of displacements, leading to a slightly different converged aerostructural solution:
Run 1:
MELD Primal Disp Transfer x_struct0 24702.433470988755
MELD Primal Disp Transfer x_aero0 23658.198184194116
MELD Primal Disp Transfer u_struct 47.00897156793725
MELD Primal Disp Transfer u_aero 33.24244380757927
NL: NLBGS 5 ; 219.13182 1.42895947e-06
Run 2:
MELD Primal Disp Transfer x_struct0 24702.433470988755
MELD Primal Disp Transfer x_aero0 23658.198184194116
MELD Primal Disp Transfer u_struct 47.00897157165467
MELD Primal Disp Transfer u_aero 33.24244381025362
NL: NLBGS 5 ; 219.131819 1.42895946e-06
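For reference, the prints above are just the 2-norm of each flattened array going into / out of the MELD transfer components. A rough sketch of that kind of helper (hypothetical; it assumes an mpi4py communicator and dict-like access to the component's variables, as in an OpenMDAO compute()):

import numpy as np

def print_norms(label, variables, names, comm):
    # Print a rank-independent 2-norm of each flattened (possibly distributed) vector
    for name in names:
        local = np.asarray(variables[name]).flatten()
        sq_sum = comm.allreduce(np.dot(local, local))  # global sum of squares
        if comm.rank == 0:
            print(f"MELD Primal {label} {name} {np.sqrt(sq_sum)}")

This would be called with, e.g., label="Load Transfer" and names like ["x_struct0", "x_aero0", "u_struct", "f_aero", "f_struct"].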
We then started looking at simpler cases and found that TACS-only primal solves are only deterministic to about 8 digits, particularly when running in parallel. @A-CGray and I reproduced this on his computer as well and found that 3+ procs were typically problematic, but serial cases run with python <script>.py were also nondeterministic on one day and fine the next.
I tried several of the examples to see which were deterministic, and @A-CGray scripted the check on his computer as well using the following:
randVec = problem.assembler.createVec()
randVec.setRand()
for _ in range(20):
    # Reset the state to the same random starting point every iteration
    problem.setVariables(randVec)
    # Record the initial residual norm, then solve and record the solution norm
    problem.getResidual(problem.res)
    initResNorm = problem.res.norm()
    problem.solve()
    uNorm = problem.u.norm()
    if comm.rank == 0:
        print(f"{problem.name:>10s}: {uNorm=:.16e}, {initResNorm=:.16e}")
The test was to run the solver repeatedly from the same starting point and check whether the result was consistent. The initial residual is consistent between solves, but the norm of the solution is not. We tried many of the examples and found the following:
Deterministic:
Spring mass
Beam
Annulus (floating point exception with 16 procs)
Non-deterministic when run on >3-4 cores:
CRM
Fuselage
MACH tutorial wing
Plate
Slot_3d
Out of curiosity, I ran the CRM example under Valgrind and found two invalid reads with 16 procs. Lower proc counts did not show the issue, but Valgrind may also miss invalid reads depending on whether the adjacent memory happens to be allocated.
==796863== Invalid read of size 4
==796863== at 0x6099B8B9: TACSBlockCyclicMat::get_block_owner(int, int) const (TACSBlockCyclicMat.h:142)
==796863== by 0x60992954: TACSBlockCyclicMat::merge_nz_pattern(int, int*, int*, int) (TACSBlockCyclicMat.cpp:554)
==796863== by 0x60990A07: TACSBlockCyclicMat::TACSBlockCyclicMat(ompi_communicator_t*, int, int, int, int const*, int, int const*, int const*, int, int, int) (TACSBlockCyclicMat.cpp:177)
==796863== by 0x609ABE04: TACSSchurPc::TACSSchurPc(TACSSchurMat*, int, double, int) (TACSSchurMat.cpp:1096)
==796863== by 0x6077EA3D: __pyx_pf_4tacs_4TACS_2Pc___cinit__ (TACS.cpp:34395)
==796863== by 0x6077EA3D: __pyx_pw_4tacs_4TACS_2Pc_1__cinit__ (TACS.cpp:34155)
==796863== by 0x6077EA3D: __pyx_tp_new_4tacs_4TACS_Pc(_typeobject*, _object*, _object*) (TACS.cpp:89770)
==796863== by 0x2589C6: _PyObject_MakeTpCall (in /usr/bin/python3.10)
==796863== by 0x25214F: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x2629FB: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==796863== by 0x24B45B: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x257C13: _PyObject_FastCallDictTstate (in /usr/bin/python3.10)
==796863== by 0x26CB04: ??? (in /usr/bin/python3.10)
==796863== by 0x258A1B: _PyObject_MakeTpCall (in /usr/bin/python3.10)
==796863== Address 0xc66f0b6c is 4 bytes before a block of size 60 alloc'd
==796863== at 0x484A2F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==796863== by 0x6099385F: TACSBlockCyclicMat::init_proc_grid(int) (TACSBlockCyclicMat.cpp:777)
==796863== by 0x609902AA: TACSBlockCyclicMat::TACSBlockCyclicMat(ompi_communicator_t*, int, int, int, int const*, int, int const*, int const*, int, int, int) (TACSBlockCyclicMat.cpp:87)
==796863== by 0x609ABE04: TACSSchurPc::TACSSchurPc(TACSSchurMat*, int, double, int) (TACSSchurMat.cpp:1096)
==796863== by 0x6077EA3D: __pyx_pf_4tacs_4TACS_2Pc___cinit__ (TACS.cpp:34395)
==796863== by 0x6077EA3D: __pyx_pw_4tacs_4TACS_2Pc_1__cinit__ (TACS.cpp:34155)
==796863== by 0x6077EA3D: __pyx_tp_new_4tacs_4TACS_Pc(_typeobject*, _object*, _object*) (TACS.cpp:89770)
==796863== by 0x2589C6: _PyObject_MakeTpCall (in /usr/bin/python3.10)
==796863== by 0x25214F: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x2629FB: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==796863== by 0x24B45B: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x257C13: _PyObject_FastCallDictTstate (in /usr/bin/python3.10)
==796863== by 0x26CB04: ??? (in /usr/bin/python3.10)
==796863== by 0x258A1B: _PyObject_MakeTpCall (in /usr/bin/python3.10)
To summarize:
The initial residual is deterministic even when the solution is not
The non-determinism is independent of element type / constitutive model
Tightening the solver tolerance has no effect, nor does lowering compiler optimization
Happens reliably with large, parallel problems (nproc > 2), but also happened in serial at times
Other notes:
We looked into the preconditioner and stiffness matrices and computed min / max / mean of their values; these were deterministic even when the results were not (a rough sketch of this kind of check is shown below)
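A minimal sketch of that kind of check, assuming the locally owned matrix entries can be extracted as a numpy array (how to do that depends on the matrix class, so local_values here is just a placeholder):

import numpy as np
from mpi4py import MPI

def summarize_matrix(label, local_values, comm=MPI.COMM_WORLD):
    # Reduce local min / max / sum / count to global statistics that can be
    # diffed between runs
    vals = np.asarray(local_values)
    gmin = comm.allreduce(vals.min(), op=MPI.MIN)
    gmax = comm.allreduce(vals.max(), op=MPI.MAX)
    gsum = comm.allreduce(vals.sum(), op=MPI.SUM)
    gnum = comm.allreduce(vals.size, op=MPI.SUM)
    if comm.rank == 0:
        print(f"{label}: min={gmin:.16e} max={gmax:.16e} mean={gsum / gnum:.16e}")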