smdogroup / tacs

Finite-element library for analysis and adjoint-based gradient evaluation
Apache License 2.0
108 stars 75 forks source link

Non-determinism in primal solution and derivatives #305

Open bernardopacini opened 7 months ago

bernardopacini commented 7 months ago

When running a coupled aerostructural derivative check with DAFoam and MELD through MPhys, we noticed that identical runs of the same script / case led to different analytic derivatives (ignore the FD check, that is due to generally poor, but consistent convergence from DAFoam):

Run 1: 
Full Model: 'maneuver.struct_post.eval_funcs.ks_vmfailure' wrt 'dvs.sweep_wing'
     Reverse Magnitude: 1.176269e-03
          Fd Magnitude: 6.548034e-04 (fd:forward)

Run 2:
Full Model: 'maneuver.struct_post.eval_funcs.ks_vmfailure' wrt 'dvs.sweep_wing'
     Reverse Magnitude: 1.527612e-03
          Fd Magnitude: 6.548034e-04 (fd:forward)

We were not sure where this came from and tried to check the aero solver, load and displacement transfer, and structural solver. In the primal solve, we found slight differences between the results only after the first pass of the structural solution. We started printing out the norm of the flattened values passing through MELD. The first pass through the load transfer is consistent between runs:

Run 1: 
MELD Primal Load Transfer x_struct0 24702.433470988755
MELD Primal Load Transfer x_aero0 23658.198184194116
MELD Primal Load Transfer u_struct 0.0
MELD Primal Load Transfer f_aero 8100.573476032763
MELD Primal Load Transfer f_struct 1700.082296927264

Run 2:
MELD Primal Load Transfer x_struct0 24702.433470988755
MELD Primal Load Transfer x_aero0 23658.198184194116
MELD Primal Load Transfer u_struct 0.0
MELD Primal Load Transfer f_aero 8100.573476032763
MELD Primal Load Transfer f_struct 1700.082296927264

But, the pass back through the displacement transfer receives a different vector of displacements, leading to a slightly different converged aerostructural solution:

Run 1: 
MELD Primal Disp Transfer x_struct0 24702.433470988755
MELD Primal Disp Transfer x_aero0 23658.198184194116
MELD Primal Disp Transfer u_struct 47.00897156793725
MELD Primal Disp Transfer u_aero 33.24244380757927

NL: NLBGS 5 ; 219.13182 1.42895947e-06 

Run 2:
MELD Primal Disp Transfer x_struct0 24702.433470988755
MELD Primal Disp Transfer x_aero0 23658.198184194116
MELD Primal Disp Transfer u_struct 47.00897157165467
MELD Primal Disp Transfer u_aero 33.24244381025362

NL: NLBGS 5 ; 219.131819 1.42895946e-06

We then started looking at simpler cases and found that TACS-only primal solves are deterministic up to 8 digits, especially when running in parallel. @A-CGray and I reproduced this on his computer as well, and found that typically 3+ procs were problematic, but serial cases run with python <scripy>.py were also nondeterministic on one day, but fine the next.

I tried out several of the examples to see which were deterministic, and @A-CGray scripted this on his computer as well using the following:

randVec = problem.assembler.createVec()
randVec.setRand()

for _ in range(20):
    problem.setVariables(randVec)
    # Solve state
    problem.getResidual(problem.res)
    initResNorm = problem.res.norm()
    problem.solve()
    uNorm = problem.u.norm()
    if comm.rank==0:
        print(f"{problem.name:>10s}: {uNorm=:.16e}, {initResNorm=:.16e}")

The test was to run the solver repeatedly with the same starting point and see if it was consistent. This led to the following results:

Output from slot_3d example

[9] TACSBlockCyclicMat: (6423,6423) Initial density: 0.832 factor fill in: 0.119
load_set_000: uNorm=3.1102406253022369e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406123293758e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406123293758e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102405960886136e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406234748215e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406229065651e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406001658109e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406133390252e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406123291494e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406141400342e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406437929267e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406079585327e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406018139010e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406079068340e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406123293758e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406045341430e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406196805662e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406217764444e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406022091790e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406003855588e-03, initResNorm=4.5878651244647644e+10

The initial residual is consistent, but the norm of the solution is not. We tried many of the examples and found that:

Deterministic:

Non-deterministic when run on >3-4 cores:

Out of curiosity, I tried the CRM example with Valgrind and found that with 16 procs there were two invalid reads. Fewer proc counts did not have this issue, but Valgrind might also miss invalid reads depending on if the memory is allocated or not.

==796863== Invalid read of size 4
==796863== at 0x6099B8B9: TACSBlockCyclicMat::get_block_owner(int, int) const (TACSBlockCyclicMat.h:142)
==796863== by 0x60992954: TACSBlockCyclicMat::merge_nz_pattern(int, int*, int*, int) (TACSBlockCyclicMat.cpp:554)
==796863== by 0x60990A07: TACSBlockCyclicMat::TACSBlockCyclicMat(ompi_communicator_t*, int, int, int, int const*, int, int const*, int const*, int, int, int) (TACSBlockCyclicMat.cpp:177)
==796863== by 0x609ABE04: TACSSchurPc::TACSSchurPc(TACSSchurMat*, int, double, int) (TACSSchurMat.cpp:1096)
==796863== by 0x6077EA3D: __pyx_pf_4tacs_4TACS_2Pc___cinit__ (TACS.cpp:34395)
==796863== by 0x6077EA3D: __pyx_pw_4tacs_4TACS_2Pc_1__cinit__ (TACS.cpp:34155)
==796863== by 0x6077EA3D: __pyx_tp_new_4tacs_4TACS_Pc(_typeobject*, _object*, _object*) (TACS.cpp:89770)
==796863== by 0x2589C6: _PyObject_MakeTpCall (in /usr/bin/python3.10)
==796863== by 0x25214F: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x2629FB: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==796863== by 0x24B45B: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x257C13: _PyObject_FastCallDictTstate (in /usr/bin/python3.10)
==796863== by 0x26CB04: ??? (in /usr/bin/python3.10)
==796863== by 0x258A1B: _PyObject_MakeTpCall (in /usr/bin/python3.10)

==796863== Address 0xc66f0b6c is 4 bytes before a block of size 60 alloc’d
==796863== at 0x484A2F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==796863== by 0x6099385F: TACSBlockCyclicMat::init_proc_grid(int) (TACSBlockCyclicMat.cpp:777)
==796863== by 0x609902AA: TACSBlockCyclicMat::TACSBlockCyclicMat(ompi_communicator_t*, int, int, int, int const*, int, int const*, int const*, int, int, int) (TACSBlockCyclicMat.cpp:87)
==796863== by 0x609ABE04: TACSSchurPc::TACSSchurPc(TACSSchurMat*, int, double, int) (TACSSchurMat.cpp:1096)
==796863== by 0x6077EA3D: __pyx_pf_4tacs_4TACS_2Pc___cinit__ (TACS.cpp:34395)
==796863== by 0x6077EA3D: __pyx_pw_4tacs_4TACS_2Pc_1__cinit__ (TACS.cpp:34155)
==796863== by 0x6077EA3D: __pyx_tp_new_4tacs_4TACS_Pc(_typeobject*, _object*, _object*) (TACS.cpp:89770)
==796863== by 0x2589C6: _PyObject_MakeTpCall (in /usr/bin/python3.10)
==796863== by 0x25214F: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x2629FB: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==796863== by 0x24B45B: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x257C13: _PyObject_FastCallDictTstate (in /usr/bin/python3.10)
==796863== by 0x26CB04: ??? (in /usr/bin/python3.10)
==796863== by 0x258A1B: _PyObject_MakeTpCall (in /usr/bin/python3.10)

To summarize:

Other notes: