ulgltas / CUPyDO

FSI tools for partitioned coupling between generic solid and fluid solvers

Discrepancies between serial and parallel builds of CUPyDO #3

Open acrovato opened 5 years ago

acrovato commented 5 years ago

Context

I build CUPyDO without MPI on my computer to quickly run my tests before committing. I also build CUPyDO in parallel on a test machine (gaston) to 1) ensure compatibility with the parallel build of SU2, and 2) test this feature of CUPyDO.

Issue

I noticed fairly large discrepancies in the results of a simple test case (an airfoil in a steady flow, attached to vertical and rotational springs) between a serial and a parallel build of CUPyDO. I made sure this was not linked to the machine by also building CUPyDO in serial on the test machine. Here is the output produced by the test machine (both runs executed in serial). The test case is Flow_RBM/staticAirfoil_fsi.py. It is solved with the IQN algorithm. The structural and fluid meshes are matching. Serial build:

FSI residual: 5.81484980254e-07
FSI iterations: 12
[CTest] Lift coefficient = 0.773380 (expected 0.774016 +/- 5.0%)
    rel diff = 8.211480e-04 <= 5.000000e-02 [ok]
[CTest] Vertical displacement = 0.154501 (expected 0.154490 +/- 0.010000)
    abs diff = 1.130814e-05 <= 1.000000e-02 [ok]
[CTest] Rotational displacement = 3.441070 (expected 3.521739 +/- 0.500000)
    abs diff = 8.066920e-02 <= 5.000000e-01 [ok]

Parallel build:

FSI residual: 9.07097320559e-07
FSI iterations: 16
[CTest] Lift coefficient = 0.708691 (expected 0.774016 +/- 5.0%)
    rel diff = 8.439698e-02 > 5.000000e-02 [wrong!]
[CTest] Vertical displacement = 0.142204 (expected 0.154490 +/- 0.010000)
    abs diff = 1.228569e-02 > 1.000000e-02 [wrong!]
[CTest] Rotational displacement = 2.899969 (expected 3.521739 +/- 0.500000)
    abs diff = 6.217705e-01 > 5.000000e-01 [wrong!]
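
For reference, the [CTest] lines above boil down to a simple tolerance comparison (relative for the lift coefficient, absolute for the displacements). Here is a minimal sketch of that check in Python (a hypothetical helper, not the actual CTest code), applied to the parallel-build values:

```python
def check(name, computed, expected, tol, relative=False):
    # Sketch of the pass/fail logic behind the [CTest] lines above.
    diff = abs(computed - expected)
    if relative:
        diff /= abs(expected)
    kind = "rel" if relative else "abs"
    ok = diff <= tol
    print(f"[CTest] {name} = {computed:.6f} (expected {expected:.6f})")
    print(f"    {kind} diff = {diff:.6e} {'<=' if ok else '>'} {tol:.6e} [{'ok' if ok else 'wrong!'}]")

# Parallel-build values from the output above
check("Lift coefficient", 0.708691, 0.774016, 0.05, relative=True)
check("Vertical displacement", 0.142204, 0.154490, 0.01)
check("Rotational displacement", 2.899969, 3.521739, 0.50)
```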

Questions

Is there a difference in the way the data are treated between a serial and a parallel build of CUPyDO that could explain this?
mlucio89 commented 5 years ago

IQN in parallel has to be used with care. According to what David told me before leaving, there was some issue with using IQN with MPI (I could not test it with PFEM, which is not MPI yet), but now I cannot remember whether it was just a performance issue or a true problem... Have you tried running the same case with BGS, both in serial and in parallel? Do you observe the same problem?

tobadavid commented 5 years ago

When using CUPyDO in parallel, I would not trust IQN before extensive tests have been performed and compared against BGS (unfortunately, I did not have time to do this)...

@acrovato, the answer to your question: YES, there is a true difference in the data treatment between a serial and a parallel build (not execution!) of CUPyDO. In the serial build, PETSc is not used at all; the linear algebra engine is simply limited to NumPy/SciPy. In the parallel build, PETSc vectors and matrices are used automatically, even when the code is not run with MPI (it is like running a parallel code on a single thread...). I guess this may impact the final results somehow. The reason we have this dual build is that, at the beginning, we did not want to force users to install PETSc (which is not the most multi-platform lib you can find) just to run CUPyDO... Removing this duality was also in line with "my" desire to change the linear algebra engine of CUPyDO.
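
To make that duality concrete, here is a minimal sketch of the two code paths (illustrative only, with made-up function names; this is not CUPyDO's actual code):

```python
import numpy as np


def solve_serial(A, b):
    # Serial build: plain NumPy/SciPy linear algebra.
    return np.linalg.solve(np.asarray(A, dtype=float), np.asarray(b, dtype=float))


def solve_parallel(A, b):
    # Parallel build: PETSc matrices/vectors and a KSP solver are used even on a
    # single process (like running a parallel code on one thread).
    from petsc4py import PETSc
    n = len(b)
    A_p = PETSc.Mat().createAIJ([n, n])
    A_p.setUp()
    for i in range(n):
        for j in range(n):
            A_p.setValue(i, j, A[i][j])
    A_p.assemble()
    b_p = PETSc.Vec().createWithArray(np.asarray(b, dtype=PETSc.ScalarType))
    x_p = b_p.duplicate()
    ksp = PETSc.KSP().create()
    ksp.setOperators(A_p)
    ksp.setType('gmres')
    ksp.solve(b_p, x_p)
    return x_p.getArray().copy()


# Same (symmetric) toy system solved through both paths: a direct dense solve
# versus an iterative Krylov solve, i.e. the same data but different numerics.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(solve_serial(A, b))
print(solve_parallel(A, b))
```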

acrovato commented 5 years ago

Thanks for your input, guys.

I tested with BGS: when the algorithm converged, it gave the same results as with IQN.

In short, I really do think that the issue is linked to the data being treated differently by NumPy/SciPy and PETSc.

We will have more insights when I am able to test pmem_metafor on gaston.

acrovato commented 5 years ago

Status update: I do not remember which changes affected this, but the discrepancies between the serial and parallel builds have been reduced. Serial build:

FSI iteration: 12
FSI residual: 6.51785474705e-07
[CTest] Lift coefficient = 0.773380 (expected 0.774016 +/- 20.0%)
    rel diff = 8.211480e-04 <= 2.000000e-01 [ok]
[CTest] Vertical displacement = 0.154495 (expected 0.154490 +/- 0.020000)
    abs diff = 5.308141e-06 <= 2.000000e-02 [ok]
[CTest] Rotational displacement = 3.441070 (expected 3.521739 +/- 1.000000)
    abs diff = 8.066920e-02 <= 1.000000e+00 [ok]

Parallel build:

FSI iteration: 22
FSI residual: 5.80183459539e-07
[CTest] Lift coefficient = 0.805806 (expected 0.774016 +/- 20.0%)
    rel diff = 4.107207e-02 <= 2.000000e-01 [ok]
[CTest] Vertical displacement = 0.160653 (expected 0.154490 +/- 0.020000)
    abs diff = 6.163308e-03 <= 2.000000e-02 [ok]
[CTest] Rotational displacement = 3.712537 (expected 3.521739 +/- 1.000000)
    abs diff = 1.907982e-01 <= 1.000000e+00 [ok]

However, they are still there, and the convergence behavior of the parallel build has degraded. I still don't know whether this is due to a difference between PETSc and NumPy or to the way RBM passes its results with or without MPI. I am also puzzled that this happens only on this test case: I have the same kind of test between Flow and Metafor, and there the serial and parallel builds give the same results...

msanchezmartinez commented 3 years ago

I was checking PR #21 on a serial build and had some issues as well! The cantilever beam test case works fine on the parallel build but on the serial build it doesn't even start because some matrix is singular. Fortunately, it's a very small case so I can easily check the differences! I will keep this thread updated.

msanchezmartinez commented 3 years ago


This is the SU2+pyBeam test case, by the way. After more checking: we use the sparse linear solver from SciPy in the serial case and the PETSc Krylov-space solver in the parallel case. Using a flexible GCROT(m,k) algorithm seems to work in this particular (serial) case, in which the A matrix has linearly dependent rows. There are differences between the two builds, but on the order of ~1%. I don't know what the intended behaviour is...
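
A toy illustration of that behaviour (not CUPyDO code): on a consistent system whose matrix has linearly dependent rows, SciPy's sparse direct solver fails, while GCROT(m,k) still converges.

```python
import warnings
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve, gcrotmk

# Singular A (linearly dependent rows) but a consistent right-hand side.
A = sp.csr_matrix([[1.0, 1.0],
                   [1.0, 1.0]])
b = np.array([2.0, 2.0])

# Direct sparse solve (serial path): warns that the matrix is singular and
# returns NaNs.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    x_direct = spsolve(A, b, use_umfpack=False)

# Flexible GCROT(m,k): an iterative Krylov method that converges here because
# b lies in the range of A.
x_iter, info = gcrotmk(A, b, atol=1e-12)

print("spsolve:", x_direct)      # [nan nan]
print("gcrotmk:", x_iter, info)  # ~[1. 1.], info == 0 means converged
```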

However, the initial test case (Flow+RBM staticNaca_fsi) shouldn't be affected by this because it uses a matching meshes algorithm.