The issue is related to this part of the MPhys wrapper. https://github.com/mdolab/pygeo/blob/41ffb0ba482fa1d2ccd6336d9fde62502ad8de1d/pygeo/mphys/mphys_dvgeo.py#L311-L315
If you change it so that each proc runs the following:
dout = d_outputs[constraintname]
jvtmp = np.dot(np.transpose(dcdx), dout)
then the derivatives are correct on the root proc but zero on all others. This may be the "correct" behavior, but I'm not sure. It at least allows my optimization to run correctly.
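For context, a minimal sketch of where those two lines would sit inside a reverse-mode matrix-vector product; the function and variable names below are illustrative stand-ins, not copied from `mphys_dvgeo.py`:

```python
import numpy as np

# Minimal sketch, not the actual pygeo source: the function and argument names
# are hypothetical placeholders for the wrapper's loop variables.
def constraint_rev_product(d_outputs, d_inputs, constraintname, dcdx, dvname):
    # dcdx: (n_con, n_dv) Jacobian of this constraint w.r.t. the design variables
    dout = d_outputs[constraintname]          # reverse seed for the constraint on this proc
    jvtmp = np.dot(np.transpose(dcdx), dout)  # (n_dv,) contribution to the DV seed
    d_inputs[dvname] += jvtmp                 # accumulate into the (serial) DV seed
```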
The reason the DVCon functions are implemented that way is that, when we were first developing this, OpenMDAO only allowed entire components to be distributed, meaning all of their outputs would be distributed. Since then, this has changed, and we can now have both distributed and non-distributed outputs coming out of the same component.
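To illustrate the current OpenMDAO capability, here is a minimal sketch (the component and variable names are made up for illustration) of a single component declaring both serial and distributed outputs via the `distributed` flag on `add_output`:

```python
import numpy as np
import openmdao.api as om

# Hypothetical example component, not pygeo code.
class MixedOutputsComp(om.ExplicitComponent):
    def setup(self):
        self.add_input("x", val=np.zeros(3))                           # serial input
        self.add_output("y_serial", val=0.0)                           # duplicated on every proc
        self.add_output("y_dist", val=np.zeros(2), distributed=True)   # each proc owns its own rows

    def compute(self, inputs, outputs):
        outputs["y_serial"] = np.sum(inputs["x"])
        outputs["y_dist"][:] = self.comm.rank  # per-proc values, just for illustration
```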
The way we do this in MACH is to have the DVCon stuff duplicated on all processors. It is not the worst in terms of performance, and one could argue it is easier for people to understand. Therefore, I am not inclined to change the default MACH behavior.
The final piece of information needed here is that, with the recent changes, OpenMDAO's parallelism expects the same reverse seeds on all procs if we are going through a serial variable. In this case, the DVGeo DVs are serial (non-distributed), whereas the DVCon outputs are distributed. As @bernardopacini figured out, this only worked out of luck and does not fully conform to OpenMDAO's convention of having identical seeds on all procs for a non-distributed variable.
With all of this in mind, @bernardopacini's PR #191 fixes this issue by declaring DVCon outputs as non-distributed. This way, the MPhys DVCon wrapper behaves the same way as MACH. This should make the code easier to understand and fix the derivative inconsistency across processors.
Finally, in OpenMDAO, doing it this way is actually faster than using distributed variables. The proper way to implement it with distributed variables would require an allreduce on the comm. We can avoid this collective communication and just duplicate the cheap DVCon computations on all procs.
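A hedged sketch of the trade-off described above; the function and variable names are illustrative, not the pygeo implementation:

```python
import numpy as np

# (a) Distributed constraint outputs: each proc holds only part of the seed,
#     so the per-proc contributions must be summed so that the serial DV seed
#     ends up identical on every proc.
def rev_product_distributed(comm, dcdx_local, dout_local):
    jv_local = np.dot(dcdx_local.T, dout_local)
    jv_sum = np.zeros_like(jv_local)
    comm.Allreduce(jv_local, jv_sum)  # collective communication on every reverse call
    return jv_sum

# (b) Non-distributed (duplicated) outputs, the approach in PR #191: every proc
#     already has the full seed and Jacobian, so no communication is needed.
def rev_product_duplicated(dcdx, dout):
    return np.dot(dcdx.T, dout)
```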
So, we believe #191 fixes your issue @joanibal. I will tag you in that PR as a reviewer. Can you please test the new implementation with your case?
Description
Related issues have been a recurring topic in MPhys. I think for DVCon the issue is still unresolved, but I'm not in the loop on this type of issue.
Steps to reproduce issue
Run the following script
On 1 processor I get
On 2 processors I get
Expected behavior
I would expect the derivatives to be correct on all procs, or at least on the root proc.
Code versions
I'm running this in the stable public Docker container.