The issue is related to this part of the MPhys wrapper. https://github.com/mdolab/pygeo/blob/41ffb0ba482fa1d2ccd6336d9fde62502ad8de1d/pygeo/mphys/mphys_dvgeo.py#L311-L315
If you change it so that each proc runs the following:
dout = d_outputs[constraintname]
jvtmp = np.dot(np.transpose(dcdx), dout)
then the derivatives are correct on the root proc but zero on all others. This may be the "correct" behavior, but I'm not sure. It at least allows my optimization to run correctly.
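For context, a minimal sketch of where those two lines would sit inside a reverse-mode matrix-vector product; the function and variable names below are illustrative stand-ins, not copied from `mphys_dvgeo.py`:

```python
import numpy as np

# Minimal sketch, not the actual pygeo source: the function and argument names
# are hypothetical placeholders for the wrapper's loop variables.
def constraint_rev_product(d_outputs, d_inputs, constraintname, dcdx, dvname):
    # dcdx: (n_con, n_dv) Jacobian of this constraint w.r.t. the design variables
    dout = d_outputs[constraintname]          # reverse seed for the constraint on this proc
    jvtmp = np.dot(np.transpose(dcdx), dout)  # (n_dv,) contribution to the DV seed
    d_inputs[dvname] += jvtmp                 # accumulate into the (serial) DV seed
```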
The reason the DVCon functions are implemented that way is that, when we were first developing this, OpenMDAO only allowed entire components to be distributed, meaning all of their outputs would be distributed. Since then, this has changed, and we can now have both distributed and non-distributed outputs coming out of the same component.
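To illustrate the current OpenMDAO capability, here is a minimal sketch (the component and variable names are made up for illustration) of a single component declaring both serial and distributed outputs via the `distributed` flag on `add_output`:

```python
import numpy as np
import openmdao.api as om

# Hypothetical example component, not pygeo code.
class MixedOutputsComp(om.ExplicitComponent):
    def setup(self):
        self.add_input("x", val=np.zeros(3))                           # serial input
        self.add_output("y_serial", val=0.0)                           # duplicated on every proc
        self.add_output("y_dist", val=np.zeros(2), distributed=True)   # each proc owns its own rows

    def compute(self, inputs, outputs):
        outputs["y_serial"] = np.sum(inputs["x"])
        outputs["y_dist"][:] = self.comm.rank  # per-proc values, just for illustration
```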
The way we do this in MACH is to have the DVCon stuff duplicated on all processors. It is not the worst in terms of performance, and one could argue it is easier for people to understand. Therefore, I am not inclined to change the default MACH behavior.
The final piece of information needed here is that, with the recent changes, OpenMDAO's parallelism expects the same reverse seeds on all procs if we are going through a serial variable. In this case, the DVGeo DVs are serial (non-distributed), whereas the DVCon outputs are distributed. As @bernardopacini figured out, this only worked out of luck and does not fully conform to OpenMDAO's convention of having identical seeds on all procs for a non-distributed variable.
With all of this in mind, @bernardopacini's PR #191 fixes this issue by declaring DVCon outputs as non-distributed. This way, the MPhys DVCon wrapper behaves the same way as MACH. This should make the code easier to understand and fix the derivative inconsistency across processors.
Finally, in OpenMDAO, doing it this way is actually faster than using distributed variables. The proper way to implement it with distributed variables would require an allreduce on the comm. We can avoid this collective communication and just duplicate the cheap DVCon computations on all procs.
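A hedged sketch of the trade-off described above; the function and variable names are illustrative, not the pygeo implementation:

```python
import numpy as np

# (a) Distributed constraint outputs: each proc holds only part of the seed,
#     so the per-proc contributions must be summed so that the serial DV seed
#     ends up identical on every proc.
def rev_product_distributed(comm, dcdx_local, dout_local):
    jv_local = np.dot(dcdx_local.T, dout_local)
    jv_sum = np.zeros_like(jv_local)
    comm.Allreduce(jv_local, jv_sum)  # collective communication on every reverse call
    return jv_sum

# (b) Non-distributed (duplicated) outputs, the approach in PR #191: every proc
#     already has the full seed and Jacobian, so no communication is needed.
def rev_product_duplicated(dcdx, dout):
    return np.dot(dcdx.T, dout)
```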
So, we believe #191 fixes your issue @joanibal. I will tag you in that PR as a reviewer. Can you please test the new implementation with your case?
Description
Related issues have been a recurring topic in MPhys. I think for DVCon the issue is still unresolved, but I'm not in the loop on this type of issue.
Steps to reproduce issue
Run the following script
On 1 processor I get
On 2 processors I get
Expected behavior
I would expect the derivatives to be correct on all procs, or at least on the root proc.
Code versions
I'm running this in the stable public Docker container.