anilyil commented 2 years ago

Purpose

The projection routine used to embed points in volumes had 2 shortcomings:

When the linesearch is triggered, it would predict an unreasonably small step size. Now I just backtrack.
The convergence check was relying on the magnitude of the parametric derivative, which can be small based on control point locations. This was causing issues with the pygeo embedding where pyspline would think the projection is fine but pygeo would reject the point since it was slightly above the tolerance required.

The changes has the potential to slightly change all cases that use FFDs. How this would show up is that due to the line search change, the code might take a slightly different convergence path, or due to the tolerance change, the code might do an extra iteration. The changes should be on the order 1e-10 or less, but it is still important to note. Within a case, it does not affect derivative accuracy, but comparing against old cases, it will affect the recorded functions (e.g. our regression tests).

Expected time until merged: About a week or two.

Type of change

[x] Bugfix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[x] Breaking change (non-backwards-compatible fix or feature)
[ ] Code style update (formatting, renaming)
[ ] Refactoring (no functional changes, no API changes)
[ ] Documentation update
[ ] Maintenance update
[ ] Other (please describe)

Testing

Checklist

[ ] I have run flake8 and black to make sure the Python code adheres to PEP-8 and is consistently formatted
[ ] I have formatted the Fortran code with fprettify or C/C++ code with clang-format as applicable
[ ] I have run unit and regression tests which pass locally with my changes
[ ] I have added new tests that prove my fix is effective or that my feature works
[ ] I have added necessary documentation

anilyil commented 2 years ago

Do any of you have a simple test we can include for the embedding test @marcomangano @sseraj ? Just a small FFD will be fine, we can manually add a handful of coordinates that would have been problematic with the original implementation

marcomangano commented 2 years ago

@anilyil I don't have a small example but I am pretty sure I could try and reproduce the previous issue I fixed at the pyGeo level in this PR. There are a few changes in the code that it is probably quicker to discuss offline, do you and @sseraj have time later today or early next week? It looks we'll also need to retrain the tests for both pySpline and pyGeo. This could also be a good opportunity to bring up again regression tests on a full optimization problem, including checking the final result and not just the initialization as we are doing now.

anilyil commented 2 years ago

I am fine with adding a catch that changes it to backtracking if quadratic approx fails (like step size less than 0.1)? I am surprised changing the exit condition is not what broke the tests. I was expecting the linesearch to be inactive for most of the cases we are doing. Regardless of the tests getting broken or not, real world applications will change if we change the tolerance check.

bernardopacini commented 2 years ago

I do not know much about this issue and both @marcomangano and @sseraj have experience with it, so I am going to remove my review. Let me know if you end up wanting a third review later on -- I can add myself back.

codecov[bot] commented 2 years ago

Codecov Report

Merging #47 (16c4eb1) into main (7f0ff6d) will decrease coverage by 7.98%. The diff coverage is 66.66%.

@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
- Coverage   60.51%   52.53%   -7.99%     
==========================================
  Files           5        5              
  Lines        1616     1616              
==========================================
- Hits          978      849     -129     
- Misses        638      767     +129

Impacted Files	Coverage Δ
pyspline/pyVolume.py	`40.80% <ø> (-27.28%)`	:arrow_down:
pyspline/utils.py	`48.36% <66.66%> (ø)`

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

anilyil commented 2 years ago

This turned out to be a much bigger headache than I thought... The original solver had a bug that is baked into all of our tests. See this termination criteria here: https://github.com/mdolab/pyspline/blob/7f0ff6d993eb7b5c0299cc169e70f6d360aaa68a/src/projections.F90#L565-L569

The code checks if the current iteration has differed by eps from the previous iteration. If the detected change is less than eps, the code keeps the previous point and exits. This was originally done to detect points that are outside the volume and avoid running all the way until max iterations are reached.

However, there are several cases where a situation like this happens: Say the eps value is set to 1e-10. The current point is around 1.1e-10. The next iteration the solver takes puts the point right inside eps; however, because the previous point was already very close, this update is very small as well, more importantly, below eps. So the code thinks this point got stuck and is not improving, and keeps the original point at 1.1e-10 convergence. The correct solution is to take the new point since it satisfies the tolerance, whereas the old one does not. The new algorithm avoids this check. I have actually revised the exit criteria quite a bit; I am also aiming to detect points that are outside the volume and terminate if we are not making progress towards the point by checking the projected gradient.

The downside of this fix is that we now have 2 tests that fail in ADflow. The mismatch is tiny; the test test_adjoint.py:TestAdjoint_3_rans_tut_wing.test_adjoint results in:

Mismatched elements: 1 / 72 (1.39%)
Max absolute difference: 2.48547281e-08
Max relative difference: 1.37676067e-10

and the test test_adjoint.py:TestAdjoint_1_euler_scalar_JST_tut_wing.test_adjoint results in:

Mismatched elements: 2 / 72 (2.78%)
Max absolute difference: 1.72756245e-08
Max relative difference: 3.53686769e-10

These mismatches might be due to me running this on an ARM machine, but the tests do pass when I have pyspline/main compiled. The solution to this will be re-training these two adflow tests.

Another big change that was needed here was the volume test in pyspline. Previously, it was using a least squares fit based volume using data points. The data points when plotted in 3D looks like this: actual_data

However, the fit volume results in this hot mess: failed_fit

Our projection and evaluation tests were trained with this, which was essentially just making sure the code fails exactly the same way and any change would cause them to fail.

To address this issue, I have modified the test to use the "data" array as control points while creating the b-spline volume. I still loop over a range of b-spline orders in 3 directions, but the volume itself is created directly by passing control points. Then the test evaluates a range of points inside the volume and runs the projection tests with these. I also test points that are outside the domain to check if the solver can still find the nearest point in the domain.

Now that I modified the test, some methods related to getting information from the underlying data used to fit a volume is not tested. I have created an issue #49 about the volume interpolation and ideally, we need to add a new test that actually does interpolate correctly to address this. I don't have time to fix this now. I do not know if anyone is using this feature.

Finally, I am adding a challenging FFD test to pyGeo in https://github.com/mdolab/pygeo/pull/157. The FFD is based on @sseraj's cases. The points added to the FFD does not embed properly without the solver fixes in this PR, and they do embed correctly with the fixes here.

sseraj commented 2 years ago

To dos:

[x] Look at val_diff check
[x] Retrain select ADflow tests

anilyil commented 2 years ago

This is ready for the second round. The bounds check works as expected for the tests I have tried so far.

mdolab / pyspline

Point volume projection fixes #47

Purpose

Type of change

Testing

Checklist

Codecov Report