sandialabs / Albany

Sandia National Laboratories' Albany multiphysics code

Failing landIce_FO_GIS_AdjointSensitivity_Thickness test in nightly spack build #1054

Closed by ikalash 4 months ago

ikalash commented 5 months ago

The landIce_FO_GIS_AdjointSensitivity_Thickness test has been failing in the nightly spack build. This started a few weeks ago, but I did not get a chance to look at it until today due to travel.

The attachment contains output from the test. Why is so much info about the MDFields dumped? It makes the output very difficult to parse. Curiously, it looks like none of the regular nightlies run this test, just the spack one. Is there a particular configure option that triggers the test to be turned on?

failed.txt

@mperego @bartgol

bartgol commented 5 months ago

I spent 5 min looking through the output, searching for a runtime error, when the issue is simply that the sensitivity norm doesn't match the expected value... We do need to trim output in our tests :)

As for the error, it appears one of the sensitivities is WAY off, while the other is correct. The test only runs if ALBANY_MESH_DEPENDS_ON_PARAMETERS=TRUE, which is enabled via ENABLE_MESH_DEPENDS_ON_PARAMETERS. Looking at the dashboards folder, this option is ON only for cee-compute011.sandia.gov. Is that the spack build? Maybe we could turn it ON for a few more builds...

ikalash commented 4 months ago

Thanks for looking at it, @bartgol . Yes, it looks like ENABLE_MESH_DEPENDS_ON_PARAMETERS is only turned on in the Intel CEE build: https://github.com/sandialabs/Albany/blob/master/doc/dashboards/cee-compute011.sandia.gov/ctest_nightly_tmp.cmake#L489-L518 . It's odd to me that it's not on in the others. I can certainly turn it on in more nightlies - I will do this. Let me know if there are any specific ones you would like it enabled in. The Spack build does not push to the CDash site, so that is a separate thing. I bet if the option is enabled, we will see the same failure as in the spack build in the nightlies. We can see tomorrow.
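For reference, a minimal sketch of what enabling it in a nightly configure script might look like, assuming the dashboard scripts pass configure options through a list variable (the `CONFIGURE_OPTIONS` name here is hypothetical; the actual pattern is in the ctest_nightly_tmp.cmake linked above):

```cmake
# Hypothetical nightly-configure fragment; the real dashboard scripts may use
# a different variable name for the option list. This is the CMake option that
# gates the test via ALBANY_MESH_DEPENDS_ON_PARAMETERS.
set(CONFIGURE_OPTIONS
  ${CONFIGURE_OPTIONS}
  "-DENABLE_MESH_DEPENDS_ON_PARAMETERS:BOOL=ON")
```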

Unfortunately the Intel build has been broken due to another issue, and right now the CDash site is down, so I cannot check the history of this test in that build. I will try again later today.

bartgol commented 4 months ago

The only drawback of that option is to make the build/run time longer. So I would maybe enable it on machines that are not already slow, to make sure we get a report in a reasonable time...

ikalash commented 4 months ago

@bartgol : this is good to know. I will set it in the CEE gcc release build for the time being to see what happens.

Testing aside, how do you propose we go about fixing this test? I checked my deleted emails and it looks like the test broke on 5/23 - that was the last day the spack build was clean; 5/24 had the failure. I can try to confirm this in the Intel build on the CDash site once it's back up.

bartgol commented 4 months ago

Looking at the git history, there are two things that happened on the 23rd, which could affect testing:

It's hard to believe any of those had an impact on the pass/fail of the test. We can look at the test log of the last passing test, and compare against the current log, to see if any of the parameters has somehow changed. But given the date you pinpointed, I doubt we'll see a difference. Maybe something changed in Trilinos? Like a default value in Piro or some other solver package? @mperego ?

ikalash commented 4 months ago

OK, I can try bisecting to figure out what broke things on 5/23. Please stay tuned.
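For the record, a sketch of the bisection, with placeholder shas for the last clean (5/23) and first failing (5/24) builds, keeping Trilinos pinned throughout:

```sh
git bisect start
git bisect bad  <first-failing-sha>
git bisect good <last-clean-sha>
# at each step, rebuild Albany against the pinned Trilinos, then:
ctest -R landIce_FO_GIS_AdjointSensitivity_Thickness
git bisect good  # or "git bisect bad", per the ctest result
```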

ikalash commented 4 months ago

The commits made on 5/23 by @mcarlson801 are the culprit. I checked out the following shas:

It is not from Trilinos - I held the Trilinos version constant and still reproduced the issue by checking out a different Albany version. @mcarlson801 : could you please look into this?

mcarlson801 commented 4 months ago

Yeah, I'll look into this. I assume I can replicate this elsewhere by enabling ENABLE_MESH_DEPENDS_ON_PARAMETERS?

ikalash commented 4 months ago

@mcarlson801 : correct. Thanks!

mcarlson801 commented 4 months ago

I just pushed a change that should fix this. I had moved some temporary DynRankViews for the grounding line flux response into the Kokkos kernel, but this seems to have introduced some inaccuracy in the local responses, which threw off the sensitivity computation when using certain scalar types.
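For anyone hitting a similar issue, a minimal sketch of the general hazard (hypothetical names, not the actual GLFlux evaluator code): a temporary buffer shared across iterations of a parallel kernel can race, while a per-iteration local is safe; and with AD (FAD) scalar types, temporary views must also be allocated with the derivative dimension, which is easy to drop when moving allocations into a kernel.

```cpp
// Minimal sketch, not the Albany code: a per-cell response reduction where
// each iteration keeps its own temporary. Reusing one buffer across all
// iterations of the parallel loop would be a data race and could silently
// corrupt the accumulated local response.
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int num_cells = 1000, num_qps = 4;
    Kokkos::View<double**> flux("flux", num_cells, num_qps);  // stand-in for qp data
    Kokkos::deep_copy(flux, 1.0);

    double response = 0.0;
    Kokkos::parallel_reduce(
        "cell_response", num_cells,
        KOKKOS_LAMBDA(const int cell, double& local) {
          // Safe: a stack temporary per iteration, not a shared scratch view.
          double cell_sum = 0.0;
          for (int qp = 0; qp < num_qps; ++qp) cell_sum += flux(cell, qp);
          local += cell_sum;
        },
        response);
    // Here response == num_cells * num_qps.
  }
  Kokkos::finalize();
  return 0;
}
```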

ikalash commented 4 months ago

Thanks, @mcarlson801 ! I'll close this once the nightlies are clean. Unfortunately, Trilinos has broken the Albany nightlies, so we may have to wait until that is fixed before we can verify.

mcarlson801 commented 4 months ago

Looks like my fix for this might have broken some other tests that use the grounding line flux response. I'm probably going to revert the UVM-free changes to the GLFlux evaluator and try to figure out what's going on.