grinsfem / grins

Multiphysics Finite Element package built on libMesh
http://grinsfem.github.io

Prefer .xda.gz to .xdr in test cases #616

Closed roystgnr closed 2 years ago

roystgnr commented 2 years ago

The goal of this was to make it easier to debug the many test run failures I'm seeing ... but instead it looks like I've uncovered a Heisenbug: this change brings my failure count from "dozens" down to "3".

I don't know if you want to merge this (xda.gz really is about as efficient as xdr and is easier to debug problems with, so it probably is a better choice for test outputs), but I figured it's definitely astonishing enough to point out. I'm going to be very curious to find out exactly what regressed here, and how it affected so many GRINS tests without hitting failures in libMesh or MOOSE coverage.

pbauman commented 2 years ago

You’re seeing failures? I’ve been getting 100% passes on Apple clang and on GCC 11.2.0 on Ubuntu 20 (modulo no chemistry libraries, so those tests are skipped).

pbauman commented 2 years ago

Sorry, to be clear I mean master, I’ve not tried this branch.

pbauman commented 2 years ago

I just pulled libMesh master and rebuilt, and in dbg, devel, and opt modes all tests pass on my Ubuntu box. I'll try pulling this branch and see what devel does.

Currently Loaded Modules:
  1) gcc/11.2.0   2) openblas/0.3.20   3) mpich/4.0.1   4) petsc/3.16.6-opt   5) hdf5/1.8.21   6) boost/1.66.0   7) vtk/9.1.0   8) cppunit/1.15.1   9) libmesh/master

pbauman commented 2 years ago

OK, with this branch, I do see three failures in devel mode, same modules as above:

FAIL: exact_soln/heat_eqn_unsteady_2d_restart.sh
FAIL: amr/convection_diffusion_unsteady_2d.sh
FAIL: amr/convection_diffusion_unsteady_2d_petsc_diff.sh

pbauman commented 2 years ago

OK, the AMR ones are because I'm dumb and hardcoded the EquationSystems read call in generic_amr_testing_app.C to xdr. The patch below fixes that. I imagine the other test is similar, but I need to verify.

diff --git a/test/amr/generic_amr_testing_app.C b/test/amr/generic_amr_testing_app.C
index f6bfe483..8771ce4b 100644
--- a/test/amr/generic_amr_testing_app.C
+++ b/test/amr/generic_amr_testing_app.C
@@ -188,7 +188,7 @@ int main(int argc, char* argv[])

       // This needs to match the read counter-part in GRINS::Simulation
       //FIXME: Need to support different input formats for restarts
-      std::string test_data = test_data_prefix+"."+step_string.str()+".xdr";
+      std::string test_data = test_data_prefix+"."+step_string.str()+".xda.gz";

       {
         std::ifstream i(test_data.c_str());
@@ -202,7 +202,7 @@ int main(int argc, char* argv[])

       libMesh::EquationSystems es(mesh);
       es.read(test_data,
-              GRINSEnums::DECODE,
+              GRINSEnums::READ,
               libMesh::EquationSystems::READ_HEADER |
               libMesh::EquationSystems::READ_DATA |
               libMesh::EquationSystems::READ_ADDITIONAL_DATA);
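
The FIXME in the hunk above asks for restart reads that are not tied to one format. A minimal sketch of one way to do that is to derive the read mode from the file name instead of hardcoding it. Everything here is an assumption for illustration: `ReadMode`, `ends_with`, and `read_mode_for` are hypothetical names, and the enum only mirrors the `DECODE` (binary) and `READ` (ASCII) enumerators that appear in the patch; it is not GRINS or libMesh code.

```cpp
#include <string>

// Hypothetical stand-in for the two read modes seen in the patch:
// DECODE for binary .xdr files, READ for ASCII .xda files.
enum class ReadMode { DECODE, READ };

// Returns true if `name` ends with `suffix`.
inline bool ends_with(const std::string& name, const std::string& suffix)
{
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Choose the read mode from the restart file name: binary for ".xdr",
// ASCII for anything else (for example ".xda" or ".xda.gz").
inline ReadMode read_mode_for(const std::string& filename)
{
  if (ends_with(filename, ".xdr"))
    return ReadMode::DECODE;
  return ReadMode::READ;
}
```

With a helper like this, the test app could pass the selected mode to the read call, so switching the test data from `.xdr` to `.xda.gz` would only require changing the file name.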

pbauman commented 2 years ago

Ah, OK, the other test failure came from the input file for the second run (restarting from the first run):

diff --git a/test/input_files/heat_eqn_unsteady_2d_restart_pt2.in b/test/input_files/heat_eqn_unsteady_2d_restart_pt2.in
index b4f6ff68..f5dd270b 100644
--- a/test/input_files/heat_eqn_unsteady_2d_restart_pt2.in
+++ b/test/input_files/heat_eqn_unsteady_2d_restart_pt2.in
@@ -69,7 +69,7 @@
 []

 [restart-options]
-   restart_file = './heat_eqn_unsteady_2d_restart_pt1.24.xdr'
+   restart_file = './heat_eqn_unsteady_2d_restart_pt1.24.xda.gz'
 []

 [linear-nonlinear-solver]

With these two patches, I'm passing 100% again, both on Apple clang and on GCC 11.2.0 on Ubuntu.

roystgnr commented 2 years ago

Feel free to push to this branch until it's 100% for you; I'll see if that fixes the last few failures for me too.

But now I'm really baffled as to why I'm seeing a couple dozen failures with xdr and you're seeing none. I'd long been assuming there had been some subtle regression, but now I'm wondering if it's that I switched from 32-bit to 64-bit dof_id_type for my default builds and that's somehow causing some incompatibility with our binary XdrIO.
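
To illustrate why an index-width mismatch could matter for a binary format in general, here is a small self-contained sketch. It is not libMesh's XdrIO and makes no claim about how XdrIO stores ids or whether this is the actual cause of the failures discussed here; `encode_ids` and `decode_ids` are hypothetical helpers that write and read big-endian integers of a caller-chosen width.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Serialize each id as a big-endian integer occupying `width` bytes (4 or 8).
std::vector<std::uint8_t> encode_ids(const std::vector<std::uint64_t>& ids,
                                     std::size_t width)
{
  std::vector<std::uint8_t> bytes;
  for (std::uint64_t id : ids)
    for (std::size_t b = 0; b < width; ++b)
      bytes.push_back(static_cast<std::uint8_t>(id >> (8 * (width - 1 - b))));
  return bytes;
}

// Read the byte stream back, assuming every id occupies `width` bytes.
std::vector<std::uint64_t> decode_ids(const std::vector<std::uint8_t>& bytes,
                                      std::size_t width)
{
  std::vector<std::uint64_t> ids;
  for (std::size_t i = 0; i + width <= bytes.size(); i += width)
    {
      std::uint64_t id = 0;
      for (std::size_t b = 0; b < width; ++b)
        id = (id << 8) | bytes[i + b];
      ids.push_back(id);
    }
  return ids;
}
```

In this toy format, ids written with `width = 8` and read back with `width = 4` come out as twice as many values, with a zero in front of each original id. A text format does not have this failure mode, because each number is delimited by whitespace rather than by a fixed byte count.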

pbauman commented 2 years ago

Pushed. Should probably squash those in. But I'm passing at 100% (in serial, no chemistry libraries) with libMesh master and these commits.

roystgnr commented 2 years ago

This gets me to 100% (likewise and likewise) too. Feel free to squash and/or merge and/or discard this PR as you please; I've made an xdr_debugging branch I can backport later work to if necessary.

pbauman commented 2 years ago

Squashed and merging. Thanks!