sandialabs / LCM

Laboratory for Computational Mechanics

Most of LCM nightlies began failing on 2/7 #85

Closed ikalash closed 7 months ago

ikalash commented 7 months ago

Please see: https://sems-cdash-son.sandia.gov/cdash/index.php?project=Albany_LCM&date=2024-02-07

The errors look like:

p=0: *** Caught standard std::exception of type 'std::runtime_error' :

 Expr '!(not ok_traits || not ok_number_states || not ok_dimension)' eval'd to true, throwing.
 Error occurred at: stk_mesh/stk_mesh/baseImpl/FieldRepository.cpp:121
  0# stk::output_stacktrace(std::ostream&) in /home/lcm/LCM/trilinos-install-serial-clang-release/lib64/libstk_util_util.so.15
  1# stk::mesh::impl::FieldRepository::verify_field_type(stk::mesh::FieldBase const&, stk::mesh::DataTraits const&, unsigned int, shards::ArrayDimTag const* const*, unsigned int) const in /home/lcm/LCM/trilinos-install-serial-clang-release/lib64/libstk_mesh_base.so.15
  2# stk::mesh::impl::FieldRepository::get_field(stk::topology::rank_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, stk::mesh::DataTraits const&, unsigned int, shards::ArrayDimTag const* const*, unsigned int) const in /home/lcm/LCM/trilinos-install-serial-clang-release/lib64/libstk_mesh_base.so.15
  3# stk::mesh::Field<double, stk::mesh::Cartesian3d, void, void, void, void, void, void>& stk::mesh::MetaData::legacy_declare_field<stk::mesh::Field<double, stk::mesh::Cartesian3d, void, void, void, void, void, void>, 0>(stk::topology::rank_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, char const*, int) in /home/lcm/LCM/lcm-build-serial-clang-release/src/disc/stk/libalbanySTK.so
  4# Albany::MultiSTKFieldContainer<true>::MultiSTKFieldContainer(Teuchos::RCP<Teuchos::ParameterList> const&, Teuchos::RCP<stk::mesh::MetaData> const&, Teuchos::RCP<stk::mesh::BulkData> const&, int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, int, Teuchos::RCP<Albany::StateInfoStruct> const&, Teuchos::Array<Teuchos::Array<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, Teuchos::Array<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&) in /home/lcm/LCM/lcm-build-serial-clang-release/src/disc/stk/libalbanySTK.so
  5# Albany::GenericSTKMeshStruct::SetupFieldData(Teuchos::RCP<Teuchos::Comm<int> const> const&, int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, Teuchos::RCP<Albany::StateInfoStruct> const&, int) in /home/lcm/LCM/lcm-build-serial-clang-release/src/disc/stk/libalbanySTK.so
  6# Albany::IossSTKMeshStruct::setFieldAndBulkData(Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Teuchos::ParameterList> const&, unsigned int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, Teuchos::RCP<Albany::StateInfoStruct> const&, unsigned int, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Teuchos::RCP<Albany::StateInfoStruct>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Teuchos::RCP<Albany::StateInfoStruct> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&) in /home/lcm/LCM/lcm-build-serial-clang-release/src/disc/stk/libalbanySTK.so
  7# Albany::DiscretizationFactory::createDiscretization(unsigned int, std::map<int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<int>, std::allocator<std::pair<int const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, Teuchos::RCP<Albany::StateInfoStruct> const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Teuchos::RCP<Albany::StateInfoStruct>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Teuchos::RCP<Albany::StateInfoStruct> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, Teuchos::RCP<Albany::RigidBodyModes> const&) in /home/lcm/LCM/lcm-build-serial-clang-release/src/libalbanyLib.so
  8# Albany::Application::createDiscretization() in /home/lcm/LCM/lcm-build-serial-clang-release/src/libalbanyLib.so
  9# Albany::Application::Application(Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Teuchos::ParameterList> const&, Teuchos::RCP<Thyra::VectorBase<double> const> const&, bool) in /home/lcm/LCM/lcm-build-serial-clang-release/src/libalbanyLib.so
 10# Albany::SolverFactory::createAlbanyAppAndModel(Teuchos::RCP<Albany::Application>&, Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Thyra::VectorBase<double> const> const&, bool) in /home/lcm/LCM/lcm-build-serial-clang-release/src/libalbanyLib.so
 11# Albany::SolverFactory::createAndGetAlbanyApp(Teuchos::RCP<Albany::Application>&, Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Thyra::VectorBase<double> const> const&, bool) in /home/lcm/LCM/lcm-build-serial-clang-release/src/libalbanyLib.so
 12# 0x00000000004149FB in /home/lcm/LCM/lcm-build-serial-clang-release/src/Albany
 13# __libc_start_main in /usr/lib64/libc.so.6
 14# 0x00000000004143BE in /home/lcm/LCM/lcm-build-serial-clang-release/src/Albany

 Error:  verify_field_type FAILED: Existing field = FieldBase<double,Cartesian3d>[ name = "coordinates" , #states = 1 ] Expected field info = FieldBase<double,Cartesian3d>[ #states = 1 ]

@alanw0, is this due to STK changes?

alanw0 commented 7 months ago

@ikalash it does look like an issue with stk fields. I'll get @djglaze to take a look at it.

djglaze commented 7 months ago

@ikalash I'm taking a look at this now. I'll let you know what I find soon.

djglaze commented 7 months ago

@ikalash Do you have a recipe you could point me to for building LCM on the CEE LAN? I've got a recipe for Albany that recycles scripts from part of the nightly dashboard on cee-compute011.sandia.gov, but I can't seem to find anything similar for LCM. There are a bunch of configure/build scripts under a ./doc/CEE directory, but they seem horribly out of date.

ikalash commented 7 months ago

Thank you for responding and looking at this, @alanw0 and @djglaze !

@djglaze : sorry about the scripts being out of date. We no longer test LCM regularly on CEE, which is why they have gone stale. You can use the following scripts from Albany master:

https://github.com/sandialabs/Albany/blob/master/doc/dashboards/cee-compute011.sandia.gov/sems-gcc-modules.sh
https://github.com/sandialabs/Albany/blob/master/doc/dashboards/cee-compute011.sandia.gov/do-cmake-trilinos-mpi-sems-gcc
https://github.com/sandialabs/Albany/blob/master/doc/dashboards/cee-compute011.sandia.gov/do-cmake-albany

You will likely need to add the following to your Trilinos configure:

 -D CMAKE_CXX_FLAGS:STRING='-std=gnu++17 -fext-numeric-literals' \
 -D Trilinos_ENABLE_MiniTensor:BOOL=ON \

and the following to the Albany-LCM configure script:

 -D CMAKE_CXX_FLAGS:STRING='-std=gnu++17 -fext-numeric-literals' \

You'll also need to edit the Albany-LCM configure script to point to your newly-created Trilinos install. Let me know if you have any issues/questions in trying to reproduce the problem.

djglaze commented 7 months ago

@ikalash I merged a fix for this issue into the Trilinos develop branch a little while ago (#12758). This was, indeed, a problem on the STK side related to our deprecation of the legacy Field handling. This fix changes our deprecation strategy for the extra Field template parameters that are being removed, so that it is no longer a compile-time deprecation warning to use them. There's now a run-time deprecation warning that's printed if these types are used. Sorry about breaking your code. This was a corner-case that was impossible to test with our Sierra-based builds, given how our builds are configured.

I noticed that the LCM code has not been converted to the new "simple fields" workflow like Albany has, in commit 9d593cd0 by @mperego. There will be a boatload of deprecation warnings now in LCM and we plan on removing this legacy code maybe 3-4 months from now, so there is a mild sense of urgency to get LCM converted as well. Do you need my help to attempt to convert it, or do you want to tackle it yourselves?

While Mauro did an awesome job with the conversion of Albany, there is still a small amount of legacy code in there. For instance, there are still a few references to stk::mesh::Cartesian, and a few calls to FieldBase::field_array_rank(), which no longer has any meaning because it refers to the extra Field template parameters that no longer exist. There's a chance that some of this machinery is still present so that you can interact with the auto-registered "coordinates" Field from IO. You can get the "coordinates" field switched to the new style by calling StkMeshIoBroker::use_simple_fields() or MetaData::use_simple_fields() very early in either one's construction. This will also convert any usage of deprecated functions into a run-time error, to prevent accidental regressions before they are removed completely.
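The opt-in behavior described above (legacy calls tolerated by default, rejected once the new workflow is enabled) can be sketched with a toy class. `ToyMetaData` and its members are hypothetical and only borrow names from the discussion; the real STK classes, their signatures, and the exception type they throw may differ.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical toy class; the names mirror the discussion but this is not STK.
class ToyMetaData {
public:
  // Opt in to the new workflow; meant to be called before any field is declared.
  void use_simple_fields() { simple_fields_ = true; }

  // New-style declaration: always allowed.
  void declare_field(const std::string& name) { last_declared_ = name; }

  // Legacy declaration: tolerated by default, rejected once the caller opted in.
  void legacy_declare_field(const std::string& name)
  {
    if (simple_fields_) {
      throw std::logic_error("legacy_declare_field(\"" + name +
                             "\") called after use_simple_fields()");
    }
    last_declared_ = name;
  }

  const std::string& last_declared() const { return last_declared_; }

private:
  bool simple_fields_ = false;
  std::string last_declared_;
};
```

Failing hard after the opt-in means a stray legacy call introduced later is caught by the first test that exercises it, rather than surfacing when the legacy code is finally removed.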

If you want to query the "size" of a Field at run-time (which is something that you might have previously inferred from the now-removed template parameters), you can use the function stk::mesh::field_extent_per_entity() to get the specific local size while iterating through the mesh (since Fields can have a variable size across different Parts) or the FieldBase::max_extent() function, if you're sure that it has the same size across the whole mesh.
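To make the distinction between a local size and a mesh-wide maximum concrete, here is a small self-contained sketch. `ToyField`, `extent_on`, and the part names are hypothetical and do not reproduce the STK API; the sketch only assumes that a field may store a different number of components on different parts.

```cpp
#include <algorithm>
#include <map>
#include <string>

// Hypothetical toy field: the number of components it stores on each part.
// A part that is absent from the map carries no data for this field.
struct ToyField {
  std::map<std::string, unsigned> extent_by_part;

  // Local size on one part (0 if the field is not defined there).
  unsigned extent_on(const std::string& part) const
  {
    auto it = extent_by_part.find(part);
    return it == extent_by_part.end() ? 0u : it->second;
  }

  // Largest size anywhere on the mesh; this equals the local size only when
  // every part uses the same extent.
  unsigned max_extent() const
  {
    unsigned result = 0;
    for (const auto& entry : extent_by_part) {
      result = std::max(result, entry.second);
    }
    return result;
  }
};
```

For a field with 6 components on a shell block and 3 on a solid block, the maximum is 6, so code that loops over the solid block using the maximum would read past the data actually stored there. That is why the per-entity query is the safer choice when sizes can vary.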

Hope this helps, Dave

ikalash commented 7 months ago

@djglaze : thank you, this is very helpful. I am much less plugged in to Trilinos than I used to be, and was not aware of these deprecations. It seems the commit you are referring to is this one by Mauro, from early June: https://github.com/sandialabs/Albany/commit/9d593cd0239ce83b1f3e753db482e0080014eae3 We need to make the same changes in LCM. I can implement the changes, hopefully soon. I think they will be similar to what Mauro did for Albany master, maybe a bit simpler because LCM does not have all the STK discretization classes (e.g., there is no mesh extrusion).

ikalash commented 7 months ago

@djglaze : it looks like your PR https://github.com/trilinos/Trilinos/pull/12758 resolves the errors. Thanks! I will close this issue and open a separate one for reworking LCM so that it no longer uses the deprecated code.

mperego commented 7 months ago

@djglaze thanks for pointing out that there were still some calls to STK legacy code in Albany. I think that we got rid of them with PR https://github.com/sandialabs/Albany/pull/1010.