trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

STK PR last week broke a number of Albany tests #3377

Closed ikalash closed 6 years ago

ikalash commented 6 years ago

The following PR broke a number of Albany tests last week: https://github.com/trilinos/Trilinos/commit/1d835e3595226bebef56c28862e4e8425af9431e . I have verified that reverting the commit yields a clean dashboard. The issue was originally discussed here in Albany issues: https://github.com/gahansen/Albany/issues/358 - transferring it to a Trilinos issue now that I have verified the problem is with Trilinos. To see verbose output from the failing tests, one can click on any of the tests here: http://cdash.sandia.gov/CDash-2-3-0/viewTest.php?onlyfailed&buildid=75093 (they all pass with the PR reverted). Most of them seem to be failures when running exodiff for LCM problems, but a few seem to have to do with reading in an Exodus file to restart from, e.g. http://cdash.sandia.gov/CDash-2-3-0/testDetails.php?test=3880312&build=75103:

p=2: *** Caught standard std::exception of type 'std::runtime_error' :
 ERROR: Variable type counts are inconsistent. See processor 0 output for more details.
p=3: *** Caught standard std::exception of type 'std::runtime_error' :
 ERROR: Variable type counts are inconsistent. See processor 0 output for more details.
p=0: *** Caught standard std::exception of type 'std::runtime_error' :
 ERROR: Number of nodeset variables is not consistent on all processors.
        Database: th1d_tpetra.exo
    Processor 0 count = 3
    Processor 1 count = 0
    Processor 2 count = 0

Can someone please look into this? We'd like to get a clean Albany dashboard again as soon as possible. Tagging Albany developers who are likely interested in this: @bartgol, @lxmota, @ibaned, @jwfoulk, @mperego.

@trilinos/stk, @trilinos/seacas

ikalash commented 6 years ago

Sure. Closing now. Thanks again to @alanw0 for all his help tracking this down on the STK side!

alanw0 commented 6 years ago

Thanks @ikalash, and let us know if you need any more information about stk and stk-io.

ikalash commented 6 years ago

@alanw0 : I have @lxmota in my office now and we are talking about your proposed solutions. Regarding 3.: if we change the layout of the Cauchy Stresses to stk::mesh::Field<double, 8, FullTensor33>, will paraview be able to read these fields in as tensor fields? If so, this may be the best solution for us, as we would be able to easily perform various operations on the field, such as take the trace, etc., which is cumbersome with the current data layout.

alanw0 commented 6 years ago

@ikalash, I'm not sure how paraview would handle this, but I'll try to talk to some more knowledgeable paraview users and get back to you.

ikalash commented 6 years ago

@alanw0 : that would be great. Whether or not we go with approach 3. depends very much on whether ParaView can handle the relevant data layouts cleanly.

alanw0 commented 6 years ago

@ikalash Ryan Shaw provided me with a couple of links illustrating how paraview might represent tensor data using ellipsoids or other shapes. See these links: https://www.paraview.org/Wiki/ParaView/Users_Guide/List_of_filters#Tensor_Glyph https://www.youtube.com/watch?v=-L5iWxANCvk

It might take a little experimentation and consulting with paraview experts to find out exactly how it needs a tensor field to be laid out in the exodus file. That would influence how the stk field needs to be declared (w.r.t. its template parameters). So I'm not sure whether Field<double,gaussPoints,FullTensor33> is best, or whether Field<double,gaussPoints,3,3> would be better. Perhaps digging into paraview documentation would show what it needs.

ibaned commented 6 years ago

@alanw0 thanks for the link! the tensor glyph feature is something I've been looking for for a while. Based on my experience using VTU files, ParaView will recognize a symmetric tensor if it shows up as a tag with 6 components per node. I don't know what the behavior is when reading Exodus files.

alanw0 commented 6 years ago

@ikalash By the way, the actual template parameter would be Cartesian3D, and then the length value would be provided to put_field. So your previous declaration of Field<double,Cartesian3D,Cartesian3D,Cartesian3D> would be correct for the case of "gaussPoints,3,3", and you would provide 8,3,3 to the put_field call. If you do Field<double,Cartesian3D,FullTensor33> I think you would provide 8,9 to the put_field call. If stk-io doesn't translate these to exactly the right thing in the exodus file fields, let us know. You may be the first to try visualizing gauss-point tensor fields written by stk-io.
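To make the "8,3,3" versus "8,9" distinction concrete, here is a minimal standalone sketch that does not use STK. It assumes the Gauss point is the slowest-varying index (row-major storage), which is an assumption and may not match what stk-io actually writes. The names `offset833`, `offset89`, `TensorField`, and `trace_at` are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kGaussPoints = 8;  // the "8" in "8,3,3" above
constexpr std::size_t kDim = 3;

// Flat offset for a (gaussPoints, 3, 3) view.
// ASSUMPTION: row-major ordering, Gauss point slowest.
constexpr std::size_t offset833(std::size_t q, std::size_t i, std::size_t j) {
  return (q * kDim + i) * kDim + j;
}

// Flat offset for a (gaussPoints, 9) view of the same 72 values.
constexpr std::size_t offset89(std::size_t q, std::size_t c) {
  return q * (kDim * kDim) + c;
}

// Hypothetical per-element storage: 8 Gauss points x 9 tensor components.
using TensorField = std::array<double, kGaussPoints * kDim * kDim>;

// Trace of the tensor stored at Gauss point q.
inline double trace_at(const TensorField& f, std::size_t q) {
  assert(q < kGaussPoints);
  double t = 0.0;
  for (std::size_t i = 0; i < kDim; ++i) {
    t += f[offset833(q, i, i)];
  }
  return t;
}
```

Under this assumed ordering, the two declarations describe the same 72 doubles per element, and only the way the dimensions are labeled differs. How that labeling ends up in the Exodus variable names is the part that would still need checking.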

alanw0 commented 6 years ago

@ibaned stk fields can also be declared with a 'SymmetricTensor33' tag which stores 6 values. It might still require you to pass a 6 to put_field...
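As a rough illustration of what "stores 6 values" means for a symmetric 3x3 tensor, here is a standalone sketch that packs and unpacks one. The component order used here (xx, yy, zz, xy, yz, zx) is an assumption chosen for the example. Neither the order that STK's SymmetricTensor33 uses nor the order ParaView expects is confirmed in this thread. The names `Full33`, `Sym33`, `pack_symmetric`, and `unpack_symmetric` are hypothetical.

```cpp
#include <array>

using Full33 = std::array<double, 9>;  // row-major 3x3: xx xy xz / yx yy yz / zx zy zz
using Sym33 = std::array<double, 6>;   // ASSUMED order: xx, yy, zz, xy, yz, zx

// Keep the six independent components of a symmetric tensor.
inline Sym33 pack_symmetric(const Full33& t) {
  return {t[0], t[4], t[8], t[1], t[5], t[6]};
}

// Rebuild the full 3x3 tensor by mirroring the off-diagonal components.
inline Full33 unpack_symmetric(const Sym33& s) {
  return {s[0], s[3], s[5],
          s[3], s[1], s[4],
          s[5], s[4], s[2]};
}
```

If the reader and writer disagree on the component order, the off-diagonal terms get swapped silently, so this is worth verifying with a tensor whose six components are all distinct.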

ikalash commented 6 years ago

@alanw0 , @ibaned : thanks for the info. @lxmota and I will have a look. I guess if someone has an exodus file with a field written in this data layout, one could load it into paraview and easily check what it looks like there. Sounds like you've done this with VTU files, @ibaned.

ikalash commented 6 years ago

@alanw0 : I am reopening this issue b/c I found another nightly test in Albany that is failing I believe due to STK changes (sorry! we have a lot of tests failing due to the STK change affecting exodiff, and this one got overlooked). It is an Aeras problem, which you did not build before. Here is the error:

http://cdash.sandia.gov/CDash-2-3-0/testDetails.php?test=3946218&build=76064

When I do a backtrace using gdb, here it is:

(gdb) bt
#0  0x00007ffff4a90f75 in stk::mesh::EntityKey::id (this=0xfffffffffffffff0) at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build/install/include/stk_mesh/base/EntityKey.hpp:76
#1  Aeras::SpectralDiscretization::getMaximumID (this=this@entry=0xa38ff0, rank=rank@entry=stk::topology::EDGE_RANK)
    at /home/ikalash/Albany_Schwarz_seg_fault_debug/src/disc/stk/Aeras_SpectralDiscretization.cpp:1334
#2  0x00007ffff4a93df2 in Aeras::SpectralDiscretization::enrichMeshQuads() () at /home/ikalash/Albany_Schwarz_seg_fault_debug/src/disc/stk/Aeras_SpectralDiscretization.cpp:1425
#3  0x00007ffff4aaf065 in Aeras::SpectralDiscretization::updateMesh() () at /home/ikalash/Albany_Schwarz_seg_fault_debug/src/disc/stk/Aeras_SpectralDiscretization.cpp:3606
#4  0x00007ffff4aafc1a in Aeras::SpectralDiscretization::SpectralDiscretization(Teuchos::RCP<Teuchos::ParameterList> const&, Teuchos::RCP<Albany::AbstractSTKMeshStruct>, int, int, Teuchos::RCP<Teuchos::Comm<int> const> const&, bool, Teuchos::RCP<Albany::RigidBodyModes> const&) () at /home/ikalash/Albany_Schwarz_seg_fault_debug/src/disc/stk/Aeras_SpectralDiscretization.cpp:151
#5  0x00007ffff6d70cb9 in Albany::DiscretizationFactory::createDiscretizationFromInternalMeshStruct(std::map<int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<int>, std::allocator<std::pair<int const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, Teuchos::RCP<Albany::RigidBodyModes> const&) ()
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build/install/include/Teuchos_RCPNode.hpp:217
#6  0x00007ffff6d70f96 in Albany::DiscretizationFactory::createDiscretization(unsigned int, std::map<int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<int>, std::allocator<std::pair<int const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, Teuchos::RCP<Albany::StateInfoStruct> const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Teuchos::RCP<Albany::StateInfoStruct>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Teuchos::RCP<Albany::StateInfoStruct> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, Teuchos::RCP<Albany::RigidBodyModes> const&) () at 
/home/ikalash/Albany_Schwarz_seg_fault_debug/src/disc/Albany_DiscretizationFactory.cpp:399
#7  0x00007ffff6a0a5b0 in Albany::Application::createDiscretization() () at /usr/include/c++/8/new:169
#8  0x00007ffff6a2db63 in Albany::Application::Application(Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Teuchos::ParameterList> const&, Teuchos::RCP<Tpetra::Classes::Vector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const&, bool) () at /home/ikalash/Albany_Schwarz_seg_fault_debug/src/Albany_Application.cpp:102
#9  0x00007ffff5f324c7 in Albany::SolverFactory::createAlbanyAppAndModelT(Teuchos::RCP<Albany::Application>&, Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Tpetra::Classes::Vector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const&, bool) () at /home/ikalash/Albany_Schwarz_seg_fault_debug/src/Albany_SolverFactory.cpp:1003
#10 0x00007ffff5f64d5e in Albany::SolverFactory::createAndGetAlbanyAppT(Teuchos::RCP<Albany::Application>&, Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Tpetra::Classes::Vector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const&, bool) ()
    at /home/ikalash/Albany_Schwarz_seg_fault_debug/src/Albany_SolverFactory.cpp:879
#11 0x0000000000449f5d in main () at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build/install/include/Teuchos_RCPNode.hpp:755
#12 0x00007fffd07ac11b in __libc_start_main () from /usr/lib64/libc.so.6
#13 0x000000000044dbba in _start () at /home/ikalash/Albany_Schwarz_seg_fault_debug/src/Main_SolveT.cpp:320

Here is the line where the failure occurs: https://github.com/gahansen/Albany/blob/master/src/disc/stk/Aeras_SpectralDiscretization.cpp#L1334 . It is an STK call. Would you be able to look into this? One unusual thing about this problem is that it uses a higher-order spectral element (25-node quad). I hope this info helps.

To reproduce the problem, please enable Aeras:

-D ENABLE_AERAS:BOOL=ON \

alanw0 commented 6 years ago

@ikalash ok I'll build that and let you know what we find.

alanw0 commented 6 years ago

@ikalash Hi Irina, sorry for the delay in getting back to you on this. The file you pointed to makes the stk call bulkData.end_entities(rank). There is a silly stk bug (recently added) where it assumes you call begin_entities(rank) before you call end_entities(rank). The 'begin' call clears a cache-like data-structure. We will fix it so that calling just end_entities(rank) is allowed again. It will take a few days to get that in and over to trilinos. In the meantime, here is a work-around that you can add, to make your test pass:

diff --git a/src/disc/stk/Aeras_SpectralDiscretization.cpp b/src/disc/stk/Aeras_SpectralDiscretization.cpp
index b678213..e68ca61 100644
--- a/src/disc/stk/Aeras_SpectralDiscretization.cpp
+++ b/src/disc/stk/Aeras_SpectralDiscretization.cpp
@@ -1330,6 +1330,7 @@ stk::mesh::EntityId
 Aeras::SpectralDiscretization::getMaximumID(const stk::mesh::EntityRank rank) const
 {
   // Get the local maximum ID
+  bulkData.begin_entities(rank);
   stk::mesh::EntityId last_entity =
     (--bulkData.end_entities(rank))->first.id();

In other words, just add a call to bulkData.begin_entities(rank) before your call to end_entities. I'll let you know when our fix is in, so you can then take this back out.
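For readers who want a more defensive variant of the same pattern, here is a standalone sketch that takes both iterators and refuses to decrement the end iterator of an empty range. It does not use STK. `ToyKey`, `ToyEntry`, and `last_id_or` are hypothetical stand-ins for the key and (key, entity) pair types the iterators in the diff point at, and whether Albany actually needs the empty-range check is an assumption.

```cpp
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for the key type; only id() is modeled.
struct ToyKey {
  std::uint64_t value;
  std::uint64_t id() const { return value; }
};

// Hypothetical stand-in for the (key, entity) pairs being iterated over.
using ToyEntry = std::pair<ToyKey, int>;

// Return the id of the last entry in [first, last), or `fallback` when the
// range is empty, so the end iterator is never decremented past the start.
template <typename Iter>
std::uint64_t last_id_or(Iter first, Iter last, std::uint64_t fallback) {
  if (first == last) {
    return fallback;
  }
  return std::prev(last)->first.id();
}
```

C++ does not fix the evaluation order of function arguments, so with the work-around in mind the result of begin_entities(rank) would need to be stored in a local variable before end_entities(rank) is called, rather than writing both calls inline as arguments.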

Edited by @ibaned to fix formatting

alanw0 commented 6 years ago

@ikalash P.S. It looks like the formatting messed up my diff a little bit. There should be a '+' in front of the line "bulkData.begin_entities(rank);".

ibaned commented 6 years ago

@alanw0 Thanks for finding this! I just fixed the formatting in your comment, please use the edit feature to see what I did (three backticks before/after).

alanw0 commented 6 years ago

@ibaned thanks!

ikalash commented 6 years ago

Thanks @alanw0 ! I just tried your temporary fix and it worked. Glad to hear this helped to fix a recently introduced bug! I'll close the issue now hopefully for good. I'm still working on going through all our failures and rebaselining so there is a chance I'll discover something else that got broken in the process; but hopefully not!