libMesh / TIMPI

Templated Interface to MPI
GNU Lesser General Public License v2.1
12 stars 11 forks source link

Issues with set_union() #105

Closed jwpeterson closed 1 year ago

jwpeterson commented 1 year ago

I ran into some issues with set_union() for the following container types:

std::map<unsigned int, std::set<boundary_id_type>> mapset;
std::map<unsigned int, std::map<boundary_id_type, Real>> mapmap;

The code compiles fine but either does not do the set_union() correctly (mapset) or crashes (mapmap). I am pasting some standalone example codes in the comments below, GitHub won't let me attach them as files for some reason.

jwpeterson commented 1 year ago

mapset example

// This test is to debug set_union() errors I observed for certain
// container types. The test should be run on four procs using the
// command:
// mpiexec -n 4 ./timpi_test_1-opt --keep-cout --redirect-output

// libmesh includes
#include "libmesh/libmesh.h"
#include "timpi/communicator.h"
#include "timpi/parallel_implementation.h"

// C++ includes
#include <map>
#include <set>

using namespace libMesh;

int main(int argc, char ** argv)
{
  LibMeshInit init (argc, argv);

  if (init.comm().size() != 4)
    libmesh_error_msg("Must run on exactly 4 procs");

  // Test 1
  std::map<unsigned int, std::set<boundary_id_type>> mapset;

  // Values on all procs
  mapset[0].insert(20201);

  // Insert extra values on procs 0 and 2
  auto rank = init.comm().rank();
  switch (rank)
    {
    case 0:
    case 2:
      mapset[0].insert(60201);
      break;
    default:
      break;
    }

  // Call set_union()
  init.comm().set_union(mapset);

  // Print results on all procs. Currently what I see is:
  // key = 0, surface_ids = 20201 60201
  // key = 0, surface_ids = 20201
  // key = 0, surface_ids = 20201 60201
  // key = 0, surface_ids = 20201
  // whereas what I would expect to see is that all procs have the
  // same surface_ids.
  for (const auto & [key, boundary_id_set] : mapset)
  {
    std::cout << "key = " << key << ", surface_ids = ";
    for (const auto & bid : boundary_id_set)
      std::cout << bid << " ";
    std::cout << std::endl;
  }

  return 0;
}
jwpeterson commented 1 year ago

mapmap example

// This test is to debug set_union() errors I observed for certain
// container types. The test should be run on four procs using the
// command:
// mpiexec -n 4 ./timpi_test_2-opt --keep-cout --redirect-output

// libmesh includes
#include "libmesh/libmesh.h"
#include "timpi/communicator.h"
#include "timpi/parallel_implementation.h"

// C++ includes
#include <map>

using namespace libMesh;

int main(int argc, char ** argv)
{
  LibMeshInit init (argc, argv);

  if (init.comm().size() != 4)
    libmesh_error_msg("Must run on exactly 4 procs");

  // Test 2
  std::map<unsigned int, std::map<boundary_id_type, Real>> mapmap;

  // Values on all procs
  mapmap[0].emplace(20201, 0.8);

  // Insert extra values on procs 0 and 2
  auto rank = init.comm().rank();
  switch (rank)
    {
    case 0:
    case 2:
      mapmap[0].emplace(60201, 1.);
      break;
    default:
      break;
    }

  // Call set_union() - currently this gives me a segfault
  init.comm().set_union(mapmap);

  // Print results on all procs.
  for (const auto & [key, map] : mapmap)
  {
    std::cout << "key = " << key << ", (boundary_id, Real) pairs = ";
    for (const auto & [key2, value] : map)
      std::cout << "(" << key2 << ", " << value << ") ";
    std::cout << std::endl;
  }

  return 0;
}
jwpeterson commented 1 year ago

Probably worth mentioning that I'm a bit behind on the TIMPI submdule hash (current on libMesh/TIMPI@9b7bf889257ed4fa56488b29d1d436b1b54671f8) so I'm also going to try my test codes on the latest TIMPI hash and see if there's any difference.

jwpeterson commented 1 year ago

I'm also going to try my test codes on the latest TIMPI hash and see if there's any difference.

Same behavior with latest TIMPI submodule version.

jwpeterson commented 1 year ago

One more set_union() issue that I ran into was a compile error related to missing Packing<T> specializations for maps containing bool. Not sure if this is a known issue, but my assumption was that all of the fixed-size builtin types had StandardType<T> specializations, so I was a bit surprised this didn't automatically work.

// This test is to debug set_union() errors I observed for certain
// container types. The test should be run on four procs using the
// command:
// mpiexec -n 4 ./timpi_test_3-opt --keep-cout --redirect-output

// libmesh includes
#include "libmesh/libmesh.h"
#include "timpi/communicator.h"
#include "timpi/parallel_implementation.h"

// C++ includes
#include <map>

using namespace libMesh;

int main(int argc, char ** argv)
{
  LibMeshInit init (argc, argv);

  if (init.comm().size() != 4)
    libmesh_error_msg("Must run on exactly 4 procs");

  // Test 3
  std::map<unsigned int, bool> mapbool;

  auto rank = init.comm().rank();
  mapbool[rank] = true;

  // Call set_union() - currently this gives me a compiler error
  // related to missing Packing<T> specializations:
  // error: 'packable_size' is not a member of 'libMesh::Parallel::Packing<bool, void>'
  init.comm().set_union(mapbool);

  // Print results on all procs.
  for (const auto & [key, val] : mapbool)
    std::cout << "(" << key << ", " << val << ") ";
  std::cout << std::endl;

  return 0;
}
roystgnr commented 1 year ago

bool is a tricky one; on top of the whole "vector<bool> is a specialization that doesn't really behave like vector" issue, when last I checked there was no safe standard way to tell MPI how to deal with a C++ bool - there was one in the C++ bindings at one point but it's now deprecated. It would be easy to get bool working in most cases, but the C++ standard doesn't even mandate sizeof(bool)==1 so the corner cases would scare me. We've got some special cases where we copy a bool to some kind of char before operating on it with MPI_CHAR and copying a result back, but without the ability to define StandardType<bool> we're really limited in what we can do with bool in nested data types. We could probably get it working with just Packing based communication pretty easily ... but then we'll have people who try to convert their fixed-size vector into array and watch their code suddenly fail to compile. Better to just use unsigned char as a workaround in containers, I think.

The code compiles fine but either does not do the set_union() correctly (mapset) or crashes (mapmap)

This, on the other hand, is just wrong. I'll see if I can replicate.

jwpeterson commented 1 year ago

Better to just use unsigned char as a workaround in containers, I think.

OK, that's what I did temporarily to work around the compiler error, so sounds like I'll just make that change permanent.

roystgnr commented 1 year ago

Hmm... this is more interesting than I expected on first glance.

If we were to instead ask for a union of singleton maps "(0,1)" and "(0,2)", what would you expect the result to be? It can't be "(0,1),(0,2)" if we don't have a multimap. It can't be "(0,1)" without dropping the "(0,2)" case or vice-versa.

Now, we can take a union of set({1}) and set({2}) in a way we can't of int(1) and int(2), so we could say we're off the hook in that case ... but I'm not sure that's actually the most consistent thing to do. It's not just inconsistent with unmergeable types, but with the way unions are handed elsewhere in C++. We're kind of trying to do what C++17 exposes via map::merge(), right? And in that case, the behavior is just "if you see duplicate keys, ignore the second value", not "do something special to combine values". Likewise with map::insert() "Inserts element(s) into the container, if the container doesn't already contain an element with an equivalent key", etc.

There's still a bug to be fixed here, for sure: we ought to be getting consistent results on every processor, not inconsistent results and certainly not a crash. But I think if we want to add a merge-with-sub-merges it might be clearest to give that its own API.

jwpeterson commented 1 year ago

OK, yeah from a generic standpoint I can see what you mean... in my specific use case it seemed "obvious" what should happen is that the value_type sets should be unioned... and set_union() is basically magic to me so I figured it might do the thing I wanted :grimacing:

roystgnr commented 1 year ago

Yeah, the documentation really needs to be clearer too.

roystgnr commented 1 year ago

I'm working on all this, BTW. Some of the underlying work actually overlaps with an issue someone hit last week.

roystgnr commented 1 year ago

Oh, man. While working on this I think I found a serious unrelated bug. When we pack a pair (or presumably tuple) of heterogeneous types, it looks like our prediction of the buffer size (which we use and rely on the accuracy of!) just checks sizeof() (and so includes struct padding), whereas when we pack we bypass any padding. We scream and die, but not at compile-time and not in opt mode...