Open guj opened 3 months ago
I think I came across something similar last week (but I actually got an error instead of a hang). The issue was that I was also calling storeChunk with a data vector whose .data() was a nullptr (but I also passed an extent of 0 to storeChunk).
Hello @guj and @pgrete, this behavior is known and can only be fully fixed once we transition to flushing those Iterations that are open rather than those that are modified. The recent optimization did not break this; rather, BP5 is much stricter about collective operations, so this behavior is more likely to occur now. Until this is fully solved, please use the workaround implemented in https://github.com/openPMD/openPMD-api/pull/1619:
series.writeIterations()[1].seriesFlush();
This is guaranteed to flush Iteration 1 on all ranks, regardless of whether it has been modified.
Also, your example is missing a call to series.close() before MPI_Finalize().
Thanks Franz, the workaround works.
Describe the bug
The recent optimization breaks an MPI use case in file-based mode. A minimal example is included below; two ranks are enough to see the effect. In short, at the second flush, rank 1 has nothing to contribute, so it does not call into BP5 while rank 0 does. A BP5 write is collective, so rank 0 hangs because rank 1 never participates. With variable-based encoding, it looks like a flush to ADIOS is forced (by openPMD-api?) on all ranks, so it works.
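To make the mismatch concrete, here is a toy model in plain C++. It does not use openPMD-api, ADIOS2, or MPI; the function names collectiveCompletes, flushIfModified, and flushAlways are hypothetical and exist only for this sketch. It assumes that a collective write finishes only when every rank enters it, and compares "flush only if modified" with "flush on every rank" (what the seriesFlush() workaround is described as doing).

```cpp
#include <vector>

// Hypothetical model, not openPMD-api or ADIOS2 code:
// a collective operation completes only if every rank enters it.
bool collectiveCompletes(const std::vector<bool> &rankEntersFlush)
{
    for (bool enters : rankEntersFlush)
    {
        if (!enters)
        {
            return false; // one absent rank leaves the others waiting
        }
    }
    return true;
}

// Policy described in the report: a rank flushes an iteration
// only if that rank modified it.
std::vector<bool> flushIfModified(const std::vector<bool> &modified)
{
    return modified;
}

// Policy assumed for the workaround: every rank flushes the
// iteration, whether or not it modified it.
std::vector<bool> flushAlways(const std::vector<bool> &modified)
{
    return std::vector<bool>(modified.size(), true);
}
```

With two ranks where only rank 0 modified the iteration, i.e. modified = {true, false}, collectiveCompletes(flushIfModified(modified)) is false (the hang in this model), while collectiveCompletes(flushAlways(modified)) is true.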
To Reproduce
C++ example:
#include <openPMD/openPMD.hpp>
#include <mpi.h>      // needed for MPI_Init_thread
#include <iostream>   // needed for std::cout

using std::cout;
using namespace openPMD;

int main(int argc, char *argv[])
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
}