ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

adios2_reorganize_mpi with HDF5 #2309

Open jychoi-hpc opened 4 years ago

jychoi-hpc commented 4 years ago

Robert and I are trying to use adios2_reorganize_mpi to convert a BP file to HDF5, but it hangs without making any progress or reporting any error.

I am using the following command line:


```
mpirun -n 2 adios2_reorganize_mpi xgc.3d.00002.bp test.h5 BPFile "" HDF5 "" 2
```
The input file is available [here](https://www.dropbox.com/s/z92j2zaophdt78p/xgc.3d.00002.bp.tar.gz?dl=0).

It works for a bp-to-bp conversion.

pnorbert commented 4 years ago

The basic problem is that some variables are read and written only by rank 0 (the scalars iphi nnode iphi), while the HDF5 engine wants collective I/O by default. The parameter H5CollectiveMPIO=no should solve this issue, but it actually doesn't:

mpirun -n 2 adios2_reorganize_mpi xgc.3d.00002.bp test.h5 BPFile "" HDF5 "H5CollectiveMPIO=no" 2

@guj What does reorganize need to do to make this case work with the HDF5 engine? Should each rank still read the scalar values and write them to the output?

guj commented 4 years ago

Hi @pnorbert @jychoi-hpc

Collective I/O in HDF5 is off by default.

What needs to be done is to set the write size to 1 on all ranks when you see a scalar, because defining a variable is collective in HDF5 (and the HDF5 engine defines a variable when Put is called on it).

pnorbert commented 4 years ago

@guj Who should do that? adios2_reorganize or the HDF5 engine? How?

pnorbert commented 4 years ago

So H5CollectiveMPIO=no is not enough, or should it not be used at all?

guj commented 4 years ago

@pnorbert
Yes, there is no need to set H5CollectiveMPIO because the default is NO. Please change Reorganize.cpp, line 396, from "writesize=0" to "writesize=1". This is the quickest workaround.
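
The reasoning behind this workaround can be illustrated with a small standalone sketch. It is not the code from Reorganize.cpp: the names `Assignment`, `WriteSize`, and `AllRanksPut` are hypothetical, and the sketch assumes only what the thread states, namely that a rank with a write size of 0 skips Put and that the HDF5 engine needs Put to be called on every rank for the variable to be defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of what one rank was assigned for one variable.
struct Assignment
{
    bool isScalar;          // true for single-value variables
    std::size_t blockElems; // elements assigned to this rank (0 if none)
};

// Number of elements a rank reports to the writer.
// With scalarOnAllRanks == false, a rank that was assigned nothing reports 0
// (the behaviour described in the thread). With scalarOnAllRanks == true,
// every rank reports 1 for a scalar, mirroring the writesize=1 workaround.
std::size_t WriteSize(const Assignment &a, bool scalarOnAllRanks)
{
    if (a.isScalar && scalarOnAllRanks)
    {
        return 1;
    }
    return a.blockElems;
}

// Returns true if every rank has a non-zero write size, i.e. every rank
// would call Put and take part in the collective variable definition.
bool AllRanksPut(const std::vector<Assignment> &perRank, bool scalarOnAllRanks)
{
    assert(!perRank.empty());
    for (const Assignment &a : perRank)
    {
        if (WriteSize(a, scalarOnAllRanks) == 0)
        {
            return false;
        }
    }
    return true;
}
```

For a two-rank run where only rank 0 holds the scalar, `AllRanksPut` returns false without the workaround and true with it, which corresponds to the hang and to the fixed case, respectively. Arrays are left untouched: a rank with an empty array block still reports 0.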