hpc / libhio

libhio is a library intended for writing data to hierarchical data store systems.

libhio with internal json-c build fails on power9 nodes #49

Open · hppritcha opened this issue 5 years ago

hppritcha commented 5 years ago

The json-c tarball bundled with libhio is too old to recognize the ppc64le system type, so the build fails when it tries to build the json-c library.

The json-c tarball needs to be updated to one of the 0.13.1 releases; those releases work on the Darwin Power9 nodes, for example.

The new tarball will also need to be patched for the doc-generation removal and the function renaming; see the json-c.patch file. Note that the current patch does not apply cleanly to either json-c master or the 0.13.1 tags, so it will have to be redone by hand.
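A rough sketch of that manual rework (the tarball name, directory layout, and patch level below are assumptions, not the actual libhio layout):

```sh
# unpack a 0.13.1 json-c release next to the existing json-c.patch
tar xf json-c-0.13.1.tar.gz            # exact release tarball name assumed
cd json-c-0.13.1
# try the existing patch; expect rejects that have to be fixed up by hand
patch -p1 < ../json-c.patch || echo "fix the *.rej hunks manually"
```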

hppritcha commented 5 years ago

@floquet you don't see this problem because, when building with Spack, the libhio recipe uses an external json-c.

hjelmn commented 5 years ago

Easy enough to fix: untar the json-c tarball, run autoreconf -ivf, and re-tar it.
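A minimal sketch of that fix, assuming a hypothetical tarball name (autoreconf regenerates configure and config.guess/config.sub, which is what teaches the build about ppc64le):

```sh
tar xf json-c-x.y.z.tar.gz             # the bundled tarball; name assumed
cd json-c-x.y.z
autoreconf -ivf                        # regenerate configure and config.{guess,sub}
cd ..
tar czf json-c-x.y.z.tar.gz json-c-x.y.z
```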

floquet commented 5 years ago

@hppritcha: On Darwin Power9 I look for /usr/lib64/json-c and, if it is not found, build the latest json-c.

Spack created these modules under /scratch/users/dantopa/new-spack/libraries/darwin-power9.libhio/share/spack/modules/linux-rhel7-ppc64le when it built json-c: json-c/0.13.1-gcc-4.8.5 and json-c/0.13.1-gcc-6.4.0.
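A sketch of that check (the path is the one named above; the Spack spec is an assumption based on the modules listed):

```sh
# prefer the system json-c if it is present, otherwise build a recent one
if [ -e /usr/lib64/json-c ]; then
    echo "using the system json-c"
else
    spack install json-c@0.13.1 %gcc@6.4.0   # spec guessed from the modules above
fi
```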

plamborn commented 5 years ago

I was able to run autoreconf as @hjelmn suggests. However, when testing the resulting build I get UCX errors, which appears to be a common problem on Power9 systems. I need to spend more time with it to resolve the UCX error.

plamborn commented 5 years ago

To remove the UCX error, I built UCX and then built openmpi with UCX support explicitly enabled. I started from the instructions I found here: https://github.com/openucx/ucx/wiki/OpenMPI-and-OpenSHMEM-installation-with-UCX
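Roughly the sequence from that wiki page (install prefixes and the parallelism level are placeholders):

```sh
# build UCX (run from the UCX source tree)
./contrib/configure-release --prefix=$HOME/ucx-install
make -j8 install

# build Open MPI against that UCX (run from the Open MPI source tree)
./configure --with-ucx=$HOME/ucx-install --prefix=$HOME/ompi-ucx
make -j8 install
```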

However, my newly built openmpi encountered an error with MPI_Win_allocate_shared. I found that hio already has an alternative code path, a workaround for a "HIO_CRAY_BUFFER_BUG_1", that does not use MPI_Win_allocate_shared. That path is already used on Cray systems to handle an issue encountered when using HIO with MPICH.

After I switched to the alternative code path and recompiled libhio, the test cases pass on the Darwin Power9 nodes. I am not sure whether there is a performance reason to prefer MPI_Win_allocate_shared over the alternative path.
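For anyone retracing this, a hedged way to find the switch (the guard name comes from the comment above; how the alternative path is actually selected by the build is an assumption here):

```sh
# locate the guard for the Cray workaround to see how the alternative
# path is selected; the source layout below is a guess
grep -rn "HIO_CRAY_BUFFER_BUG_1" src/
# if it turns out to be a plain preprocessor guard, it could presumably be
# forced with something like: ./configure CPPFLAGS="-DHIO_CRAY_BUFFER_BUG_1=1"
```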

hjelmn commented 5 years ago

Hmm, try running with --mca osc rdma --mca btl_uct_memory_domains mlx5_0
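That is, something along these lines (the binary name and rank count are placeholders; only the --mca options matter):

```sh
# select the rdma one-sided component and pin the UCT BTL to the mlx5_0 device
mpirun -np 2 --mca osc rdma --mca btl_uct_memory_domains mlx5_0 ./hio_test
```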

hjelmn commented 5 years ago

The reason for the alternate path is Cray's insistence on using XPMEM underneath MPI_Win_allocate_shared. XPMEM memory regions can't be used for mutexes, condition variables, etc.

plamborn commented 5 years ago

I am not sure which HIO build your suggestion of adding "--mca osc rdma --mca btl_uct_memory_domains mlx5_0" was aimed at.

I tried it with an HIO built against the gcc/7.3.0 and openmpi/2.1.5-gcc_7.3.0 modules on Darwin. That version segfaults both with and without these --mca options.

For my HIO built against the openmpi I built myself with UCX, I tried running both with and without the MPI_Win_allocate_shared call.

With the MPI_Win_allocate_shared call, I get the same error I saw previously, indicating a failure during allocate_shared itself.

Without MPI_Win_allocate_shared, I get the following warning: "Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue." The test seems to run correctly.

If I add the suggested "--mca opal_common_ucx_opal_mem_hooks 1" option, I get this warning "Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption." The test seems to run correctly in this case as well.

I have played with --mca options to mpirun on my own while working on this issue, never successfully. Can you explain why you suggested these particular options and what you hoped they would accomplish?

It is interesting that you added the allocate_shared workaround because XPMEM didn't work for mutexes, but I believe I am seeing an error during the call to MPI_Win_allocate_shared itself, not when the window is later used for a mutex or the like.