lanl / spiner

Performance portable routines for generic, tabulated, multi-dimensional data
https://lanl.github.io/spiner
BSD 3-Clause "New" or "Revised" License

Implement serialization and deserialization #93

Closed Yurlungur closed 2 months ago

Yurlungur commented 2 months ago

PR Summary

Using tabulated data in, e.g., MPI Windows for shared memory requires the ability to serialize a DataBox object into pre-allocated shared memory and to build a new thread-local object around said shared memory, so that the object itself is thread-local but it internally points at a table that lives in shared memory. This PR implements this capability.
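
A minimal sketch of the intended usage pattern, assuming hypothetical method names `serializedSizeInBytes()`, `serialize()`, and `deSerialize()` on `DataBox` (the exact API, header path, and template signature in this PR may differ):

```cpp
// Hedged sketch: share a tabulated DataBox across the MPI ranks on a node via
// an MPI shared-memory window. The spiner method names used here are
// assumptions about the API this PR introduces, not confirmed signatures.
#include <cstddef>
#include <mpi.h>
#include <spiner/databox.hpp> // header path assumed

using DataBox = Spiner::DataBox<double>; // template parameter assumed

void ShareAcrossNode(DataBox &db, MPI_Comm world) {
  // Split into per-node (shared-memory) communicators.
  MPI_Comm node;
  MPI_Comm_split_type(world, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node);
  int rank;
  MPI_Comm_rank(node, &rank);

  // Rank 0 on the node allocates the shared window sized to hold the table.
  std::size_t bytes = (rank == 0) ? db.serializedSizeInBytes() : 0;
  MPI_Win window;
  char *shared = nullptr;
  MPI_Win_allocate_shared(bytes, 1, MPI_INFO_NULL, node, &shared, &window);

  // Non-root ranks query the base address of rank 0's segment.
  if (rank != 0) {
    MPI_Aint size;
    int disp_unit;
    MPI_Win_shared_query(window, 0, &size, &disp_unit, &shared);
  }

  // Root serializes its DataBox into the shared segment; the other ranks
  // build thread-local DataBox objects whose table pointer targets the
  // shared memory.
  if (rank == 0) db.serialize(shared);
  MPI_Barrier(node);
  if (rank != 0) db.deSerialize(shared);
}
```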

This will be needed for the equivalent capability in Singularity-EOS and also provides a prototype for how that model and API will look.

PR Checklist

jhp-lanl commented 2 months ago

Looking forward to this! I'll probably review tomorrow

Yurlungur commented 2 months ago

A few big picture thoughts on serialization:

  1. The implementation I've written here relies on the fact that everything except the single pointer is trivially copyable. That's why memcpy(dst, this, sizeof(*this)) works. Nested static arrays are okay for the same reason they can be captured in a KOKKOS_LAMBDA; nested pointers would not be. (See the sketch after this list.)
  2. This can be generalized to nested pointers, but one would have to hierarchically serialize and walk a graph, rather than a single memcpy for the static memory and a single memcpy (or pointer assignment) for the dynamic memory.
  3. This implementation ignores endianness and padding. That is fine when serializing and de-serializing on a single architecture, i.e., within a single compiled executable. It would not work if we wanted to dump an object to disk by serializing it and then load it on a different architecture with, e.g., different endianness. HDF5 (or some other user-handled format) is the file-I/O strategy here. Do not use the serialization routines for file I/O.
  4. The default serialization/de-serialization routines are host only. This is by design. You cannot share device memory across MPI ranks (unless you are doing MPS), so it doesn't make sense to try to share device memory this way. In a device context there are two possible patterns:
     (a) The host-side databox is shared across MPI ranks. Calling GetOnDevice then creates a thread-local device-side databox. I think this is the preferred pattern, and it's what happens if you just naively call the API.
     (b) You can create shared device-side databoxes by creating an array of databox objects on device and then calling setPointer to force them to share memory. I think this is best handled manually by user code, but spiner supports it with the setPointer method.
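
To make items 1 and 2 concrete, here is a rough sketch of what host-side serialization can look like when everything except the data pointer is trivially copyable. This is an illustration under those assumptions, not the PR's actual code; the class and members are hypothetical.

```cpp
// Rough sketch of the idea in items 1-2, not the PR's exact implementation.
// The object's statically-sized members are trivially copyable, so one memcpy
// captures them; the dynamically-allocated table is a second contiguous copy.
#include <cstddef>
#include <cstring>

class DataBoxSketch {
 public:
  std::size_t serializedSizeInBytes() const {
    return sizeof(*this) + sizeBytes_; // trivially copyable shell + table
  }
  std::size_t serialize(char *dst) const {
    std::memcpy(dst, this, sizeof(*this));               // static members
    std::memcpy(dst + sizeof(*this), data_, sizeBytes_); // the table itself
    return serializedSizeInBytes();
  }
  std::size_t deSerialize(char *src) {
    std::memcpy(this, src, sizeof(*this)); // rebuild the shell in place
    // Point the thread-local object at the table living in shared memory.
    setPointer(reinterpret_cast<double *>(src + sizeof(*this)));
    return serializedSizeInBytes();
  }
  void setPointer(double *p) { data_ = p; } // share memory without copying

 private:
  double *data_ = nullptr;    // the only pointer member
  std::size_t sizeBytes_ = 0; // size of the tabulated data in bytes
  int shape_[6] = {0};        // nested static arrays are fine (item 1)
};
```
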
Yurlungur commented 2 months ago

Tests triggered on re-git.

Yurlungur commented 2 months ago

Tests triggered on re-git.

Tests pass.