ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

Extend global single value variables to behave more like variable attributes #2792

Open franzpoeschel opened 3 years ago

franzpoeschel commented 3 years ago

Summary: Global single value variables are a useful feature for storing variable metadata. As a result, any feature supported by ADIOS2 attributes (constant metadata) is probably useful for global single values, too. Ideally, both concepts can be semantically unified to support a general notion of (changing vs. constant) metadata.

Proposed feature: Global single values sit conceptually between "regular" ADIOS2 variables (global arrays) and ADIOS2 attributes. Unlike attributes, they are variable and can change across steps as a first-class feature. Unlike global array variables, they are stored as part of the metadata.

Quoting the documentation linked above:

These variables are helpful for storing global information, preferably managed by only one MPI process, that may or may not change over steps: e.g. total number of particles, collective norm, number of nodes/cells, etc.

=> Effectively, these are intended for variable (~changing) metadata, while ADIOS2 attributes are intended for constant metadata. ADIOS2 attributes have some features that remain useful also when metadata changes, namely the two detailed further below: aggregation across ranks and support for small vectors.

This feature request proposes to add these features to global single values.

Why is this feature important? From our perspective, openPMD metadata generally changes across steps. We are experimenting now with a new data schema for openPMD in ADIOS2 based on global single value variables for implementing openPMD attributes (seminal PR for these efforts) and it is our experience that this fixes many tricky edge cases when using ADIOS2 steps. However, @guj has noticed performance problems at large scale and we have related those to the above two missing features:

  1. Aggregation: Unlike with attributes, defining a variable on n ranks will lead to n instances of it, example executed with 14 parallel ranks:
    > bpls -D dataset.bp
    …
      double    /data/meshes/mymesh/unitSI            scalar
            step 0: 14 instances available
      uint64_t  /data/snapshot                        scalar
            step 0: 14 instances available
    …
  2. Small vectors: It's currently necessary to use array-shaped variables for these. Treating them like attributes, i.e. defining them on every rank, leads to n blocks being written:
    > bpls -D dataset.bp
    …
      double    /data/meshes/mymesh/unitDimension     {7}
            step 0: 
              block  0: [0:6]
              block  1: [0:6]
              block  2: [0:6]
              block  3: [0:6]
              block  4: [0:6]
              block  5: [0:6]
              block  6: [0:6]
              block  7: [0:6]
              block  8: [0:6]
              block  9: [0:6]
              block 10: [0:6]
              block 11: [0:6]
              block 12: [0:6]
              block 13: [0:6]
    …
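
The difference between the two behaviours listed above can be sketched with a small toy model. This is not ADIOS2 code: the `Record` struct and the functions `writeFromAllRanks`, `countInstances` and `aggregate` are hypothetical names made up for illustration, and the "first contribution wins" merge policy is an assumption, not a description of how ADIOS2 aggregates attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model only: this is NOT ADIOS2 code and uses no ADIOS2 types.
// One metadata record as a single rank would contribute it.
struct Record {
    int rank;
    std::string name;
    std::vector<double> value; // size 1 for a scalar, 7 for unitDimension
};

// Every rank defines the same two names, mirroring the bpls listings above.
std::vector<Record> writeFromAllRanks(int nRanks) {
    assert(nRanks > 0);
    std::vector<Record> records;
    for (int rank = 0; rank < nRanks; ++rank) {
        records.push_back({rank, "/data/meshes/mymesh/unitSI", {1.0}});
        records.push_back({rank, "/data/meshes/mymesh/unitDimension",
                           std::vector<double>(7, 0.0)});
    }
    return records;
}

// Variable-like behaviour: each rank's contribution stays a separate
// instance (scalar) or block (array).
std::map<std::string, std::size_t> countInstances(const std::vector<Record>& records) {
    std::map<std::string, std::size_t> counts;
    for (const auto& r : records) {
        ++counts[r.name];
    }
    return counts;
}

// Attribute-like behaviour: contributions collapse to one entry per name.
// Assumed policy: the first contribution wins; conflicting values are not
// detected in this sketch.
std::map<std::string, std::vector<double>> aggregate(const std::vector<Record>& records) {
    std::map<std::string, std::vector<double>> merged;
    for (const auto& r : records) {
        merged.emplace(r.name, r.value); // no-op if the name is already present
    }
    return merged;
}
```

With 14 ranks, `countInstances(writeFromAllRanks(14))` reports 14 entries per name, while `aggregate(...)` keeps a single entry per name, including the 7-element vector.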

What is the potential impact of this feature in the community? Users can decide between constant and changing metadata with less worry about how expressive and performant this makes their data. This proposed change unifies the semantics of attributes and global single values, and reduces the distinction between the two concepts to "constant vs. changing".

Is your feature request related to a problem? Please describe. See above.

Describe the solution you'd like and potential required effort: Mostly described already. API-wise, this would probably require a more explicit definition of global single values.

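// today: a variable defined without dimensions is a global single value
adios2::Variable<uint32_t> varNodes = io.DefineVariable<uint32_t>("Nodes");
// extended, new: proposed syntax, not an existing ADIOS2 API
adios2::Variable<uint32_t> varNodes = io.DefineVariable<uint32_t>("Nodes", {adios2::GlobalValue, 7});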

(Compare the existing use of adios2::LocalValueDim.) Optionally, distinguish global values more clearly from global arrays in the API, so that user code can separate the handling of metadata from that of actual data (as is done today with the distinction e.g. between AvailableVariables and AvailableAttributes).

Effort depends on how reusable the metadata aggregation of attributes is and how extensibly global single values are implemented today. Might touch data formats too.

Describe alternatives you've considered and potential required effort: Our intermediate solution will be to add a mode to openPMD-api that allows us to assume that all attributes written from a rank other than 0 can be dropped. This does not work for all use cases (e.g. where datasets might be defined only on certain ranks). A more general solution can only be implemented with an additional aggregation step on our side, since we otherwise don't know what our users are doing.
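
To make the limitation of the intermediate solution concrete, here is a minimal toy sketch of the two strategies. It is neither openPMD-api nor ADIOS2 code: `AttributeWrite`, `keepRankZeroOnly`, `aggregateAcrossRanks`, `exampleWrites` and the path `/data/meshes/rank2only/unitSI` are hypothetical, and the sketch assumes that ranks writing the same name agree on its value.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only: NOT openPMD-api or ADIOS2 code; all names are hypothetical.
struct AttributeWrite {
    int rank;
    std::string name;
    double value;
};

// Intermediate solution: keep only what rank 0 wrote, drop everything else.
std::map<std::string, double> keepRankZeroOnly(const std::vector<AttributeWrite>& writes) {
    std::map<std::string, double> kept;
    for (const auto& w : writes) {
        if (w.rank == 0) {
            kept[w.name] = w.value;
        }
    }
    return kept;
}

// General solution: an additional aggregation step that merges all ranks.
// Assumption: ranks that write the same name agree on its value.
std::map<std::string, double> aggregateAcrossRanks(const std::vector<AttributeWrite>& writes) {
    std::map<std::string, double> merged;
    for (const auto& w : writes) {
        auto result = merged.emplace(w.name, w.value);
        assert(result.first->second == w.value); // ranks must not disagree
        (void)result; // silence unused warning when asserts are disabled
    }
    return merged;
}

// Example input: one attribute written by ranks 0-2, one written by rank 2 only.
std::vector<AttributeWrite> exampleWrites() {
    return {
        {0, "/data/meshes/mymesh/unitSI", 1.0},
        {1, "/data/meshes/mymesh/unitSI", 1.0},
        {2, "/data/meshes/mymesh/unitSI", 1.0},
        {2, "/data/meshes/rank2only/unitSI", 2.5},
    };
}
```

On this input, `keepRankZeroOnly` silently loses the attribute that exists only on rank 2, whereas `aggregateAcrossRanks` retains it at the cost of looking at every rank's writes.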

Additional context: Discussed last week with @pnorbert and @ax3l.

stefurnic commented 4 months ago

What is the current status of series.setAttribute('some_attribute', value) with iteration_encoding=variable_based? I notice that the attribute does not change; only the first value is saved. Is there a way to have global changing attributes that may help to pinpoint in which iteration some data resides?

franzpoeschel commented 4 months ago

How to deal with changing metadata was a topic that took some time:

  1. The initial problem was exactly what you noticed now: attributes are (were) constant and cannot (could not) be changed.
  2. We tried an alternative based on global single variables, i.e. defining our metadata in terms of variables and not using attributes at all. If you are using a stable release of openPMD-api, I believe that this implementation is still there, but I'm not sure. The dev branch has removed it, so if you created datasets with that, you should be aware that they will become unreadable.
  3. That attempt worked, but brought a whole bag of other problems because it circumvented the metadata aggregation system of ADIOS2, so we dropped it again.
  4. In reaction to this, ADIOS2 added modifiable attributes, which we now support on our dev branch. It seems I forgot to document that feature; it is opt-in via adios2.modifiable_attributes = true.

Note that modifiable attributes are not "the full thing": the attribute still exists only once, so there is still no association between steps and different values. Instead, the updated value applies to all steps.
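
As a rough illustration of that caveat, the following toy sketch contrasts the two semantics. It is not ADIOS2 or openPMD-api code; the classes `ModifiableAttribute` and `PerStepValue` and their methods are hypothetical and only model the behaviour described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: NOT ADIOS2 or openPMD-api code; both classes are hypothetical.

// Modifiable attribute: exists exactly once, a later write replaces the value.
class ModifiableAttribute {
public:
    explicit ModifiableAttribute(double initial) : value_(initial) {}
    void write(double v) { value_ = v; }
    // The step argument is ignored: there is no per-step association.
    double readAtStep(std::size_t /*step*/) const { return value_; }

private:
    double value_;
};

// Per-step value (as a global single value variable would offer):
// one stored entry per step, so earlier steps keep their own value.
class PerStepValue {
public:
    void writeNextStep(double v) { values_.push_back(v); }
    double readAtStep(std::size_t step) const {
        assert(step < values_.size());
        return values_[step];
    }

private:
    std::vector<double> values_;
};
```

After writing 100, 200 and 300 in three consecutive steps, `ModifiableAttribute::readAtStep(0)` yields 300, while `PerStepValue::readAtStep(0)` still yields 100. Only the latter allows pinpointing which iteration a value belongs to.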

For full support of random access in variable-based encoding (see our discussion in https://github.com/openPMD/openPMD-api/issues/1611), we will probably need to bring back lightweight support for global single variables; we still need to see how that will turn out.

On the original issue: since ADIOS2 now has modifiable attributes, this issue can be closed. Will do that after the end of this discussion.