openPMD / openPMD-api

:floppy_disk: C++ & Python API for Scientific I/O
https://openpmd-api.readthedocs.io
GNU Lesser General Public License v3.0

Option To Mask Invalid Regions with Zeros #1203

Open ax3l opened 2 years ago

ax3l commented 2 years ago

It would be great to have an option, either in ADIOS2 or here, to mask reads of Selections (offset+extent) that fall outside the available data index space (openPMD: availableChunks()) with a constant value, e.g., zero.

This would simplify implementations like: https://github.com/openPMD/openPMD-viewer/pull/332 for the mesh-refinement extension https://github.com/openPMD/openPMD-standard/pull/252

cc @guj @franzpoeschel
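
A minimal sketch of what such masking amounts to, reduced to one dimension and using only the standard library. The names `Interval`, `coverageMask` and `maskUncovered` are hypothetical and not part of openPMD-api or ADIOS2; a real implementation would take the N-dimensional offsets and extents reported by availableChunks() instead of the simplified `Interval` used here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D stand-in for one written chunk (offset + extent).
struct Interval {
    std::size_t offset;
    std::size_t extent;
};

// For a 1D selection [offset, offset + extent), mark which elements are
// covered by at least one written chunk.
std::vector<bool> coverageMask(const std::vector<Interval>& chunks,
                               std::size_t offset, std::size_t extent) {
    std::vector<bool> covered(extent, false);
    for (const Interval& c : chunks) {
        // Intersect the chunk with the selection; an empty intersection
        // gives begin >= end and the inner loop does nothing.
        const std::size_t begin = std::max(offset, c.offset);
        const std::size_t end = std::min(offset + extent, c.offset + c.extent);
        for (std::size_t i = begin; i < end; ++i) {
            covered[i - offset] = true;
        }
    }
    return covered;
}

// Overwrite every element that no chunk covers with fillValue.
void maskUncovered(std::vector<float>& buffer,
                   const std::vector<bool>& covered, float fillValue) {
    assert(buffer.size() == covered.size());
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        if (!covered[i]) {
            buffer[i] = fillValue;
        }
    }
}
```

Applied after a read, `maskUncovered(buffer, coverageMask(chunks, offset, extent), 0.0f)` would leave loaded values untouched and set everything else to zero.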

franzpoeschel commented 2 years ago

We should ask the ADIOS2 team how ADIOS2 treats undefined regions in reading. If those regions are just skipped, solving this could theoretically be done by zero-initializing the read buffer.

I would try not to roll our own solution for this as it would mean reimplementing many nontrivial things that are already done by ADIOS2:

ax3l commented 2 years ago

Discussed today: when reading, ADIOS does not write to undefined index regions of the buffer.

We allocate memory (or a user passes it to us), so we can set this to a value like NaN or zero (make configurable) https://openpmd-api.readthedocs.io/en/0.14.4/_static/doxyhtml/classopen_p_m_d_1_1_record_component.html#ac31282d2109a693aa48e21a6f76fcb8f
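
A rough sketch of the pre-fill idea, assuming a reader that only writes where data exists. Everything here is hypothetical and standard-library only: `WrittenChunk`, `makeFilledBuffer` and `mockRead` are illustrative names, and `mockRead` merely imitates the "untouched where nothing was written" behaviour in 1D; it is not how openPMD-api or ADIOS2 are implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical stand-in for one written chunk of a 1D dataset.
struct WrittenChunk {
    std::size_t offset;
    std::vector<float> data;
};

// One possible configurable fill value besides zero.
constexpr float kNaNFill = std::numeric_limits<float>::quiet_NaN();

// Allocate a read buffer and pre-fill it with the chosen value.
std::shared_ptr<float> makeFilledBuffer(std::size_t n, float fillValue) {
    std::shared_ptr<float> buffer(new float[n],
                                  std::default_delete<float[]>());
    std::fill_n(buffer.get(), n, fillValue);
    return buffer;
}

// Mock reader: copies only the parts of the selection
// [offset, offset + extent) that overlap a written chunk and leaves
// every other element of `out` untouched.
void mockRead(const std::vector<WrittenChunk>& chunks,
              std::size_t offset, std::size_t extent, float* out) {
    assert(out != nullptr);
    for (const WrittenChunk& c : chunks) {
        const std::size_t begin = std::max(offset, c.offset);
        const std::size_t end =
            std::min(offset + extent, c.offset + c.data.size());
        for (std::size_t i = begin; i < end; ++i) {
            out[i - offset] = c.data[i - c.offset];
        }
    }
}
```

Because the reader never touches the gaps, whatever value the buffer held before the read (zero, NaN, or anything else) is what the caller sees there afterwards.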

ax3l commented 2 years ago

@lucafedeli88 tried the direct ADIOS2 Python NumPy bindings today (2.7.1 and also 2.8?): selecting the whole region of a refinement level there shows undefined (scrambled, non-zero) values outside the written region.

This makes me wonder whether ADIOS2's read routines really fill unwritten index areas with zero, or whether that is just an issue with the NumPy bindings of ADIOS2 @pnorbert.

franzpoeschel commented 2 years ago

Isn't this expected behavior? ADIOS2 does not fill unwritten index areas with zero, it entirely ignores them. So, for instance, if you use the std::shared_ptr<T> loadChunk(…) overload, the memory will get allocated, but no one will ever write to it, so you get random nonsense at read time.
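
The "random nonsense" follows from plain C++ allocation rules rather than anything specific to the I/O library. A small sketch with hypothetical helper names (`allocateUninitialized`, `allocateZeroed`), independent of openPMD-api:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Default-initialized array: the floats hold indeterminate values until
// something writes to them, so reading them first is not meaningful.
std::shared_ptr<float> allocateUninitialized(std::size_t n) {
    assert(n > 0);
    return std::shared_ptr<float>(new float[n],
                                  std::default_delete<float[]>());
}

// Value-initialized array: the trailing () zeroes every element,
// at the cost of one extra pass over the memory.
std::shared_ptr<float> allocateZeroed(std::size_t n) {
    assert(n > 0);
    return std::shared_ptr<float>(new float[n](),
                                  std::default_delete<float[]>());
}
```

Only the second variant gives a defined value in regions that a reader skips.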

franzpoeschel commented 2 years ago

Also, I'm hesitant to initialize the buffer with zero in that line, as it's a costly operation that most users won't need. If you want a buffer to be filled with data everywhere that there is data, and zero otherwise, I'd say that's a rather application-specific requirement and relatively simple to manually emulate in two lines:

std::shared_ptr<float[]> chunk{new float[10]{0}};
E_x.loadChunk<float>(std::static_pointer_cast<float>(chunk), {0}, {10});

ax3l commented 2 years ago

Write: absolutely, there are no regions and they should not be filled.

Read: Undefined regions should maybe be explicitly zero or NaN instead of UB in the ADIOS2 Python bindings?

pnorbert commented 2 years ago

Well, numpy has functions to allocate arrays with initialization, like np.zeros, np.ones, np.full if someone wants to do that. As Franz explained, ADIOS does not touch memory cells that have no incoming data.

ax3l commented 2 years ago

Yes, but it's the [] operator that causes this in your bindings already. @lucafedeli88 can you post your example from yesterday here?