ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0
268 stars 125 forks source link

adios2 python bindings like adios1 #2311

Closed rmchurch closed 10 months ago

rmchurch commented 4 years ago

This is a general inquiry, not necessarily a feature request (though I have a strong preference). I may be missing functionality (please let me know if there is a way with ADIOS2 to do this but I've missed it), but why were the Python bindings in ADIOS2 moved away from the more general API used by ADIOS1, and by h5py? In particular, the ability to read subsets are very useful:

fh = ad.open(filename,'r')
data= fh['varname'][100:1000]

Perhaps this is still possible in adios2, but the high-level API doesn't seem to expose it? Also, the lazy execution (i.e. datahandle = fh['varname']) is useful in some scenarios where you pass around datahandles, and read only subsets or delay reading the actual data.

Moreover, indexing certain timesteps is often of interest. The sequential for loop to step through data in ADIOS2 is somewhat orthogonal to the way I would usually do things in Python. If I have data [Nx,Nt], in ADIOS2 now each Nt slice is in a timestep that I have to for loop through. Normally in Python its considered faster to avoid for loops; in ADIOS1 I would have read all the data at once: data = fh['varname'][...].

My personal preference is the way the ADIOS1 and h5py Python bindings, it feels more Pythonic, if its still possible with ADIOS2 to have that functionality I would be interested.

williamfgc commented 4 years ago

@rmchurch thanks for the feedback, I can comment on that. The first thing to understand is that adios2 APIs are streaming oriented as oppose to hdf5 and adios1 which reuse the numpy style you mention. In addition, adios2 tracks the "step" of a variable for the user, using hdf5 it would be a new variable or an extra dimension with the trade-off associated with it. adios2 covers very different use-cases that the numpy syntax was not designed for.

Therefore, the adios2 Python high-level API is to facilitate the streaming use-case (code coupling, in-situ and writing analysis data from a simulation) with a single handler for a flat learning curve. The Python low-level API matches the C++ API almost 100% so developers don't need to think too much about new concepts when transitioning. Use that API for performance and lazy evaluation.

That being said, let's take a look at your example in the adios2 Python high-level API context:

fh = ad.open(filename,'r')
data= fh['varname'][100:1000]

are we reading steps 100 to 1000 or it's a slice? If it's a slice, of which block (for local variables) of which step? I'm not aware if h5py and numpy track step indexing using this syntax. To avoid ambiguity, the initial preference is to be explicit and clear and overload a single read function, see docs here:

file-only read: np_array_2steps = fh.read(variable_name, start=[0,0], count=[10,10], step_start=0, step_count=1)

streaming read (portable across all engines): np_array_current_step = fh.read(variable_name, start=[0,0], count=[10,10])

That being said, any attempt to make it more numpy-like will build on top of the above, but more importantly it will only work with adios2 file engines, not necessarily for streaming (unless I'm missing something). Hope it helps as one of the challenges is to have simple APIs that would work for file (random-access across all steps) and streaming access (one step at a time). The other option is to have different APIs for each concept, that would be going back to adios1 and having to write special code for each engine.

williamfgc commented 4 years ago

At a much higher level adios2 semantics attempts to match how data is produced and consumed in HPC simulations based on MPI. Hence, a physical 2D field doesn't necessarily has to be presented as a 3D array to track I/O steps. adios2 does the boilerplate internally. I particularly like the semantics @pnorbert introduced in bpls. 10*(2,5) is more clear and closer to the physics than (10,2,5) as separates steps from physical dimensions (is it a 3D field or 2D+time?). I'd argue that adios2 API are pythonic in the "steps" sense reusing for step in stream but, you are right, less numpy-like in the dimensions aspect.

rmchurch commented 4 years ago

Thanks for the explanation of the motivation for the API style (focused around streaming), that helps a lot to understand the choices made. When I wrote this, it was with the use-case in mind of analyzing files from a simulation after the simulation is completed, so I can see how it may not completely fit with the streaming view. But I think its an important use case. And I would argue the normal use case in post-processing analysis does not necessarily match how HPC simulation data is produced, e.g. often for time steps, you want to read the entire data in at once, and visualize in a contour plot, then do further processing on subsets, etc. These cases the 3D array is more natural, and you have to keep track of which dimension is the time step.

However, that issue to me isn't as important as the numpy-like indexing and slicing available in adios1 and h5py. Those are particularly useful, not only for the slicing I referred to above, but also for example in an XGC simulation that produces data on a triangulated mesh, and so come in a 1D array (and often with different files written out per timestep). Often I will want to read in a subset in physical space of the data (e.g. a "ring" of data, and I have a radial-like coordinate (psi) that I can use to produce a list of indices that match my criteria, e.g. inds = np.where(psin>=0.9)[0]. Normally I would then use this in adios1 as fh['varname'][inds]). From the API link you gave, it seems that I can accomplish that passing in the inds list to start, and then a list of ones to count, but this seems more cumbersome than the encapsulation of numpy-like slicing/indexing provides. In fact, I believe the numpy indexing actually goes through the inds list and joins consecutive accesses for speed (I think I had a github issue in the adios1 repo about this, I'd have to look).

I don't think it would be too hard to keep the numpy-like indexing away from non-file engines (in the def index subroutine just have a check on the file handle to determine engine, and throw exception if not file. Though I could think of scenarios where you may want to index the data sent in a timestep also, so some limited functionality could be allowed).

Anyways, my two cents. If its possible to have the numpy-like indexing/slicing jive with the API, I think it would be useful. Theres a lot of great improvements that have come with adios2 (building now, and building Python bindings, is a dream with the cmake system), but I do miss those old numpy-like indexing capabilities.

williamfgc commented 4 years ago

@rmchurch those are fair point points, let me provide some context:

These cases the 3D array is more natural, and you have to keep track of which dimension is the time step.

Many simulations only allocated space dimensions and reuse the memory in every step. I'd say that depends if you have enough memory available and the size of the variables so you could pre-allocate all steps at once. You're right as both are common use cases as it depends if the process is memory-bound or CPU-bound. Memory has become more relevant to track in exascale as accelerators can only hold so much.

XGC simulation that produces data on a triangulated mesh, and so come in a 1D array (and often with different files written out per timestep)

The adios2 BP4 default engine was designed to stay away from this pattern (opening files too often can be expensive on Summit, we saw this in another app and Open is run on the background by default). I wonder if this is an opportunity for migrating that writer to BP4.

often for time steps, you want to read the entire data in at once,

The default is to read the first step (so it's seamless to data with a single step). While it's natural for your post-processing task, I'd argue that when dealing with Gb or Tb of data (which adios2 deals with other apps) that could easily get adios2 readers into allocation issues. Ultimately, that was the reasoning for not making it the default behavior (I wonder how h5py handles this at those scales). A solution would be to have a domain-specific wrapper around XGC post-processing readers in which this is the default behavior. Today, the user must be explicit (that's not an issue with numpy or h5py as they don't handle steps over separate arrays in their semantics). my_all_steps_nparray = read( var, steps_start=0, steps_count=nsteps )

Often I will want to read in a subset in physical space of the data (e.g. a "ring" of data, and I have a radial-like coordinate (psi) that I can use to produce a list of indices

Thanks for bringing this up. The adios2 Python APIs don't have queries, this is a good time to check with @pnorbert about adding those. Currently, the high-level API only track steps and dimensions with a generic read function.

However, that issue to me isn't as important as the numpy-like indexing and slicing

That should be straight-forward built on top of the read function overloads, but won't match 1:1 the h5py and numpy semantics as what they call "step" is very different from what adios calls "step" (they are neither MPI-block, nor streaming based).

fh[var][50:100:4]. is this 50,54,58, ...100 or a [50,100] selection for the first 4 adios2 steps? My point is that a lot more discussion is needed to fit adios2 semantics into a different conceptual paradigm.

Overall, the current adios2 high-level Python API is super-simple and more streaming oriented. You raised good points for the file-based post-processing scenario that h5py is designed for without the burden of tracking steps or MPI-based blocks. Hope it helps, thanks.

eisenhauer commented 10 months ago

Closed as inactive.