ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

API Docs/Cheat Sheet: Collective, Blocking and Memory Contracts #1268

Open ax3l opened 5 years ago

ax3l commented 5 years ago

Hi,

we are currently sitting in the ADIOS2 tutorial at TU Dresden and as with ADIOS1, the following would be extremely useful:

A cheat sheet (or API documentation) of all public API calls (engine open/close, defines, step begin/end puts) that documents for each of those:

What I personally like a lot about ADIOS1 is that most calls on variables are local, non-blocking and not even collective in the MPI sense. Just documenting this explicitly would make getting started with ADIOS2 so much easier for folks who already know MPI (everyone in HPC).

williamfgc commented 5 years ago

@ax3l point taken. Thanks for the feedback as usual. In the meantime, have you had the chance to look at https://adios2.readthedocs.io/en/latest/ ? Would the section on language API bindings (uses doxygen) help? We are always improving the docs based on applications feedback. Thanks!

ax3l commented 5 years ago

Thanks, we enjoy the staging demos very much over here.

https://adios2.readthedocs.io

Argh, I have overlooked that in the README. Do you want to link it in the repo URL (next to the repo description on GitHub) just to make it more prominent? Will dig in.

williamfgc commented 5 years ago

@ax3l good idea. Feel free to provide feedback if the docs are not clear (I will add the cheat sheet on MPI-related calls). All the https://adios2.readthedocs.io sources live under ADIOS2/docs, and we use the breathe package to pull doxygen API info from the C++ and C headers under ADIOS2/bindings. I haven't found a good way to automate the Python and Fortran APIs, since they don't work well with doxygen.

germasch commented 5 years ago

  • is it MPI collective or local?

Let me second this particular point, as that is something that I'm still not clear on. The only thing I've seen in the docs is

Always pass MPI_COMM_SELF if an Engine lives in only one MPI process. Open and Close are collective operations.

Which makes sense, but I'm sure that's not all that's collective. In particular, how about PerformPuts()? (I'd expect that to be collective) And a simple Put()? (I'd expect that to be local for, e.g. BP3, for sure, but I'm much less sure what happens if you're using some M procs -> N files mapping?)

williamfgc commented 5 years ago

@germasch @ax3l this is where the virtual nature of the abstract Engine functions becomes relevant. The only functions that are always collective and common to all Engines are the ADIOS constructor/destructor (this is something @germasch introduced) and Open/Close. Each Engine decides the nature of the virtual calls. In BP, EndStep is collective by default, but certain parameters make it non-collective (CollectiveMetadata=OFF), partially collective (CollectiveMetadata=OFF and M-to-N), or collective only on some calls (when buffering a certain number of steps). The buffer size might also make certain Put operations partially collective, but they are mostly local (including PerformPuts). For BP3 parameters: https://adios2.readthedocs.io/en/latest/engines/engines.html#bp3-default Other engines act differently according to their parameters, while sticking to the same API and memory contracts.

Keep in mind, the main difference between ADIOS1 and ADIOS2 is that ADIOS2 is a framework abstraction with key/value parameters that can alter the internal nature of the Engine virtual functions to improve performance. For the most part, IO (except Open), Variable, Attribute and Operator functions are local. Each Engine needs to document its abstraction and parameters so that users can apply them to their use cases.
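The statements above can be condensed into a small lookup table, sketched here in standard C++ with no ADIOS2 dependency. The names `MpiBehavior`, `CheatSheet`, `Classify` and `IsAlwaysCollective` are hypothetical and not part of ADIOS2, and the entries only restate what this thread says for the BP defaults; other engines and parameter settings may behave differently.

```cpp
#include <cassert>
#include <map>
#include <string>

// Categories taken from the discussion above; not an ADIOS2 type.
enum class MpiBehavior
{
    AlwaysCollective,    // every Engine, regardless of parameters
    CollectiveByDefault, // BP default; parameters can relax it
    MostlyLocal,         // local for the most part, per the thread
    Unknown              // not covered by the discussion
};

// Hypothetical cheat sheet restating the thread, keyed by call or component.
inline const std::map<std::string, MpiBehavior> &CheatSheet()
{
    static const std::map<std::string, MpiBehavior> table = {
        {"ADIOS constructor", MpiBehavior::AlwaysCollective},
        {"ADIOS destructor", MpiBehavior::AlwaysCollective},
        {"Open", MpiBehavior::AlwaysCollective},
        {"Close", MpiBehavior::AlwaysCollective},
        {"EndStep", MpiBehavior::CollectiveByDefault},
        {"Put", MpiBehavior::MostlyLocal},
        {"PerformPuts", MpiBehavior::MostlyLocal},
        {"IO (except Open)", MpiBehavior::MostlyLocal},
        {"Variable", MpiBehavior::MostlyLocal},
        {"Attribute", MpiBehavior::MostlyLocal},
        {"Operator", MpiBehavior::MostlyLocal},
    };
    return table;
}

// Looks up a call; anything the thread did not discuss is Unknown.
inline MpiBehavior Classify(const std::string &call)
{
    assert(!call.empty()); // an empty name is a programming error
    const auto it = CheatSheet().find(call);
    return it == CheatSheet().end() ? MpiBehavior::Unknown : it->second;
}

// True only for calls the thread describes as collective in every Engine.
inline bool IsAlwaysCollective(const std::string &call)
{
    return Classify(call) == MpiBehavior::AlwaysCollective;
}
```

A call classified as `MostlyLocal` or `Unknown` is not guaranteed to be safe inside a rank-dependent branch; the engine's own documentation remains the reference.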

williamfgc commented 5 years ago

@germasch @ax3l since you guys are physicists and comfortable with numerical solvers, we can make the analogy to the PETSc KSP interface: each solver/engine allows for a set of parameters (e.g. preconditioners, accuracy, norm type) to relax, optimize or improve the solution based on the nature of your problem (e.g. matrix type, partition used, communication, ...). ADIOS2 is no different in that regard. Engines need better docs, though. Hope this helps.

germasch commented 5 years ago

Thanks, that makes sense. As you say, it'd be good to have this documented, in particular for cases where, e.g., a Put() is usually not collective but may sometimes be if, say, a buffer is full.

ax3l commented 5 years ago

Thanks for the details!

Yes, the idea is basically for users to know what they can put inside an if branch and what they cannot, either because the call might block, which leads to deadlocks, or because it might be collective in some cases. Examples are ranks that contribute to the engine's MPI communicator but add zero data (e.g. because they temporarily only model a vacuum without particles in an MD-sim, etc.).

Similar to C++ noexcept/throw(), the user must somehow be made aware of what is safe to put in a branched execution and what is not.

ax3l commented 5 years ago

#1286 addresses MPI questions, thanks a lot!

Maybe, if I haven't missed it, the memory contracts could be described on Put/Get, Begin/EndStep and Perform* in detail as well?

williamfgc commented 5 years ago

@ax3l https://adios2.readthedocs.io/en/latest/components/components.html#engine-api-functions I'll try to make this section clearer and improve the narrative on memory contracts. The API docs, too. Thanks!

williamfgc commented 5 years ago

Read the Docs now has a section on the Put and Get memory contracts for the pointer (address) and the data contents: https://adios2.readthedocs.io/en/latest/components/components.html#put-modes-and-memory-contracts

https://adios2.readthedocs.io/en/latest/components/components.html#get-modes-and-memory-contracts

They are similar to the C++ deferred launch mode: https://en.cppreference.com/w/cpp/thread/launch
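To make the analogy concrete, here is a minimal sketch using only the C++ standard library; it does not call ADIOS2. The helper names `SumBuffer` and `DeferredSumAfterModification` are hypothetical. With `std::launch::deferred`, scheduling only records the pointer, and the buffer contents are read later when `get()` is called, so a modification made in between is what the task sees. The actual ADIOS2 rules are the ones in the linked memory-contract sections.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Sums the buffer when invoked; stands in for "consume the user data".
inline double SumBuffer(const std::vector<double> *data)
{
    assert(data != nullptr);
    return std::accumulate(data->begin(), data->end(), 0.0);
}

// Returns the sum seen by a deferred task when the buffer is modified
// between scheduling the task and asking for its result.
inline double DeferredSumAfterModification()
{
    std::vector<double> buffer{1.0, 2.0, 3.0};

    // Deferred launch: nothing runs here, only the pointer is recorded.
    std::future<double> pending =
        std::async(std::launch::deferred, SumBuffer, &buffer);

    // Changing the contents before the work is performed.
    buffer[0] = 10.0;

    // The work runs now, on the calling thread, and reads the new contents.
    return pending.get();
}
```

Calling `DeferredSumAfterModification()` returns 15.0 rather than 6.0, which is why a deferred-style contract requires the pointer and the data to stay valid and unchanged until the operation is actually performed.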