ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

StatsLevel: Improve Write Performance #2880

Open ax3l opened 2 years ago

ax3l commented 2 years ago

We like the ADIOS2 StatsLevel feature for queries and speeding up reads.

Unfortunately, the current implementation is fairly slow at write time: enabling it adds significant overhead (~10% the last time we checked).

To make this feature production-ready, someone should investigate whether the compute part can be accelerated, e.g., through improved vectorization, OpenMP threading (for large enough data), or letting the user request only certain stats. Independently of that, one could also try to hide the gather latency by gathering asynchronously with other operations.
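As an illustration of the latency-hiding idea, here is a minimal sketch that overlaps the min/max computation with the copy into an output buffer, using `std::async`. This is one possible reading of the suggestion, not ADIOS2 code: `BlockMinMax` and `CopyWithStats` are hypothetical names, and the input is assumed to be a non-empty block of `double`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <future>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ADIOS2): min and max of a non-empty block.
std::pair<double, double> BlockMinMax(const double *data, std::size_t n)
{
    const auto mm = std::minmax_element(data, data + n);
    return {*mm.first, *mm.second};
}

// Hypothetical helper (not part of ADIOS2): copies n values into buffer
// while the min/max is computed on a separate thread. Both tasks only read
// from data, so they can run concurrently without any locking.
std::pair<double, double> CopyWithStats(const double *data, std::size_t n,
                                        double *buffer)
{
    // Start the stats computation in the background.
    auto stats = std::async(std::launch::async, BlockMinMax, data, n);
    // Meanwhile, do the copy on the calling thread.
    std::memcpy(buffer, data, n * sizeof(double));
    // Block only if the stats are not ready yet.
    return stats.get();
}
```

Whether this pays off depends on whether the copy and the scan compete for the same memory bandwidth, so it would need to be measured.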

cc @pnorbert @guj @sklasky @dmitry-ganyushin @lwan86

Related to:

This is not needed this month but should be addressed in general. It might be low-hanging fruit (optimizing the compute performance of a few simple functions that gather the stats) with good production impact for full-cycle workflows in apps.

Needed by @ax3l and @franzpoeschel for openPMD (ECP WarpX & PIConGPU)

guj commented 2 years ago

For the record,

Here are the I/O times of two runs I had on Summit (4 TB), one without stats and one with stats.

The values are ordered as file-based/group-based/variable-based:

without stats: 35.46/34.5/29.96

with stats: 36.11/35.18/35.07

min/max in profiler took around 5 seconds.
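For reference, the relative overhead implied by these timings can be computed as below. This is just the arithmetic on the numbers above; `OverheadPercent` is a hypothetical helper name.

```cpp
// Relative overhead, in percent, of the run with stats over the run without.
double OverheadPercent(double withoutStats, double withStats)
{
    return 100.0 * (withStats - withoutStats) / withoutStats;
}
```

With the values above this gives roughly 1.8% (file-based), 2.0% (group-based), and 17.1% (variable-based).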

ax3l commented 2 years ago

Thanks! So also around 10-15% overhead.

williamfgc commented 2 years ago

I remember using C++ threads for getting min/max. It should be implemented.

guj commented 2 years ago

> I remember using C++ threads for getting min/max. It should be implemented.

Yes, it is there; I just read the code. It may be possible to pass the thread parameter to raise the thread count, which should make things faster.
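To make the idea concrete, here is a minimal sketch of a chunked min/max with a configurable thread count, using `std::thread`. It is not the actual ADIOS2 implementation: `ThreadedMinMax` and its `threads` argument are hypothetical, and the input is assumed to be a non-empty block of `double`.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not the ADIOS2 code): splits the block into one chunk
// per thread, computes a partial min/max in each, then reduces the partials.
// Assumes n > 0.
std::pair<double, double> ThreadedMinMax(const double *data, std::size_t n,
                                         unsigned threads)
{
    // Use at least one thread and never more threads than elements.
    const std::size_t nThreads =
        std::min<std::size_t>(std::max(1u, threads), n);
    const std::size_t chunk = n / nThreads;

    std::vector<std::pair<double, double>> partial(nThreads);
    std::vector<std::thread> workers;
    workers.reserve(nThreads);

    for (std::size_t t = 0; t < nThreads; ++t)
    {
        const std::size_t begin = t * chunk;
        // The last chunk also takes the remainder.
        const std::size_t end = (t + 1 == nThreads) ? n : begin + chunk;
        workers.emplace_back([=, &partial] {
            const auto mm = std::minmax_element(data + begin, data + end);
            partial[t] = std::make_pair(*mm.first, *mm.second);
        });
    }
    for (auto &w : workers)
    {
        w.join();
    }

    // Serial reduction over the per-thread results.
    std::pair<double, double> result = partial[0];
    for (const auto &p : partial)
    {
        result.first = std::min(result.first, p.first);
        result.second = std::max(result.second, p.second);
    }
    return result;
}
```

For small blocks the thread start-up cost will likely dominate, so a real implementation would probably fall back to a single thread below some size threshold.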

williamfgc commented 2 years ago

Hi @guj, yes, it was also tied to std::copy/std::memcpy when copying to the buffer, but the largest benefit was for min/max. @ax3l, it would be a matter of measuring again to see whether there are benefits depending on data size and platform. Hope it helps.