Closed: aveksler1 closed this issue 4 months ago.
Thank you for the details!
We chatted further on Slack and using HDF5 with group-based encoding should be way faster. I suspect that we accidentally close and open the same file all the time in openPMD-viewer - let's debug this!
In the meantime, you might be faster if you read with openPMD-api for now until we fix this here: https://openpmd-api.readthedocs.io/en/0.15.1/usage/firstread.html
I have the same problem. Also, if I'm iterating over a lot of files with `ts.get_field`, it says `python errno = 24: Too many open files`.
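The `errno = 24` here is the OS-level `EMFILE` limit on simultaneously open file descriptors, not an openPMD-specific error. A minimal sketch (plain Python, no openPMD involved) of how leaked handles accumulate toward that limit:

```python
import os
import tempfile

# Each file opened and never closed consumes one descriptor; once the
# process hits the soft RLIMIT_NOFILE limit, open() fails with
# errno 24 (EMFILE, "Too many open files").
leaked = []
with tempfile.TemporaryDirectory() as d:
    for i in range(10):
        path = os.path.join(d, f"iter_{i}.txt")
        with open(path, "w") as f:
            f.write("data")
        leaked.append(open(path))  # handle kept alive, like an unclosed iteration
    print(len(leaked))  # → 10 descriptors still held
    for f in leaked:
        f.close()  # releasing handles is the fix, cf. Iteration::close()
```

On Linux, `resource.getrlimit(resource.RLIMIT_NOFILE)` shows the soft/hard limits a run would hit.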
Does this still happen when you close Iterations with `Iteration::close()`? If you don't, you will at some point run into that problem.
Note that we (currently) don't support re-opening an Iteration once closed, but it's on the agenda.
EDIT: Aah, just noticed that this is the openPMD-viewer. Yep, for that one we will probably need the functionality described above.
Will look at the rest of the issue tomorrow
I created a small dataset on Frontier to check things out. The total is 100 GB for 500 steps: /lustre/orion/csc303/scratch/junmin/oct2023/Test/issue380/. With the same input, I stored it in file-based mode and in group-based mode, all in BP5.
Using Remi's script, output reading file-based mode:

```
Processing file f/defaultBP5-node-ews_diags/diag1/
Time to get fields: 0.3335 s
Time to get fields: 0.3615 s
Time to get fields: 0.4046 s
Time to get fields: 0.3177 s
Time to get fields: 0.2831 s
Time to get fields: 0.3511 s
Time to get fields: 0.3737 s
Time to get fields: 0.3670 s
Time to get fields: 0.3218 s
Time to get fields: 0.3098 s
Time to get fields: 0.3574 s
Time to get fields: 0.3913 s
Time to get fields: 0.3470 s
Time to get fields: 0.2941 s
Time to get fields: 0.4090 s
Time to get fields: 0.3446 s
Time to get fields: 0.3648 s
Time to get fields: 0.3623 s
Time to get fields: 0.2748 s
Time to get fields: 0.3575 s
```

Output with group-based mode:

```
Processing file g/defaultBP5-node-ews_diags/diag1/
Time to get fields: 0.0535 s
Time to get fields: 0.0020 s
Time to get fields: 0.0018 s
Time to get fields: 0.0017 s
Time to get fields: 0.0009 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
Time to get fields: 0.0007 s
```
So I don't observe the pattern Remi observed. Let me try BP4 next
I think Remi used BP4 with file encoding. Group encoding is stable.
```
Processing file bp4-f/defaultBP4-node_diags/diag1/
Time to get fields: 0.0303 s
Time to get fields: 0.0286 s
Time to get fields: 0.0307 s
Time to get fields: 0.0293 s
Time to get fields: 0.0281 s
Time to get fields: 0.0281 s
Time to get fields: 0.0323 s
Time to get fields: 0.0302 s
Time to get fields: 0.1664 s
Time to get fields: 0.0298 s
Time to get fields: 0.3176 s
Time to get fields: 0.0483 s
Time to get fields: 0.1922 s
Time to get fields: 0.0339 s
Time to get fields: 0.1047 s
Time to get fields: 0.0860 s
Time to get fields: 0.0996 s
Time to get fields: 0.0559 s
Time to get fields: 0.3539 s
Time to get fields: 0.2703 s
```

```
Processing file bp4-g/defaultBP4-node_diags/diag1/
Time to get fields: 6.3459 s
Time to get fields: 0.3049 s
Time to get fields: 0.3861 s
Time to get fields: 0.3260 s
Time to get fields: 0.2982 s
Time to get fields: 0.3007 s
Time to get fields: 0.3196 s
Time to get fields: 0.3372 s
Time to get fields: 0.3312 s
Time to get fields: 0.3315 s
Time to get fields: 0.3551 s
Time to get fields: 0.3273 s
Time to get fields: 0.3156 s
Time to get fields: 0.3091 s
Time to get fields: 0.3020 s
Time to get fields: 0.3402 s
Time to get fields: 0.4501 s
Time to get fields: 0.4256 s
Time to get fields: 0.3838 s
Time to get fields: 0.3506 s
```
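To compare runs like the ones above without eyeballing them, a tiny helper can average the reported times (a sketch; only the `Time to get fields: ... s` line format is taken from the logs shown):

```python
import re
import statistics

def mean_get_fields_time(log: str) -> float:
    """Average the 'Time to get fields' entries in a benchmark log."""
    times = [float(t) for t in re.findall(r"Time to get fields: ([0-9.]+) s", log)]
    return statistics.mean(times)

# Demo on the first three entries of the bp4-f log above:
sample = """Time to get fields: 0.0303 s
Time to get fields: 0.0286 s
Time to get fields: 0.0307 s"""
print(round(mean_get_fields_time(sample), 4))  # → 0.0299
```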
So for the time being, switching to BP5 would be the easiest workaround.
For fast performance, group-based encoding would be better than file-based.
It would be good to add read support for variable-based encoding in the openPMD-viewer. I think the performance should be similar to the group-based one.
@guj I'm not sure if you're really measuring a slowdown and not just slight performance fluctuations:
@aveksler1 This is obviously difficult to debug from afar, but let's try. Please do only steps 1-3 of this so far, since those might already confirm the cause that I think most likely:

1. (`simData_0.bp` and `simData_2000.bp`) and show the output of `bpls -D simData_<timestep>.bp` as well as `ls simData_<timestep>.bp`?
2. `export OPENPMD_VERBOSE=1`? If the output gets too large, maybe filter especially for `OPEN_FILE` and `CLOSE_FILE`, but ideally just redirect the verbose output to some file and upload it. If you don't use openPMD-api 0.15.2, maybe upgrade and then do that.
3. `series.flush()` was done inside Python. If you really want to go all the way, a performance analysis of the C++ side would be interesting. The most useful results are normally produced by Google Perftools. They don't require recompilation of the profiled code, but the installation of the package might be troublesome, so skip this if it's too much work.

There are two possible reasons that I currently see for this slowdown: `close()`ing iterations and this somewhat badly affects the ADIOS2 backend for whatever reason.

@franzpoeschel the plot time you have is for group-based encodings. The times from BP4 and BP5 with both file/group encodings are in the plot below.
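For the `OPENPMD_VERBOSE` step, the filtering could look like this (a sketch; `sim.py`, `verbose.log`, and the exact log line layout are placeholders, only the `OPEN_FILE`/`CLOSE_FILE` keywords come from the suggestion above):

```shell
# Real usage (commented out; needs an actual simulation/reader script):
# OPENPMD_VERBOSE=1 python sim.py 2>&1 | grep -E 'OPEN_FILE|CLOSE_FILE' > verbose.log

# The grep filter itself, demonstrated on sample lines:
printf 'OPEN_FILE simData_0.bp\nsome other verbose line\nCLOSE_FILE simData_0.bp\n' \
  | grep -E 'OPEN_FILE|CLOSE_FILE'
```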
Here are the major ways people iterate over data in openPMD-viewer:

Time Series Explicit Access

```python
from openpmd_viewer import OpenPMDTimeSeries

ts = OpenPMDTimeSeries('./example-2d/hdf5/')
N_iterations = len(ts.iterations)
for i in range(N_iterations):
    rho, info_rho = ts.get_field(iteration=ts.iterations[i], field='rho')
    # or more fields or particles
```
Same used with explicit iterations in the GUI https://openpmd-viewer.readthedocs.io/en/latest/tutorials/3_Introduction-to-the-GUI.html
Iterate
See #400 https://openpmd-viewer.readthedocs.io/en/latest/tutorials/4_Particle_selection.html#Reconstructing-particle-trajectories https://openpmd-viewer.readthedocs.io/en/latest/api_reference/generic_interface.html#openpmd_viewer.OpenPMDTimeSeries.iterate
Note: after a first iterate, a user might want to do this again on the same time series. There are currently some limitations in `series.close()` about this, @franzpoeschel.

Let's discuss if we can generalize this and potentially also add an iteration filter mode. For large processing tasks, this mode would allow for way more optimizations (e.g., visiting iterations once, freeing resources) than the explicit access in the example above.
Given that the openPMD-viewer is, in this context, a wrapper around the openPMD-api, some optimizations are probably possible today already, like closing an iteration when the next one is accessed. The downside is that an iteration, once closed, currently cannot be reopened, which can be internally worked around by closing and reopening the Series (ideally with `{"defer_iteration_parsing": true}` to avoid an overhead from that).
The access patterns themselves are probably fine, it's more a question of how they are implemented in the openPMD-viewer (or in future: in the openPMD-api).
As discussed offline, one important next step for the openPMD-api is better support for typical access patterns, with such workarounds ideally being moved into the openPMD-api. My idea is that, depending on the access configuration, you get an iterator from `Series::readIterations()` that has different feature sets. When using `Access::READ_ONLY` (= random access), it would mostly be equivalent to using `Series::iterations`. Using `READ_LINEAR` would give you an iterator that accesses "one iteration at a time", i.e. jumping from ADIOS step to ADIOS step, jumping from file to file. In that mode, optimizations like closing files before opening the next, or even a cache of open files with some pre-loading, might be thinkable.
In the end, this would ideally close the gap we have today between the random-access API (which is too permissive) and the streaming API (which is too restrictive).
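The `READ_LINEAR` behavior sketched above (close the previous iteration before handing out the next) can be mimicked with a plain generator. `FakeIteration` below is a stand-in for an openPMD `Iteration`, not real API:

```python
class FakeIteration:
    """Stand-in for an openPMD Iteration; only tracks open/closed state."""
    def __init__(self, index):
        self.index = index
        self.open = True

    def close(self):
        self.open = False

def linear_read(iterations):
    # Yield iterations one at a time, closing the previous one before
    # handing out the next -- the essence of READ_LINEAR access.
    prev = None
    for it in iterations:
        if prev is not None:
            prev.close()
        yield it
        prev = it
    if prev is not None:
        prev.close()

its = [FakeIteration(i) for i in range(3)]
for it in linear_read(its):
    pass  # process fields here; at most one iteration is open at a time
print(all(not it.open for it in its))  # → True: every iteration was closed
```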
While we generally think the solution will be to go to BP5, we currently see that file based encoding in BP5 is slower than BP4 in the benchmarks above.
@guj and @pnorbert are investigating further.
Issue #400 is a smaller reproducer of this issue, which contains HDF5 and ADIOS2 files.
@aveksler1 if you still have the issue, can you profile your code like this? https://github.com/openPMD/openPMD-viewer/issues/400#issuecomment-1821318604
The general solution is to use `groupBased` (g) or even `variableBased` (v) encoding with BP4 and BP5.

https://warpx.readthedocs.io/en/latest/usage/parameters.html#diagnostics-and-output

```
<diag_name>.openpmd_backend = bp
<diag_name>.openpmd_encoding = g  # soon: v
```
We plan for openPMD-api 0.16+ and corresponding WarpX releases to go from BP4 to BP5 and from groupBased to variableBased encoding.
@ax3l I'll run test cases with h5 and bp backends and different encodings, 'f' and 'g'. I'll compare the timing results from consecutive calls to `ts.get_field()`. If I still have the issue, I'll profile the code.
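A small harness for that comparison could look like this (a sketch; `read_fn` stands in for a call like `ts.get_field(iteration=..., field='rho')`):

```python
import time

def benchmark_reads(read_fn, iterations):
    """Time consecutive calls to a reader and return per-call durations.

    A steadily growing trend in the returned list is the symptom
    discussed in this issue; roughly flat times are the expected behavior.
    """
    durations = []
    for it in iterations:
        t0 = time.perf_counter()
        read_fn(it)
        durations.append(time.perf_counter() - t0)
    return durations

# Demo with a dummy reader in place of ts.get_field:
times = benchmark_reads(lambda it: sum(range(1000)), range(5))
print(len(times), all(t >= 0 for t in times))  # → 5 True
```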
I have found the main cause of this slowdown. In `ADIOS2IOHandler::flush()`, a call to `m_dirty.clear()` is missing, meaning that every ADIOS2 file is flushed again each time.
I have added a fix as part of https://github.com/openPMD/openPMD-api/pull/1598.
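In Python terms, the bug corresponds to a flush that never empties its dirty set, so every earlier file is re-flushed on each call; total work grows quadratically with the number of files instead of linearly (a simplified toy model, not the actual C++ code):

```python
class ModelIOHandler:
    """Toy model of a dirty-file tracker, illustrating the missing clear()."""
    def __init__(self, clear_after_flush):
        self.clear_after_flush = clear_after_flush
        self.m_dirty = set()
        self.flush_count = 0  # total per-file flush operations performed

    def touch(self, filename):
        self.m_dirty.add(filename)

    def flush(self):
        self.flush_count += len(self.m_dirty)  # work grows with the dirty set
        if self.clear_after_flush:
            self.m_dirty.clear()  # the call that was missing

buggy, fixed = ModelIOHandler(False), ModelIOHandler(True)
for handler in (buggy, fixed):
    for i in range(4):
        handler.touch(f"simData_{i}.bp")
        handler.flush()
print(buggy.flush_count, fixed.flush_count)  # → 10 4 (quadratic vs. linear)
```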
openPMD-api is currently not really well-optimized for Series with many iterations (my personal focus has been more on Series with few very big iterations), so researching this bug uncovered more than one performance issue. The linked PR is hence not yet ready, as it contains other (less trivial) fixes, and additionally there are some weird findings that I still need to look at.
Until that PR is merged, a workaround might be adding `ts.data_reader.series.iterations[it].close()` in the reader code once you don't need the iteration any more. Note that you won't be able to access the iteration any more, as there is no re-opening logic (yet).
Thank you so much! I was able to avoid the problem by only using h5 output, but having the ability to rapidly post-process ADIOS2 files will be a great help when we start running simulations that benefit from ADIOS2 output.
Performance analysis:
Action items:

- openPMD-api: fix internal regressions https://github.com/openPMD/openPMD-api/pull/1598
- openPMD-api: enable `.close()` and reopening of an iteration (in both random access and `.readIteration`/linear read) https://github.com/openPMD/openPMD-api/issues/1606
- openPMD-viewer: close out steps (e.g., in `ts.iterate()` or after we opened N further steps in random access mode)

Per https://github.com/openPMD/openPMD-viewer/issues/380#issuecomment-1992081339
The specific issue is closed and will be released with openPMD-api 0.16.0.
Generally, the overhead of opening large series with many iterations should be linear. Please continue to report issues if you see otherwise :pray:
Hi all,
I'm using openPMD-viewer to loop through the outputs of a WarpX simulation and perform a calculation on fields at every iteration. I noticed that as I loop through `ts.iterations`, the `get_fields()` call takes longer and longer. I'm running this script on a local cluster, and the `ts.backend` is `openpmd-api`.

The script where I am seeing the slowdown looks like this. I am not able to recreate the problem using the openPMD example datasets. The dataset I am using (which I unfortunately cannot share) is 4000 iterations of E and B fields on a 72x72x144 grid. Every .bp directory is ~35 MB, so around 140 GB in total.
This outputs:
and this scales worse and worse as I increase the length of `iterations`.

I tracked down the offending function to be `get_data` in io_reader/utilities, specifically the `series.flush()` call repeated for every chunk. Timing it shows it takes up the majority of the `ts.get_fields()` call.

I don't see this behavior when the openPMD outputs are .h5 files (but still the openpmd-api backend). Any help would be much appreciated!