ax3l closed 3 months ago
@ax3l I tried using dask delayed with openpmd-api to parallelize over iterations. I couldn't get it to work, I think because dask wasn't able to pickle the Series object. Would it make sense to add add_pickle to Series and Iteration?
@ax3l @franzpoeschel This also doesn't seem to work when working with one series at a time but with more than one in the same kernel instance. I don't understand why, but I'm trying to use dask and iterate over multiple simulations. The result is that the data keeps being loaded from the first series! Deleting the series in between or restarting the dask workers doesn't help. The only thing that worked for me in Jupyter was restarting the kernel between running the code for different series.
Are you aware of any workaround?
That's precisely what Axel means above by "This works as a hack until you need to work with two series at a time."
Unpickling e.g. a single RecordComponent does not really work trivially together with our memory model in openPMD, since a RecordComponent will become invalid once its Series is deleted, but the Pickle API gives us no way to store the Series anywhere.
Ideally, we should change our C++ API to a model where any handle keeps the entire thing alive, which would also solve this issue. This should be possible, but would be a slightly larger change (I actually have a PR open with an internal remodeling that might help here).
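To illustrate what "any handle keeps the entire thing alive" could look like, here is a minimal sketch using shared ownership. All type and function names (SeriesData, SeriesHandle, ComponentHandle, make_detached_component) are hypothetical stand-ins and not the openPMD-api classes; the actual remodeling may look quite different.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the internal state of a series.
struct SeriesData
{
    std::string filename;
};

// Hypothetical component handle: it shares ownership of the series state,
// so it stays valid even after the series handle itself is gone.
class ComponentHandle
{
public:
    ComponentHandle(std::shared_ptr<SeriesData> series, std::string name)
        : m_series(std::move(series)), m_name(std::move(name))
    {}

    std::string path() const
    {
        return m_series->filename + ":" + m_name;
    }

private:
    std::shared_ptr<SeriesData> m_series;
    std::string m_name;
};

// Hypothetical series handle that hands out component handles.
class SeriesHandle
{
public:
    explicit SeriesHandle(std::string filename)
        : m_data(std::make_shared<SeriesData>(SeriesData{std::move(filename)}))
    {}

    ComponentHandle component(std::string name) const
    {
        return ComponentHandle(m_data, std::move(name));
    }

private:
    std::shared_ptr<SeriesData> m_data;
};

// The series handle goes out of scope here, but the returned component
// still owns a reference to the shared series state.
ComponentHandle make_detached_component(std::string const &filename)
{
    SeriesHandle series(filename);
    return series.component("E/x");
}
```

With such a model, an unpickled component would not need a separately stored series, and components from two different files could coexist in one process.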
For now, this is what we do:
// Create a new openPMD Series and keep it alive.
// This is a big hack for now, but it works for our use
// case, which is spinning up remote serial read series
// for DASK.
static auto series = openPMD::Series(filename, Access::READ_ONLY);
... which leads exactly to the behavior that you see.
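The reason is that a function-local static is initialized only on the first call; later calls skip the initializer and ignore their argument. A minimal sketch of that effect, where FakeSeries, open_series and second_open_filename are hypothetical stand-ins and not part of openPMD-api:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for openPMD::Series; it only remembers its file name.
struct FakeSeries
{
    std::string filename;

    explicit FakeSeries(std::string name) : filename(std::move(name))
    {}
};

// Mimics the hack: the static object is constructed during the first call
// and reused unchanged by every later call in the same process.
FakeSeries &open_series(std::string const &filename)
{
    static FakeSeries series(filename);
    return series;
}

// "Opens" two different files and reports which one the second call
// actually refers to.
std::string second_open_filename()
{
    open_series("simulation_A.h5");
    return open_series("simulation_B.h5").filename;
}
```

Here the second call still refers to simulation_A.h5. Because the static lives for the lifetime of the process, deleting the Python-side series object does not reset it.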
I do have an idea though how we could fix this short-term, lemme see
For the first implementation of multi-process (multi-node) Dask, we pickle objects like the Record and RecordComponents + their series.
The series is unpickled into a static function member, to avoid:
This works as a hack until you need to work with two series at a time.
https://github.com/openPMD/openPMD-api/blob/0.15.1/include/openPMD/binding/python/Pickle.hpp#L73-L77