mgrover1 opened 1 year ago
I've tried to reproduce this with some of our ODIM_H5 data, with a similar outcome.
@mgrover1 I've tried to track that down now using some GAMIC source data from our BoXPol radar. In the normal case I get the white space shown above in the task graph.
If I remove the additional lines from the GAMIC `open_dataset` function after `store_entrypoint.open_dataset`, the call to `open_mfdataset` returns without triggering any dask operation. Only if I `.load()`, `.compute()`, or otherwise trigger a computation (e.g. plotting) are the files accessed and the data loaded and processed. That leads to the task graphs shown below:
One Timestep Single Moment of 15 (time: 12, azimuth: 360, range: 750):
All Timesteps Single Moment of 15 (time: 12, azimuth: 360, range: 750):
Compute the whole thing:
So as a consequence, we might need to make sure no immediate dask computations are triggered before actually doing something with the data. Would it make sense to create a test repo for that?
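One way such a test repo could check this is to count how often the expensive read actually runs and assert it stays at zero until a compute is requested. This is a minimal, pure-Python sketch of the principle only, not the actual xradar backend code; the `LazyReader` class and file name are illustrative stand-ins:

```python
class LazyReader:
    """Toy stand-in for a lazy backend: opening must not read data."""

    def __init__(self, path):
        self.path = path      # just remember where the data lives
        self.read_count = 0   # how often the expensive read actually ran

    def _read(self):
        self.read_count += 1  # the expensive part happens only here
        return [0.0] * 10     # placeholder "data"

    def compute(self):
        # Equivalent of .load()/.compute(): trigger the real work.
        return self._read()


ds = LazyReader("boxpol_scan.h5")  # "open" the dataset
assert ds.read_count == 0          # opening triggered no read
ds.compute()
assert ds.read_count == 1          # reading happened exactly once
```

The same assertion shape works against the real backend by wrapping the file-access layer with a counter, or by inspecting the dask task graph for compute tasks after `open_mfdataset` returns.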
Yeah, let's create a test repo to try this out - this is promising! We can take a look at more testing/establishing some benchmarks to dig in here.
Maybe xradar-benchmark?
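If the repo adopts airspeed-velocity (asv) style benchmarks, separating the open step from the compute step would make regressions like eager computation visible. A hedged sketch; the suite name, file list, and the calls mentioned in the comments are assumptions, with placeholders standing in for real IO:

```python
class OpenDatasetSuite:
    """asv-style benchmark suite: time opening separately from computing."""

    def setup(self):
        # A real benchmark would point at sample GAMIC/ODIM_H5 files;
        # a placeholder list of names stands in for them here.
        self.files = [f"scan_{i:02d}.h5" for i in range(12)]

    def time_open(self):
        # Would call e.g. xr.open_mfdataset(self.files, parallel=True).
        # Opening alone should be cheap if nothing is computed eagerly.
        _ = list(self.files)

    def time_compute(self):
        # Would call ds.compute(); this is where the real IO should land.
        _ = sum(len(name) for name in self.files)
```

asv then tracks `time_open` and `time_compute` independently across commits, so a change that moves work from compute time into open time shows up immediately.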
Description
We should take a look at how we can speed up the xarray backends, and if there are more levels of parallelization possible.
I wonder if upstream enhancements to xarray (https://github.com/pydata/xarray/pull/7437) might help with this, enabling us to plug in the IO directly and benefit from more parallelization here.
What I Did
I read the data with the following code:
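The snippet itself didn't survive this transcript; a minimal sketch of a lazy multi-file open, assuming xradar registers its GAMIC backend under `engine="gamic"` (the glob pattern and helper name are illustrative):

```python
import glob


def open_volume(pattern="gamic/*.h5"):
    """Open many sweep files as one lazy dataset (sketch, not run here)."""
    import xarray as xr  # assumed available in the user's environment

    # parallel=True builds the per-file reads as dask tasks instead of
    # running them eagerly, so this call should return quickly.
    return xr.open_mfdataset(
        sorted(glob.glob(pattern)),
        engine="gamic",       # assumed backend name from xradar
        concat_dim="time",
        combine="nested",
        parallel=True,
    )
```

Nothing should be read from disk until `.load()`, `.compute()`, or a plot forces the graph to execute.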
Which resulted in this task graph, where the green is the `open_dataset` function. It has quite a bit of whitespace and could use some optimization.