mpiannucci / gribberish

Read GRIB files with Rust
MIT License

Kerchunked dataset unstable for distributed dask arrays #30

Closed mpiannucci closed 1 year ago

mpiannucci commented 1 year ago
[Screenshot, 2023-07-07 at 11:16:21 AM]

Running this code after loading the multizarr gives a different result on every run. When the same code is run against the single zarr, there is no such instability.

EDIT: Never mind, this happens with the single zarr too :(

Roughly half of the runs return garbage and the other half return correct data.
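
One way to put a number on the flakiness is to run the same computation repeatedly and count how many distinct results come back. A minimal sketch, assuming the computation can be wrapped in a zero-argument callable that returns bytes (for example, the raw bytes of the computed array values). `count_distinct_results` and `flaky` are hypothetical names used for illustration, not part of gribberish, kerchunk, or dask:

```python
import hashlib
from collections import Counter
from typing import Callable


def count_distinct_results(compute: Callable[[], bytes], runs: int = 20) -> Counter:
    """Call `compute` repeatedly and tally the results by SHA-256 digest."""
    tally = Counter()
    for _ in range(runs):
        digest = hashlib.sha256(compute()).hexdigest()
        tally[digest] += 1
    return tally


# Hypothetical stand-in for the unstable load: alternates between two payloads.
_calls = {"n": 0}


def flaky() -> bytes:
    _calls["n"] += 1
    return b"good" if _calls["n"] % 2 else b"garbage"


tally = count_distinct_results(flaky, runs=10)
print(len(tally), sorted(tally.values()))
```

A stable load should produce a single digest. More than one digest means the result depends on something other than the input data.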

mpiannucci commented 1 year ago

This only happens when the dataset is loaded as dask arrays, i.e. when setting chunks: {'something': 0} on dataset load.

When loaded as numpy arrays, it's stable.

mpiannucci commented 1 year ago

This issue also doesn't seem to happen with cfgrib, so it is likely an encoding or serialization issue somewhere along the line with dask.

mpiannucci commented 1 year ago

This happens when xarray caches from the dask client but the dask client doesn't have the gribberish codec. So this needs to be retested once the module is installable on dask workers.

mpiannucci commented 1 year ago

The gribberish codec now works as it should. I still saw some instability, though, and I'm not sure what is coming from cache and what is not.

mpiannucci commented 1 year ago

This is still happening :(

mpiannucci commented 1 year ago

The error here is that one of the chunks, for some reason, isn't getting written to the kerchunk file, so no error is thrown. When the chunk is manually added back in, everything works.
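
A missing chunk can go unnoticed because Zarr v2 treats an absent chunk key as an uninitialized chunk and fills it with the array's fill value instead of raising. One way to spot the gap is to compare the chunk keys implied by each array's `.zarray` metadata against the keys actually present in the references. A minimal sketch, assuming the kerchunk output is the JSON reference format with Zarr v2 metadata. `find_missing_chunks` and the sample references (including the `tmp` variable and `file.grib2`) are hypothetical:

```python
import itertools
import json
import math


def find_missing_chunks(references: dict) -> list[str]:
    """Return chunk keys implied by .zarray metadata but absent from the refs."""
    # Version 1 reference files nest the keys under "refs"; accept a bare mapping too.
    refs = references.get("refs", references)
    missing = []
    for key, value in refs.items():
        if not key.endswith(".zarray"):
            continue
        meta = json.loads(value) if isinstance(value, str) else value
        prefix = key[: -len(".zarray")]  # e.g. "tmp/", or "" for a root array
        sep = meta.get("dimension_separator") or "."
        # Number of chunks along each dimension.
        counts = [math.ceil(s / c) for s, c in zip(meta["shape"], meta["chunks"])]
        for index in itertools.product(*(range(n) for n in counts)):
            # A 0-d array has a single chunk stored under the key "0".
            chunk_key = prefix + (sep.join(map(str, index)) or "0")
            if chunk_key not in refs:
                missing.append(chunk_key)
    return missing


# Hypothetical reference set with the middle chunk left out.
sample = {
    "version": 1,
    "refs": {
        "tmp/.zarray": json.dumps({"shape": [3, 4, 4], "chunks": [1, 4, 4]}),
        "tmp/0.0.0": ["file.grib2", 0, 100],
        "tmp/2.0.0": ["file.grib2", 200, 100],
    },
}
print(find_missing_chunks(sample))  # ['tmp/1.0.0']
```

An array that is intentionally sparse will also show up here, so the output is a list of candidates to inspect, not a list of confirmed errors.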