Open wqshen opened 3 years ago
It seems reasonable to allow `open_mfdataset` to work without dask. Do others who know better agree?
If so, what's the best approach? We could have a default of `chunks={}`, accepting the Python footgun, and then people can pass `chunks=None` for no dask? Or use a sentinel value.
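A minimal sketch of the sentinel idea, assuming the goal is to tell "caller passed nothing" apart from "caller passed `None`". The names `_UNSET` and `resolve_chunks` are hypothetical and are not part of xarray. It also assumes the footgun in question is a mutable `{}` default shared between calls, which the sentinel avoids by building a fresh dict each time.

```python
# Hypothetical helper, not xarray code: it only illustrates the sentinel idea.
_UNSET = object()  # unique marker meaning "the caller did not pass chunks"


def resolve_chunks(chunks=_UNSET):
    """Return the chunks value that would be forwarded to the per-file opener."""
    if chunks is _UNSET:
        # Nothing was passed: keep the dask-backed default, building a new
        # dict on every call so no mutable default is shared between calls.
        return {}
    # Anything explicit, including None ("no dask"), is passed through unchanged.
    return chunks


print(resolve_chunks())              # {}
print(resolve_chunks(None))          # None
print(resolve_chunks({"time": 10}))  # {'time': 10}
```

With this shape, existing callers who pass nothing keep getting dask arrays, while `chunks=None` becomes a real opt-out.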
Recently, I used `open_mfdataset` to open a local tar.gz file containing multiple netCDF files. It failed to open it and raised a `distributed.scheduler.KilledWorker` error and `TypeError: cannot serialize 'ExFileObject' object`.

My code is like the following:
In the code above, the elements of the variable `flist` are of type `ExFileObject`, which can't be serialized to a `distributed.Client` cluster, and this causes `open_mfdataset` to fail.

The reason is that `xr.open_mfdataset` automatically converts `chunks=None` to `{}`, which forces `xr.open_dataset` to use dask. We can see this in this line of `open_mfdataset`: https://github.com/pydata/xarray/blob/37fe5441c8a2fb981f2c50b8379d7d4f8492ae19/xarray/backends/api.py#L897
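The serialization failure described above can be reproduced with the standard library alone, without xarray or dask. The sketch below builds a small in-memory tar.gz with placeholder payloads (not real netCDF data), extracts the members the way a `flist` of `ExFileObject` handles might be built, and checks whether each handle can be pickled. The helper names `make_tar_gz` and `can_pickle` are made up for this illustration.

```python
import io
import pickle
import tarfile


def make_tar_gz(files):
    """Build an in-memory tar.gz archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


# Placeholder payloads; these are not real netCDF files.
archive = make_tar_gz({"a.nc": b"fake-a", "b.nc": b"fake-b"})

with tarfile.open(fileobj=archive, mode="r:gz") as tar:
    # Each element is a file-like handle tied to the open archive.
    flist = [tar.extractfile(member) for member in tar.getmembers()]
    handle_type = type(flist[0]).__name__
    handles_ok = [can_pickle(f) for f in flist]
    # The raw bytes behind each handle are plain data and pickle fine.
    raw = [f.read() for f in flist]
    bytes_ok = [can_pickle(b) for b in raw]

print(handle_type, handles_ok, bytes_ok)
```

The handles fail to pickle because they wrap an open file position inside the archive, while the bytes read from them do not carry that state. Anything shipped to distributed workers has to be serializable, which is why the handles are a problem once dask is forced on.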
Even if I set `chunks=None`, it is still an error, because `chunks` is never `None` by the time it is passed into `open_dataset`.

I think maybe we could keep the `chunks` value as given, and anyone who wants to change it can set it to `{}` or any other value they want? Or do you have a better solution for my problem?
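To make the difference concrete, here is a toy comparison of coercing `chunks` versus passing it through. It assumes the coercion behaves like `chunks or {}`, which is an assumption about the linked line rather than a quote from it. `fake_open_dataset` and both `open_many_*` functions are hypothetical stand-ins, not xarray functions.

```python
def fake_open_dataset(path, chunks=None):
    """Stand-in for a per-file opener: reports which code path chunks selects."""
    return "numpy" if chunks is None else "dask"


def open_many_coercing(paths, chunks=None):
    # Assumed current behaviour: falsy chunks (None included) become {}.
    return [fake_open_dataset(p, chunks=chunks or {}) for p in paths]


def open_many_passthrough(paths, chunks=None):
    # Proposed behaviour: forward chunks exactly as the caller gave it.
    return [fake_open_dataset(p, chunks=chunks) for p in paths]


paths = ["a.nc", "b.nc"]
print(open_many_coercing(paths, chunks=None))     # ['dask', 'dask']
print(open_many_passthrough(paths, chunks=None))  # ['numpy', 'numpy']
print(open_many_passthrough(paths, chunks={}))    # ['dask', 'dask']
```

Note that with plain pass-through and a default of `None`, callers who pass nothing would also stop getting dask arrays, so this variant changes the default behaviour as well.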
Also, thank you for your great work on this excellent package.