intake / intake-esm

An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.
https://intake-esm.readthedocs.io
Apache License 2.0

Frequent dask UserWarning #596

Closed: dougiesquire closed this issue 1 year ago

dougiesquire commented 1 year ago

I've been getting the following warning quite frequently when opening data with intake-esm with an active distributed client:

dask/base.py:1368: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.

The warning originates from here: https://github.com/intake/intake-esm/blob/main/intake_esm/source.py#L249. Admittedly, it seems to appear mostly when I'm trying to combine assets that are hard to combine (e.g. coordinate variables missing from some assets). Regardless, the warning suggests that hard-coding the single-threaded scheduler may not be a good idea when many (if not most) users will already have an active distributed client.

Version information: output of intake_esm.show_versions()

INSTALLED VERSIONS
------------------
cftime: 1.6.2
dask: 2023.3.2
fastprogress: 1.0.3
fsspec: 2023.3.0
gcsfs: None
intake: 0.6.8
intake_esm: 0.0.post1096+dirty
netCDF4: 1.6.0
pandas: 1.5.3
requests: 2.28.2
s3fs: None
xarray: 2023.3.0
zarr: 2.14.2

dougiesquire commented 1 year ago

Things also run much slower when this warning is emitted. Simply removing the `with` statement makes things run much faster in these cases. @andersy005, could you please explain why the `with` statement is necessary? Is there any issue with removing it?
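One possible middle ground, in case the override is still wanted for users without a cluster, is to force the single-threaded scheduler only when no distributed client is active. The sketch below is just an illustration of that idea under my own assumptions, not intake-esm's actual code or an agreed fix; `scheduler_context` is a hypothetical name.

```python
import contextlib

import dask


def scheduler_context():
    """Return a context manager for computing assets.

    Hypothetical helper: forces the single-threaded scheduler only when no
    distributed client is active; otherwise leaves the scheduler untouched.
    """
    try:
        from distributed import Client

        # Raises ValueError when no client has been created.
        Client.current(allow_global=True)
    except (ImportError, ValueError):
        # No distributed client available: fall back to the single-threaded scheduler.
        return dask.config.set(scheduler="single-threaded")
    # A client is active: do nothing, so its scheduler is used.
    return contextlib.nullcontext()


if __name__ == "__main__":
    tasks = [dask.delayed(sum)([i, i + 1]) for i in range(3)]
    with scheduler_context():
        print(dask.compute(*tasks))
```

With an active client the `with` block becomes a no-op, so the computation goes to the cluster and the warning should not appear; without one, the behaviour matches the hard-coded override.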

Thomas-Moore-Creative commented 1 year ago

I'm having similar issues: I see this warning when using an intake-esm catalog to search across a large collection of CMIP6 datasets. I have tried narrowing the catalog search all the way down to a single file path (rather than the large search I require) and have also used xarray_open_kwargs = {"drop_variables": [..... to remove any extra, unneeded variables. I have even removed ALL the variables from this single file (leaving me with zero data variables), and I still get this warning when opening the search result that points to this single file path.

Appreciate all the work from many on intake-esm!