Is there an example of using intake with AWS S3 netCDF files? #290

Thank you for intake-esm! I have used intake successfully on locally stored netCDF and zarr data in the past. I recently opened a netCDF dataset in AWS S3 directly with xarray, after several failed attempts. I am now working with some model output in netCDF format that is publicly available in AWS S3, and I was trying to get an intake-esm example working on this data for a quick demonstration of our newest experimental JupyterHub setup. I am running into an issue, and I hope it's just something minor I overlooked. Has anyone else tried using intake with an S3 netCDF data source? I noticed that NCAR has done something similar with CESM model output, but in zarr (in AWS S3).

Additional info can be found below. Please let me know if you'd like additional information. Any help is appreciated.

intake                    0.6.0                      py_0    conda-forge
intake-esm                2020.8.15                  py_0    conda-forge
<class 'intake_esm.source.ESMGroupDataSource'>
<name: output.NOAA-GFDL.GFDL-ESM4.atmos.historical.mon.Amon, assets: 1
ds = data_source(zarr_kwargs={'consolidated': True, 'decode_times': True}).to_dask()

Test code and additional files (based on the intake-esm catalog examples for glade, etc.) are on GitHub for reference: ESM Collection file, DB/metadata.
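For readers without the linked files, here is a hypothetical sketch of what an ESM collection description for netCDF assets might look like, written out from Python. As far as I know, the field names follow the ESM collection spec (`esmcat_version` 0.1.0) that intake-esm used at the time; the id, CSV name, column names, and grouping below are placeholders, not the actual GFDL catalog. The detail that matters for this thread is `assets.format`, which marks the assets as netCDF files rather than zarr stores.

```python
import json

# Hypothetical ESM collection description for netCDF assets on S3.
# Field names assume the ESM collection spec (esmcat_version 0.1.0);
# the id, CSV file name, and column names are placeholders.
collection = {
    "esmcat_version": "0.1.0",
    "id": "gfdltest",
    "description": "GFDL-ESM4 netCDF output on AWS S3 (example)",
    "catalog_file": "gfdltest.csv",
    "attributes": [
        {"column_name": "source_id"},
        {"column_name": "experiment_id"},
        {"column_name": "table_id"},
        {"column_name": "variable_id"},
    ],
    "assets": {
        "column_name": "path",  # CSV column holding each asset's URL
        "format": "netcdf",     # netCDF files, not zarr stores
    },
    "aggregation_control": {
        "variable_column_name": "variable_id",
        "groupby_attrs": ["source_id", "experiment_id", "table_id"],
        "aggregations": [
            {"type": "union", "attribute_name": "variable_id"},
        ],
    },
}

# Write the description next to the (placeholder) catalog CSV.
with open("gfdltest.json", "w") as f:
    json.dump(collection, f, indent=2)
```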

I tried checking it out and made some edits to make it compatible with the current version, but that hasn't helped so far.

@aradhakrishnanGFDL, thank you for providing useful debugging information. I think I have an idea of what's going on: accessing netCDF in S3 doesn't work because of some assumptions made in intake-esm. Here are the culprit lines:

When dealing with netCDF on S3, instead of calling fsspec.get_mapper(path, **storage_options), we need to call …, **storage_options). I will look into supporting this feature in the next release of intake-esm and will ping you once I have a working prototype.
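To illustrate the distinction (a minimal sketch, not intake-esm's actual code): `fsspec.get_mapper` returns a dict-like view of the keys inside a directory, which is what a zarr store expects, whereas a netCDF file has to be handed to xarray as a single file-like object, for example one obtained from `fsspec.open`. The helper `open_asset` and its `data_format` strings are hypothetical, and the demo uses fsspec's in-memory filesystem so it runs without S3 access.

```python
import fsspec


def open_asset(path, data_format, **storage_options):
    """Hypothetical helper: return the object each backend expects for `path`.

    zarr   -> a key/value mapper over the store directory
    netcdf -> an open binary file object for a single file
    """
    if data_format == "zarr":
        # Dict-like view whose keys are object names inside the store directory.
        return fsspec.get_mapper(path, **storage_options)
    if data_format == "netcdf":
        # fsspec.open() returns an OpenFile; .open() gives the underlying file
        # object, which could be passed to xarray.open_dataset. Caller must close it.
        return fsspec.open(path, mode="rb", **storage_options).open()
    raise ValueError(f"unsupported data_format: {data_format!r}")


# Demo on fsspec's in-memory filesystem instead of S3 (placeholder bytes, not real files).
fs = fsspec.filesystem("memory")
fs.pipe("memory://demo/store.zarr/.zgroup", b'{"zarr_format": 2}')
fs.pipe("memory://demo/tas.nc", b"CDF\x01placeholder")

mapper = open_asset("memory://demo/store.zarr", "zarr")
print(sorted(mapper))  # keys inside the store directory

f = open_asset("memory://demo/tas.nc", "netcdf")
header = f.read(4)  # first bytes of the single netCDF-like file
f.close()
print(header)
```

Against the public bucket, the same helper would take `anon=True` as a storage option (and needs `s3fs` installed), e.g. `open_asset("s3://gfdl-esgf/…/<file>.nc", "netcdf", anon=True)`.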

@aradhakrishnanGFDL, when you get a chance, could you confirm that the following works for you:

In [17]: import xarray as xr

In [18]: import fsspec

In [19]: fs = fsspec.filesystem('s3', anon=True)

In [20]: x = 's3://gfdl-esgf/CMIP6/CMIP/NOAA-GFDL/GFDL-ESM4/historical/r1i1p1f1/Amon/tas/gr1/v20190726'

In [21]: root =

I am unable to run the last line successfully because it appears that the S3 bucket isn't public (maybe?).

Hi @andersy005,

Thank you for helping with this!

The bucket should be public. I appended the file name to the path, and the following works for me. (Note: the v20190726 directory has two netCDF files.)

import fsspec
fs = fsspec.filesystem('s3', anon=True)

x = "s3://gfdl-esgf/CMIP6/CMIP/NOAA-GFDL/GFDL-ESM4/historical/r1i1p1f1/Amon/tas/gr1/v20190726/"
root =

One more example, if needed:

x = "s3://gfdl-esgf/CMIP6/AerChemMIP/NOAA-GFDL/GFDL-ESM4/histSST/r1i1p1f1/Amon/tas/gr1/v20180701/"
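Since a version directory can hold more than one file (two in `v20190726`), a small sketch like the following could list the netCDF files under a directory so each one can be opened individually. `list_netcdf_files` is a hypothetical helper; it only relies on `fs.glob`, so it is demonstrated on fsspec's in-memory filesystem here, but the same call should work with an `fsspec.filesystem('s3', anon=True)` object (requires `s3fs` and network access).

```python
import fsspec


def list_netcdf_files(fs, directory_url):
    """Hypothetical helper: full URLs of the .nc files directly inside a directory."""
    protocol = directory_url.split("://", 1)[0]
    # glob strips the protocol itself and returns protocol-less paths
    pattern = directory_url.rstrip("/") + "/*.nc"
    return [f"{protocol}://{p.lstrip('/')}" for p in sorted(fs.glob(pattern))]


# Demo: fake a version directory holding two netCDF files and one other file.
mem = fsspec.filesystem("memory")
for name in ("tas_a.nc", "tas_b.nc", "README.txt"):
    mem.pipe(f"memory://gfdl-demo/v20190726/{name}", b"placeholder")

files = list_netcdf_files(mem, "memory://gfdl-demo/v20190726/")
print(files)
```

Each returned URL can then be opened on its own, e.g. `xr.open_dataset(fs.open(url))`; for netCDF4/HDF5 files read from a file object this typically goes through the `h5netcdf` engine.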

I have a solution for you in #292 :). You will need to modify the path entries in your CSV to point to actual files instead of the directory.
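If the catalog currently lists directories, a sketch along these lines could rewrite the CSV so that each row points to an individual file. The helper name, the `path` column name, and the sample rows are assumptions for illustration; rows that already end in `.nc` are left alone, and every path is assumed to be a full URL such as `s3://bucket/dir/`.

```python
import csv
import io

import fsspec


def expand_directory_rows(csv_text, fs, path_column="path"):
    """Hypothetical helper: replace each directory path in a catalog CSV
    with one row per .nc file found directly inside that directory."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        path = row[path_column]
        if path.endswith(".nc"):
            writer.writerow(row)  # already points at a single file
            continue
        # Assumes full URLs like "s3://bucket/dir/"; keep the protocol on output.
        protocol = path.split("://", 1)[0]
        for match in sorted(fs.glob(path.rstrip("/") + "/*.nc")):
            writer.writerow({**row, path_column: f"{protocol}://{match.lstrip('/')}"})
    return out.getvalue()


# Demo on the in-memory filesystem with two files in one version directory.
mem = fsspec.filesystem("memory")
for name in ("tas_a.nc", "tas_b.nc"):
    mem.pipe(f"memory://catalog-demo/v20190726/{name}", b"placeholder")

catalog = "source_id,variable_id,path\nGFDL-ESM4,tas,memory://catalog-demo/v20190726/\n"
print(expand_directory_rows(catalog, mem))
```

Depending on your aggregation settings, you may also want a column that distinguishes the files (for example, their time range).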


To try #292 out, you can install intake-esm via

python -m pip install git+

In [1]: import intake

In [2]: col = intake.open_esm_datastore("gfdltest.json")

In [3]: dset_dict = col.to_dataset_dict(cdf_kwargs={'chunks': {'time': 20}}, storage_options={'anon':True}
   ...: )

@aradhakrishnanGFDL, I merged #292 into master. When you get a chance, could you try the master branch and let me know how it goes?

python -m pip install git+

Hi @andersy005, it works great! Thank you so much.