pangeo-forge / staged-recipes

A place to submit pangeo-forge recipes before they become fully fledged pangeo-forge feedstocks
https://pangeo-forge.readthedocs.io/en/latest/
Apache License 2.0

Recipe for iHESP Global Datasets #72

Open rabernat opened 3 years ago

rabernat commented 3 years ago

Source Dataset

iHESP is focused on high-resolution, coupled climate simulations spanning the entire globe and regionally downscaled simulations of a region of interest (ex: Gulf of Mexico). Our global climate datasets have been generated using a high‐resolution configuration of the Community Earth System Model version 1.3 (CESM1.3), with a nominal horizontal resolution of 0.25° for the atmosphere and land models and 0.1° for the ocean and sea‐ice models. At these resolutions, the model permits tropical cyclones and ocean mesoscale eddies, allowing interactions between these synoptic and mesoscale phenomena with large‐scale circulations.

Transformation / Alignment / Merging

The data should merge cleanly.

Output Dataset

Zarr with all time slice files unified into a single timeseries. Possibly merging of different variables as well.

cc @paigem, @abishekg7

jbusecke commented 2 years ago

Reporting back from the OSM tutorial, where @selipot, Ian Carroll, and I tried our luck with this.

We unfortunately failed to even open the datasets from within the binder environment. I used the example URL https://datahub.geos.tamu.edu:8880/thredds/fileServer/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.SSH.032001-032912.nc, which I can download to my local computer by clicking on it, but which I was unable to load into xarray:

path = 'https://datahub.geos.tamu.edu:8880/thredds/fileServer/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.SSH.032001-032912.nc'
xr.open_dataset(path)

gives:

Note:Caching=1
Error:curl error: SSL peer certificate or SSH remote key was not OK
curl error details: 
Warning:oc_open: Could not read url
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    198             try:
--> 199                 file = self._cache[self._key]
    200             except KeyError:

~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://datahub.geos.tamu.edu:8880/thredds/fileServer/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.SSH.032001-032912.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
/var/folders/_1/1k9jtjl51z333f21s7yht0340000gn/T/ipykernel_1917/1179327591.py in <module>
----> 1 xr.open_dataset(path)

~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    493 
    494     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495     backend_ds = backend.open_dataset(
    496         filename_or_obj,
    497         drop_variables=drop_variables,

~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
    548 
    549         filename_or_obj = _normalize_path(filename_or_obj)
--> 550         store = NetCDF4DataStore.open(
    551             filename_or_obj,
    552             mode=mode,

~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    377             netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    378         )
--> 379         return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
    380 
    381     def _acquire(self, needs_lock=True):

~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in __init__(self, manager, group, mode, lock, autoclose)
    325         self._group = group
    326         self._mode = mode
--> 327         self.format = self.ds.data_model
    328         self._filename = self.ds.filepath()
    329         self.is_remote = is_remote_uri(self._filename)

~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in ds(self)
    386     @property
    387     def ds(self):
--> 388         return self._acquire()
    389 
    390     def open_store_variable(self, name, var):

~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in _acquire(self, needs_lock)
    380 
    381     def _acquire(self, needs_lock=True):
--> 382         with self._manager.acquire_context(needs_lock) as root:
    383             ds = _nc4_require_group(root, self._group, self._mode)
    384         return ds

~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/contextlib.py in __enter__(self)
    117         del self.args, self.kwds, self.func
    118         try:
--> 119             return next(self.gen)
    120         except StopIteration:
    121             raise RuntimeError("generator didn't yield") from None

~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/site-packages/xarray/backends/file_manager.py in acquire_context(self, needs_lock)
    185     def acquire_context(self, needs_lock=True):
    186         """Context manager for acquiring a file."""
--> 187         file, cached = self._acquire_with_cache_info(needs_lock)
    188         try:
    189             yield file

~/miniconda/envs/pangeo_forge_cm26/lib/python3.9/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    203                     kwargs = kwargs.copy()
    204                     kwargs["mode"] = self._mode
--> 205                 file = self._opener(*self._args, **kwargs)
    206                 if self._mode == "w":
    207                     # ensure file doesn't get overriden when opened again

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -68] NetCDF: I/O failure: b'https://datahub.geos.tamu.edu:8880/thredds/fileServer/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.SSH.032001-032912.nc'

We also tried to download an example file with curl from the terminal; that only succeeded on my local machine, not from the binder environment.

We did not get any further, but I would love to understand what went wrong here. I guess it is not technically necessary to look at the data first, but I do not know much about it, and so exploring it first seemed like a good idea.
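
The curl error in the output above ("SSL peer certificate or SSH remote key was not OK") hints that this may be a certificate-verification problem rather than a missing file. A quick, hypothetical way to check that guess (a sketch using requests; not something we actually ran):

import requests

url = (
    "https://datahub.geos.tamu.edu:8880/thredds/fileServer"
    "/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH"
    "/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.SSH.032001-032912.nc"
)

try:
    # If this raises SSLError, the server's certificate chain is not trusted
    # by the client environment, which would explain the netCDF/curl failure.
    requests.head(url, timeout=30).raise_for_status()
    print("certificate verification succeeded")
except requests.exceptions.SSLError as err:
    print("certificate verification failed:", err)
    # Insecure, for diagnosis only: retry without verification.
    print("without verification:", requests.head(url, verify=False, timeout=30).status_code)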

rabernat commented 2 years ago

Is that an opendap link?

abishekg7 commented 2 years ago

Abishek from the (formerly) iHESP project here. Thanks for looking into it!

I think we've had some issues opening datasets via OpenDAP on our THREDDS server previously. If this is a prerequisite for ingesting the data to the cloud, we can try to prioritize this issue this week or next. It's possible that our OpenDAP links worked via Ferret or ncdump, but not with the Python netCDF libraries. I'll need to revisit this and confirm.
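
If it helps, a minimal way to test the OpenDAP access from Python would be something like the following (a sketch only; it assumes the usual THREDDS convention that the same path served under /thredds/fileServer/ for direct download is also exposed under /thredds/dodsC/ for OPeNDAP):

import xarray as xr

# Assumed OPeNDAP URL, derived from the fileServer URL above by swapping in dodsC.
url = (
    "https://datahub.geos.tamu.edu:8880/thredds/dodsC"
    "/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH"
    "/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.SSH.032001-032912.nc"
)
ds = xr.open_dataset(url)  # the netCDF4 backend handles OPeNDAP for http(s) URLs
print(ds)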

jbusecke commented 2 years ago

That would be great, @abishekg7. Many thanks!

rabernat commented 2 years ago

Actually, we would prefer to get netCDF files, not OPeNDAP. Is there a direct netCDF file download link from the THREDDS server?

abishekg7 commented 2 years ago

I think a link like this would usually work with wget or curl.

Our THREDDS server was down last evening, but we have restarted it now. Since it's a little confusing, here are some basic instructions for downloading the netCDF files from a browser.

  1. Go to our THREDDS server link for a specific variable and click on any of the links.

    [screenshot: THREDDS catalog listing for the variable]
  2. From the resulting page, click on the HTTPServer link to start the download. Or copy the link to use with wget.

    [screenshot: dataset access page with the HTTPServer link]

I can also give you a wget script to make this a little easier.

rabernat commented 2 years ago

I was able to download the data with no problem

 wget --no-check-certificate https://datahub.geos.tamu.edu:8880/thredds/fileServer/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.SSH.032001-032912.nc

and open it with xarray

xr.open_dataset('B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.SSH.032001-032912.nc')
<xarray.Dataset>
Dimensions:             (z_t: 62, z_t_150m: 15, z_w: 62, z_w_top: 62, z_w_bot: 62, nlat: 2400, nlon: 3600, time: 120, d2: 2)
Coordinates:
  * z_t                 (z_t) float32 500.0 1.5e+03 ... 5.625e+05 5.875e+05
  * z_t_150m            (z_t_150m) float32 500.0 1.5e+03 ... 1.35e+04 1.45e+04
  * z_w                 (z_w) float32 0.0 1e+03 2e+03 ... 5.5e+05 5.75e+05
  * z_w_top             (z_w_top) float32 0.0 1e+03 2e+03 ... 5.5e+05 5.75e+05
  * z_w_bot             (z_w_bot) float32 1e+03 2e+03 3e+03 ... 5.75e+05 6e+05
    ULONG               (nlat, nlon) float64 ...
    ULAT                (nlat, nlon) float64 ...
    TLONG               (nlat, nlon) float64 ...
    TLAT                (nlat, nlon) float64 ...
  * time                (time) object 0320-02-01 00:00:00 ... 0330-01-01 00:0...
Dimensions without coordinates: nlat, nlon, d2
Data variables: (12/51)
    dz                  (z_t) float32 1e+03 1e+03 1e+03 ... 2.5e+04 2.5e+04
    dzw                 (z_w) float32 500.0 1e+03 1e+03 ... 2.5e+04 2.5e+04
    KMT                 (nlat, nlon) float64 ...
    KMU                 (nlat, nlon) float64 ...
    REGION_MASK         (nlat, nlon) float64 ...
    UAREA               (nlat, nlon) float64 ...
    ...                  ...
    salinity_factor     float64 -0.00347
    sflux_factor        float64 0.1
    nsurface_t          float64 5.413e+06
    nsurface_u          float64 5.372e+06
    time_bound          (time, d2) object 0320-01-01 00:00:00 ... 0330-01-01 ...
    SSH                 (time, nlat, nlon) float32 ...
Attributes: (12/13)
    title:           B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02
    history:         Sat Jul 24 01:43:33 2021: ncap2 -A -s time=udunits(time,...
    Conventions:     CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf/CF-curr...
    contents:        Diagnostic and Prognostic Variables
    source:          CCSM POP2, the CCSM Ocean Component
    revision:        $Id: tavg.F90 56176 2013-12-20 18:35:46Z mlevy@ucar.edu $
    ...              ...
    start_time:      This dataset was created on 2019-11-06 at 09:58:40.0
    cell_methods:    cell_methods = time: mean ==> the variable values are av...
    nsteps_total:    68119571
    tavg_sum:        2678400.0
    tavg_sum_qflux:  2678400.0
    NCO:             netCDF Operators version 4.8.1 (Homepage = http://nco.sf/...

So it should be straightforward to ingest this data with a vanilla XarrayZarrRecipe.

jbusecke commented 2 years ago

Ok I put some work into this as a last action of the day.

First, some thoughts about the whole dataset: ideally we would like all variables in one gigantic Zarr store, I assume? I see a major challenge there in harmonizing the file patterns across the different variables. Upon closer inspection, many of the variables have completely different output periods. See the SSH variable:

[screenshot: THREDDS catalog listing of the SSH files]

These files are saved in 10-year intervals.

For comparison, the UVEL data:

[screenshot: THREDDS catalog listing of the UVEL files]

These are saved yearly! So we need a separate file pattern for each variable! I might also be missing something here. Maybe @abishekg7 has more insight.
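
To make that concrete, here is a rough sketch of what per-variable patterns could look like (my illustration only; the year ranges, output periods, and exact per-variable file names below are guesses, since these clearly differ between variables):

from pangeo_forge_recipes.patterns import ConcatDim, FilePattern

BASE = (
    "https://datahub.geos.tamu.edu:8880/thredds/fileServer/iHESPDataHUB"
    "/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn"
)

def make_pattern(variable, start_year, stop_year, years_per_file):
    """Build one FilePattern per variable, each with its own output period."""
    def make_url(time):
        # Hypothetical file-name template, modeled on the SSH files.
        return (
            f"{BASE}/{variable}"
            f"/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.nday1"
            f".{variable}.{time:04d}01-{time + years_per_file - 1:04d}12.nc"
        )
    keys = list(range(start_year, stop_year, years_per_file))
    return FilePattern(make_url, ConcatDim("time", keys=keys))

# e.g. SSH saved every 10 years, UVEL every year (periods guessed from the catalog)
patterns = {
    "SSH": make_pattern("SSH", 340, 520, 10),
    "UVEL": make_pattern("UVEL", 340, 520, 1),
}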

But additionally, there are some inconsistencies in the naming even within single variables. See the earlier files of SSH:

[screenshot: THREDDS catalog listing of the earlier SSH files, with a different naming scheme and file sizes]

There is an irregularity here (also visible in the file sizes), coinciding with a different naming scheme.

So maybe this is not as 'vanilla' as assumed, but might be a good boundary pusher for pangeo-forge.

Either way, I decided to start a recipe with an easier target: only SSH, and only the later years, where the save period is homogeneous.

So this is what I came up with:

from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.recipes import XarrayZarrRecipe
input_url_pattern = (
    "https://datahub.geos.tamu.edu:8880/thredds/fileServer/iHESPDataHUB"
    "/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH"
    "/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.nday1.SSH.{yyyystart}01-{yyyystop}12.nc"
)

def format_function(time):
    # I wonder if there is a way to name this simply year, and still have it concat along 'time'
    # I think that would be more intuitive here, but not a big deal.
    return input_url_pattern.format(
        yyyystart=f"{time:04d}",
        yyyystop=f"{time+9:04d}",
    )

# There is a break in the naming convention + output timespan for some variables after year ~337.
# Let's only take years 340-510 for now.
# years = range(340, 520, 10)
# for testing
years = range(340, 360, 10)

concat_dim = ConcatDim(name="time", keys=years, nitems_per_file=120)
pattern = FilePattern(format_function, concat_dim)

recipe = XarrayZarrRecipe(pattern, target_chunks={'time':6})
recipe
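
Before running anything, a quick sanity check of the generated URLs (just a sketch; if I remember the FilePattern API correctly, pattern.items() yields the index/URL pairs):

# Print every URL the pattern generates, to confirm the year formatting.
for index, url in pattern.items():
    print(index, url)

# A single URL can also be checked directly:
print(format_function(340))
# expected: ...pop.h.nday1.SSH.034001-034912.nc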

But when I try to test the caching I get a similar error to the one I got earlier:

for input_file in recipe.inputs_for_chunk(all_chunks[0]):
    recipe.cache_input(input_file)
pangeo_forge_recipes.recipes.xarray_zarr - INFO - Caching input 'Index({DimIndex(name='time', index=0, sequence_len=2, operation=<CombineOp.CONCAT: 2>)})'
pangeo_forge_recipes.storage - INFO - Caching file 'https://datahub.geos.tamu.edu:8880/thredds/fileServer/iHESPDataHUB/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.nday1.SSH.034001-034912.nc'
pangeo_forge_recipes.storage - INFO - Copying remote file 'https://datahub.geos.tamu.edu:8880/thredds/fileServer/iHESPDataHUB/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.nday1.SSH.034001-034912.nc' to cache
---------------------------------------------------------------------------
SSLCertVerificationError                  Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/connector.py:986, in TCPConnector._wrap_create_connection(self, req, timeout, client_error, *args, **kwargs)
    985     async with ceil_timeout(timeout.sock_connect):
--> 986         return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
    987 except cert_errors as exc:

File /srv/conda/envs/notebook/lib/python3.9/asyncio/base_events.py:1081, in BaseEventLoop.create_connection(self, protocol_factory, host, port, ssl, family, proto, flags, sock, local_addr, server_hostname, ssl_handshake_timeout, happy_eyeballs_delay, interleave)
   1078         raise ValueError(
   1079             f'A Stream Socket was expected, got {sock!r}')
-> 1081 transport, protocol = await self._create_connection_transport(
   1082     sock, protocol_factory, ssl, server_hostname,
   1083     ssl_handshake_timeout=ssl_handshake_timeout)
   1084 if self._debug:
   1085     # Get the socket from the transport because SSL transport closes
   1086     # the old socket and creates a new SSL socket

File /srv/conda/envs/notebook/lib/python3.9/asyncio/base_events.py:1111, in BaseEventLoop._create_connection_transport(self, sock, protocol_factory, ssl, server_hostname, server_side, ssl_handshake_timeout)
   1110 try:
-> 1111     await waiter
   1112 except:

File /srv/conda/envs/notebook/lib/python3.9/asyncio/sslproto.py:528, in SSLProtocol.data_received(self, data)
    527 try:
--> 528     ssldata, appdata = self._sslpipe.feed_ssldata(data)
    529 except (SystemExit, KeyboardInterrupt):

File /srv/conda/envs/notebook/lib/python3.9/asyncio/sslproto.py:188, in _SSLPipe.feed_ssldata(self, data, only_handshake)
    186 if self._state == _DO_HANDSHAKE:
    187     # Call do_handshake() until it doesn't raise anymore.
--> 188     self._sslobj.do_handshake()
    189     self._state = _WRAPPED

File /srv/conda/envs/notebook/lib/python3.9/ssl.py:944, in SSLObject.do_handshake(self)
    943 """Start the SSL/TLS handshake."""
--> 944 self._sslobj.do_handshake()

SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)

The above exception was the direct cause of the following exception:

ClientConnectorCertificateError           Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py:383, in HTTPFileSystem._info(self, url, **kwargs)
    381 try:
    382     info.update(
--> 383         await _file_info(
    384             url,
    385             size_policy=policy,
    386             session=session,
    387             **self.kwargs,
    388             **kwargs,
    389         )
    390     )
    391     if info.get("size") is not None:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py:732, in _file_info(url, session, size_policy, **kwargs)
    731 elif size_policy == "get":
--> 732     r = await session.get(url, allow_redirects=ar, **kwargs)
    733 else:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client.py:535, in ClientSession._request(self, method, str_or_url, params, data, json, cookies, headers, skip_auto_headers, auth, allow_redirects, max_redirects, compress, chunked, expect100, raise_for_status, read_until_eof, proxy, proxy_auth, timeout, verify_ssl, fingerprint, ssl_context, ssl, proxy_headers, trace_request_ctx, read_bufsize)
    534         assert self._connector is not None
--> 535         conn = await self._connector.connect(
    536             req, traces=traces, timeout=real_timeout
    537         )
    538 except asyncio.TimeoutError as exc:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/connector.py:542, in BaseConnector.connect(self, req, traces, timeout)
    541 try:
--> 542     proto = await self._create_connection(req, traces, timeout)
    543     if self._closed:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/connector.py:907, in TCPConnector._create_connection(self, req, traces, timeout)
    906 else:
--> 907     _, proto = await self._create_direct_connection(req, traces, timeout)
    909 return proto

File /srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/connector.py:1206, in TCPConnector._create_direct_connection(self, req, traces, timeout, client_error)
   1205 assert last_exc is not None
-> 1206 raise last_exc

File /srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/connector.py:1175, in TCPConnector._create_direct_connection(self, req, traces, timeout, client_error)
   1174 try:
-> 1175     transp, proto = await self._wrap_create_connection(
   1176         self._factory,
   1177         host,
   1178         port,
   1179         timeout=timeout,
   1180         ssl=sslcontext,
   1181         family=hinfo["family"],
   1182         proto=hinfo["proto"],
   1183         flags=hinfo["flags"],
   1184         server_hostname=hinfo["hostname"] if sslcontext else None,
   1185         local_addr=self._local_addr,
   1186         req=req,
   1187         client_error=client_error,
   1188     )
   1189 except ClientConnectorError as exc:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/connector.py:988, in TCPConnector._wrap_create_connection(self, req, timeout, client_error, *args, **kwargs)
    987 except cert_errors as exc:
--> 988     raise ClientConnectorCertificateError(req.connection_key, exc) from exc
    989 except ssl_errors as exc:

ClientConnectorCertificateError: Cannot connect to host datahub.geos.tamu.edu:8880 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')]

The above exception was the direct cause of the following exception:

FileNotFoundError                         Traceback (most recent call last)
Input In [56], in <cell line: 1>()
      1 for input_file in recipe.inputs_for_chunk(all_chunks[0]):
----> 2     recipe.cache_input(input_file)

File ~/pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py:924, in XarrayZarrRecipe.cache_input(self, input_key)
    919 """Cache an input
    920 
    921 :param chunk_key: Which input to cache
    922 """
    923 warnings.warn(_deprecation_message, DeprecationWarning)
--> 924 cache_input(input_key, config=self)

File ~/pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py:155, in cache_input(input_key, config)
    153     logger.info(f"Caching input '{input_key!s}'")
    154     fname = config.file_pattern[input_key]
--> 155     config.storage_config.cache.cache_file(
    156         fname,
    157         config.file_pattern.query_string_secrets,
    158         **config.file_pattern.fsspec_open_kwargs,
    159     )
    161 if config.cache_metadata:
    162     if config.storage_config.metadata is None:

File ~/pangeo-forge-recipes/pangeo_forge_recipes/storage.py:166, in CacheFSSpecTarget.cache_file(self, fname, secrets, **open_kwargs)
    164 target_opener = self.open(fname, mode="wb")
    165 logger.info(f"Copying remote file '{fname}' to cache")
--> 166 _copy_btw_filesystems(input_opener, target_opener)

File ~/pangeo-forge-recipes/pangeo_forge_recipes/storage.py:37, in _copy_btw_filesystems(input_opener, output_opener, BLOCK_SIZE)
     36 def _copy_btw_filesystems(input_opener, output_opener, BLOCK_SIZE=10_000_000):
---> 37     with input_opener as source:
     38         with output_opener as target:
     39             start = time.time()

File /srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/core.py:103, in OpenFile.__enter__(self)
    100 def __enter__(self):
    101     mode = self.mode.replace("t", "").replace("b", "") + "b"
--> 103     f = self.fs.open(self.path, mode=mode)
    105     self.fobjects = [f]
    107     if self.compression is not None:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py:1030, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs)
   1028 else:
   1029     ac = kwargs.pop("autocommit", not self._intrans)
-> 1030     f = self._open(
   1031         path,
   1032         mode=mode,
   1033         block_size=block_size,
   1034         autocommit=ac,
   1035         cache_options=cache_options,
   1036         **kwargs,
   1037     )
   1038     if compression is not None:
   1039         from fsspec.compression import compr

File /srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py:343, in HTTPFileSystem._open(self, path, mode, block_size, autocommit, cache_type, cache_options, size, **kwargs)
    341 kw["asynchronous"] = self.asynchronous
    342 kw.update(kwargs)
--> 343 size = size or self.info(path, **kwargs)["size"]
    344 session = sync(self.loop, self.set_session)
    345 if block_size and size:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py:91, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
     88 @functools.wraps(func)
     89 def wrapper(*args, **kwargs):
     90     self = obj or args[0]
---> 91     return sync(self.loop, func, *args, **kwargs)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py:71, in sync(loop, func, timeout, *args, **kwargs)
     69     raise FSTimeoutError from return_result
     70 elif isinstance(return_result, BaseException):
---> 71     raise return_result
     72 else:
     73     return return_result

File /srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py:25, in _runner(event, coro, result, timeout)
     23     coro = asyncio.wait_for(coro, timeout=timeout)
     24 try:
---> 25     result[0] = await coro
     26 except Exception as ex:
     27     result[0] = ex

File /srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py:396, in HTTPFileSystem._info(self, url, **kwargs)
    393     except Exception as exc:
    394         if policy == "get":
    395             # If get failed, then raise a FileNotFoundError
--> 396             raise FileNotFoundError(url) from exc
    397         logger.debug(str(exc))
    399 return {"name": url, "size": None, **info, "type": "file"}

FileNotFoundError: https://datahub.geos.tamu.edu:8880/thredds/fileServer/iHESPDataHUB/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.nday1.SSH.034001-034912.nc

Note that I can download the file manually using:

!wget --no-check-certificate https://datahub.geos.tamu.edu:8880/thredds/fileServer/iHESPDataHUB/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02/ocn/SSH/B.E.13.B1850C5.ne120_t12.sehires38.003.sunway_02.pop.h.nday1.SSH.034001-034912.nc

The key there is the --no-check-certificate flag. Is there a way to pass something similar to the recipe?
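
One direction I might try next (an untested sketch, going off the fact that the traceback above passes config.file_pattern.fsspec_open_kwargs through to the fsspec opener): give the FilePattern an fsspec_open_kwargs entry that tells aiohttp to skip certificate verification, assuming fsspec's HTTP filesystem forwards the kwarg to its requests.

# Untested: ssl=False is an aiohttp request option; whether fsspec's HTTP
# filesystem forwards it from these open kwargs is an assumption on my part.
pattern = FilePattern(
    format_function,
    concat_dim,
    fsspec_open_kwargs={"ssl": False},
)
recipe = XarrayZarrRecipe(pattern, target_chunks={"time": 6})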

Happy to keep tinkering further, but for today I'll clock out and watch some 🏀.