xcube-dev / xcube

xcube is a Python package for generating and exploiting data cubes powered by xarray, dask, and zarr.
https://xcube.readthedocs.io/
MIT License

Failing data access via xcube serve with Python API #574

Status: Open. AliceBalfanz opened this issue 2 years ago.

AliceBalfanz commented 2 years ago

Describe the bug

It is no longer possible to load data published via xcube serve using the Python API:

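from xcube.core.dsio import open_cube

SERVER_URL = 'http://localhost:8080'  # assumed address of the local xcube test server
ds = open_cube('s3bucket/local',
               format_name='zarr',
               s3_client_kwargs=dict(endpoint_url=SERVER_URL))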

The unit tests are also failing.

To Reproduce

Steps to reproduce the behavior:

  1. Update xcube.
  2. Start the local xcube server used by the xcube unit tests.
  3. Run the unit tests in test_s3buckethandlers.py.
  4. See the error.

Expected behavior

Access should work as it did with prior versions.

Additional context

When testing against the dcs4cop demo server, a KeyError is raised:

from xcube.core.dsio import open_cube

ds = open_cube(
    's3bucket/bc_olci_ns_tirr_v1',
    format_name='zarr',
    s3_kwargs={
        'anon': True
    },
    s3_client_kwargs={
        'endpoint_url': 'http://service.demo.dcs4cop.eu/xcube/api/latest'
    }
)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_3788/1790874407.py in <module>
----> 1 ds_t = open_cube(
      2     's3bucket/bc_olci_ns_tirr_v1',
      3     format_name='zarr',
      4     s3_kwargs={
      5         'anon': True

~/projects/xcube/xcube/core/dsio.py in open_cube(input_path, format_name, **kwargs)
     51     :return: xcube dataset
     52     """
---> 53     return open_dataset(input_path, format_name=format_name, is_cube=True, **kwargs)
     54 
     55 

~/projects/xcube/xcube/core/dsio.py in open_dataset(input_path, format_name, is_cube, **kwargs)
     93     if dataset_io is None:
     94         raise ValueError(f"Unknown input format {format_name!r} for {input_path}")
---> 95     dataset = dataset_io.read(input_path, **kwargs)
     96     if is_cube:
     97         assert_cube(dataset)

~/projects/xcube/xcube/core/dsio.py in read(self, path, s3_kwargs, s3_client_kwargs, max_cache_size, **kwargs)
    427         consolidated = False
    428         if isinstance(path, str):
--> 429             path_or_store, consolidated = get_path_or_s3_store(path_or_store,
    430                                                                s3_kwargs=s3_kwargs,
    431                                                                s3_client_kwargs=s3_client_kwargs,

~/projects/xcube/xcube/core/dsio.py in get_path_or_s3_store(path_or_url, s3_kwargs, s3_client_kwargs, mode)
    573             or s3_kwargs is not None \
    574             or s3_client_kwargs is not None:
--> 575         s3, root = parse_s3_fs_and_root(path_or_url,
    576                                         s3_kwargs=s3_kwargs,
    577                                         s3_client_kwargs=s3_client_kwargs,

~/projects/xcube/xcube/core/dsio.py in parse_s3_fs_and_root(s3_url, s3_kwargs, s3_client_kwargs, mode)
    611         s3_client_kwargs=s3_client_kwargs
    612     )
--> 613     s3 = new_s3_file_system(s3_kwargs=s3_kwargs,
    614                             s3_client_kwargs=s3_client_kwargs,
    615                             check_path=root if mode == 'r' else None)

~/projects/xcube/xcube/core/dsio.py in new_s3_file_system(s3_kwargs, s3_client_kwargs, s3_config_param_name, check_path)
    647         if check_path is not None:
    648             # Force potential NoCredentialsError
--> 649             s3.exists(check_path)
    650         return s3
    651     except botocore.exceptions.NoCredentialsError:

~/miniconda3/envs/xcube/lib/python3.9/site-packages/fsspec/asyn.py in wrapper(*args, **kwargs)
     89     def wrapper(*args, **kwargs):
     90         self = obj or args[0]
---> 91         return sync(self.loop, func, *args, **kwargs)
     92 
     93     return wrapper

~/miniconda3/envs/xcube/lib/python3.9/site-packages/fsspec/asyn.py in sync(loop, func, timeout, *args, **kwargs)
     69         raise FSTimeoutError from return_result
     70     elif isinstance(return_result, BaseException):
---> 71         raise return_result
     72     else:
     73         return return_result

~/miniconda3/envs/xcube/lib/python3.9/site-packages/fsspec/asyn.py in _runner(event, coro, result, timeout)
     23         coro = asyncio.wait_for(coro, timeout=timeout)
     24     try:
---> 25         result[0] = await coro
     26     except Exception as ex:
     27         result[0] = ex

~/miniconda3/envs/xcube/lib/python3.9/site-packages/s3fs/core.py in _exists(self, path)
    820                 return False
    821             try:
--> 822                 await self._info(path, bucket, key, version_id=version_id)
    823                 return True
    824             except FileNotFoundError:

~/miniconda3/envs/xcube/lib/python3.9/site-packages/s3fs/core.py in _info(self, path, bucket, key, refresh, version_id)
   1026                     "Key": "/".join([bucket, key]),
   1027                     "LastModified": out["LastModified"],
-> 1028                     "Size": out["ContentLength"],
   1029                     "size": out["ContentLength"],
   1030                     "name": "/".join([bucket, key]),

KeyError: 'ContentLength'
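The KeyError is raised inside s3fs, which builds its file info from the botocore head_object response and reads out["ContentLength"] directly (line 1028 of s3fs/core.py in the traceback above). botocore fills ContentLength from the HTTP Content-Length header, so the server's HEAD response probably lacks that header; LastModified, read one line earlier, evidently was present. A minimal way to check this independently of xcube and s3fs is to send the HEAD request yourself. The helper names below are hypothetical, and the example URL is an assumption pieced together from this issue (localhost:8080 from the tornado log, dataset id local from the first snippet).

```python
import urllib.request
from typing import List, Mapping

# Headers whose values s3fs reads (via botocore's head_object result) in _info();
# if either is absent, s3fs raises KeyError as in the traceback above.
REQUIRED_HEAD_HEADERS = ("Content-Length", "Last-Modified")


def missing_head_headers(headers: Mapping[str, str]) -> List[str]:
    """Return the required headers missing from a HEAD response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEAD_HEADERS if h.lower() not in present]


def head_headers(url: str, timeout: float = 10.0) -> dict:
    """Send a plain HEAD request and return the response headers.

    urllib raises HTTPError for 4xx/5xx responses, e.g. the 500 caused by an
    uncaught exception in the server's HEAD handler.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


# Example (requires a running xcube server; URL is an assumption):
# print(missing_head_headers(head_headers("http://localhost:8080/s3bucket/local")))
```

An empty list means both headers are sent; ["Content-Length"] would support the explanation above.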
AliceBalfanz commented 2 years ago

FYI: The xcube server must run the same xcube version as the client that accesses it via the Python API. Therefore, the following example using the dcs4cop demo server won't work until that server is updated:

from xcube.core.dsio import open_cube

ds = open_cube(
    's3bucket/bc_olci_ns_tirr_v1',
    format_name='zarr',
    s3_kwargs={
        'anon': True
    },
    s3_client_kwargs={
        'endpoint_url': 'http://service.demo.dcs4cop.eu/xcube/api/latest'
    }
)
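To tell whether a given server is affected without going through open_cube(), one can make the same call that xcube's new_s3_file_system() makes in the traceback above: s3.exists(check_path) on an s3fs filesystem pointed at the server. The sketch below assumes s3fs is installed; make_s3_fs and check_exists are hypothetical helpers, and only s3fs.S3FileSystem(anon=..., client_kwargs=...) and its exists() method are real s3fs API.

```python
from typing import Any


def make_s3_fs(endpoint_url: str, anon: bool = True) -> Any:
    """Create an s3fs filesystem that talks to an xcube server's S3-compatible API."""
    import s3fs  # imported lazily so check_exists() works without s3fs installed

    return s3fs.S3FileSystem(anon=anon, client_kwargs={"endpoint_url": endpoint_url})


def check_exists(fs: Any, path: str) -> str:
    """Run fs.exists(path) and report the outcome as a string instead of raising."""
    try:
        return "exists" if fs.exists(path) else "missing"
    except KeyError as err:
        # Most likely s3fs did not find a field such as 'ContentLength'
        # in the server's HEAD response.
        return f"incomplete HEAD response: missing {err.args[0]!r}"


# Example (requires network access and s3fs):
# fs = make_s3_fs("http://service.demo.dcs4cop.eu/xcube/api/latest")
# print(check_exists(fs, "s3bucket/bc_olci_ns_tirr_v1"))
```

Separating filesystem creation from the check makes it easy to exercise check_exists() with a stub object instead of a live server.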

PR https://github.com/dcs4cop/xcube/pull/592 fixes the existing tests, but when serving data configured with the DataStore schema, the following error appears:

tornado.application - ERROR - Uncaught exception HEAD /s3bucket/bigfe~HH_CityCube_RGB.zarr (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:8080', method='HEAD', uri='/s3bucket/bigfe~HH_CityCube_RGB.zarr', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/home/alicja/miniconda3/envs/xcube-dev/lib/python3.9/site-packages/tornado/web.py", line 1704, in _execute
    result = await result
  File "/home/alicja/Desktop/projects/xcube/xcube/webapi/handlers.py", line 223, in head
    key, local_path = self._get_key_and_local_path(ds_id, path)
  File "/home/alicja/Desktop/projects/xcube/xcube/webapi/handlers.py", line 285, in _get_key_and_local_path
    if os.path.isabs(local_path):
  File "/home/alicja/miniconda3/envs/xcube-dev/lib/python3.9/posixpath.py", line 62, in isabs
    s = os.fspath(s)
TypeError: expected str, bytes or os.PathLike object, not NoneType
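The TypeError comes from os.path.isabs() receiving None: a dataset served through the DataStore schema apparently has no local path, yet _get_key_and_local_path() treats the value as a string. The sketch below shows one defensive pattern; the function resolve_local_path and its base_dir parameter are hypothetical and do not reproduce xcube's actual handler code. How the server should then serve a store-backed dataset (for example from its opened store rather than from disk) is left open.

```python
import os
from typing import Optional


def resolve_local_path(local_path: Optional[str], base_dir: str) -> Optional[str]:
    """Hypothetical guard: return an absolute local path, or None when the
    dataset has no local path (e.g. it is provided by a data store)."""
    if local_path is None:
        # Passing None to os.path.isabs() raises the TypeError shown above.
        return None
    if os.path.isabs(local_path):
        return local_path
    # Relative paths are resolved against the configured base directory.
    return os.path.normpath(os.path.join(base_dir, local_path))
```

Returning None instead of raising lets the caller branch explicitly on "no local file" rather than failing with an uncaught exception in the HEAD handler.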