os-climate / hazard

Onboarding, creation and transformation of climate hazard models for OS-Climate
Apache License 2.0

UKCP18 download from CEDA - FTP timeout errors #114

Open j08lue opened 3 months ago

j08lue commented 3 months ago

When running the latest hazard workflow with UKCP18 data download from CEDA, I sometimes get timeout errors like the one below:

[2024-08-20 09:49:32,304] {days_tas_above.py:60} INFO - Starting calculation for year 2038 
[2024-08-20 09:49:32,331] {indicator_model.py:37} ERROR - Batch item failed
Traceback (most recent call last):
  File "/app/src/hazard/indicator_model.py", line 35, in run_all
    self.run_single(item, source, target, client)
  File "/app/src/hazard/models/multi_year_average.py", line 85, in run_single
    averaged_indicators = self._averaged_indicators(client, source, target, item)
  File "/app/src/hazard/models/multi_year_average.py", line 131, in _averaged_indicators
    client.gather(futures)
  File "/usr/local/lib/python3.10/site-packages/distributed/client.py", line 2361, in gather
    return self.sync(
  File "/usr/local/lib/python3.10/site-packages/distributed/utils.py", line 351, in sync
    return sync(
  File "/usr/local/lib/python3.10/site-packages/distributed/utils.py", line 418, in sync
    raise exc.with_traceback(tb)
  File "/usr/local/lib/python3.10/site-packages/distributed/utils.py", line 391, in f
    result = yield future
  File "/usr/local/lib/python3.10/site-packages/tornado/gen.py", line 766, in run
    value = future.result()
  File "/usr/local/lib/python3.10/site-packages/distributed/client.py", line 2224, in _gather
    raise exception.with_traceback(traceback) 
  File "/app/src/hazard/models/days_tas_above.py", line 62, in _calculate_single_year_indicators
    tas = stack.enter_context(
  File "/usr/local/lib/python3.10/contextlib.py", line 492, in enter_context
    result = _cm_type.__enter__(cm)
  File "/usr/local/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/app/src/hazard/sources/ukcp18.py", line 63, in open_dataset_year
    all_data_from_files = self._combine_all_files_data(files_available_for_quantity)
  File "/app/src/hazard/sources/ukcp18.py", line 76, in _combine_all_files_data
    with io.BytesIO(f.read()) as file_in_memory:
  File "/usr/local/lib/python3.10/site-packages/fsspec/spec.py", line 1846, in read
    out = self.cache._fetch(self.loc, self.loc + length)
  File "/usr/local/lib/python3.10/site-packages/fsspec/caching.py", line 189, in _fetch
    self.cache = self.fetcher(start, end)  # new block replaces old
  File "/usr/local/lib/python3.10/site-packages/fsspec/implementations/ftp.py", line 326, in _fetch_range
    self.fs.ftp.retrbinary(
  File "/usr/local/lib/python3.10/ftplib.py", line 438, in retrbinary
    data = conn.recv(blocksize)
TimeoutError: timed out

A little later, the download of all files completes fine again. šŸ¤·

I'm not sure whether CEDA applies rate limits or something similar. In principle it could also be an issue with my local connection, I guess.
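Since the timeouts are transient, one possible workaround (not part of the current codebase) is to retry the FTP read with backoff before giving up. The sketch below is hypothetical: `read_with_retries`, its parameters, and where it would be called from (`_combine_all_files_data`) are assumptions for illustration, not the repository's actual API.

```python
import io
import time


def read_with_retries(fs, path, max_attempts=3, base_delay=5.0):
    """Read a remote file fully into memory, retrying on TimeoutError.

    Hypothetical helper: could wrap the ``f.read()`` call in
    ``_combine_all_files_data`` so a transient FTP timeout does not
    fail the whole batch item. Uses exponential backoff between tries.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            with fs.open(path, "rb") as f:
                # Return the same in-memory buffer the current code builds.
                return io.BytesIO(f.read())
        except TimeoutError:
            if attempt == max_attempts:
                raise  # exhausted retries; surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Whether this is preferable to fixing the connection settings (e.g. a longer fsspec/ftplib timeout) depends on whether the timeouts come from server-side throttling or from the local network.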

ciaransweet commented 3 months ago

FWIW I think this is on me/DevSeed to address.

I've not experienced it yet, so we'll keep an eye on it.