xcube-dev / xcube

xcube is a Python package for generating and exploiting data cubes powered by xarray, dask, and zarr.
https://xcube.readthedocs.io/
MIT License

Intermittent MaxRetryError (and others) in CI unit test runs #899

Open pont-us opened 1 year ago

pont-us commented 1 year ago

Describe the bug

Recently, CI unit test jobs have been producing increasingly frequent intermittent failures due to time-outs and exhausted retries. The problem occurs most often on GitHub, but occasionally also on AppVeyor. This issue was prompted by a GitHub Actions run which produced the following error:

FAILED test/webapi/ows/stac/test_routes.py::StacRoutesTest::test_fetch_catalog_collection_single_items - urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=34113): Max retries exceeded with url: /ogc/collections/demo/items (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8a94db71d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

We can use this issue to collect further instances of the problem.

So far our fix for such errors has been "re-run the job and hope it goes away", which it generally does, but this is turning into something of a time sink.

To Reproduce

Keep re-running the GitHub unittest job until an error occurs.

Expected behavior

All tests should pass reliably on every CI run.

Additional context

With luck, this might be fairly easily fixable by tweaking some back-off / time-out / retry parameters in ServerTestCase or similar.
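As a rough illustration of the kind of back-off / time-out tweak meant here: "Connection refused" typically means the client connected before the test server was listening, so one option is to poll the port with a capped exponential back-off before the first request is sent. The sketch below is not taken from xcube; `wait_for_port` and all of its parameters are hypothetical names, and whether ServerTestCase can be hooked this way is an assumption.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, initial_delay=0.05,
                  max_delay=1.0, connect_timeout=1.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Hypothetical helper, not part of xcube. Returns True as soon as the
    port accepts a connection, or False once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port),
                                          timeout=connect_timeout):
                return True
        except OSError:
            # Connection refused or timed out: the server is not ready yet.
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            time.sleep(min(delay, remaining))
            # Exponential back-off, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

A test base class could call something like `wait_for_port("localhost", port)` in its set-up, after starting the server thread and before issuing the first HTTP request, and fail with a clear message if it returns False.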

pont-us commented 1 year ago

Another one, from https://github.com/dcs4cop/xcube/actions/runs/6467629870/job/17559781773:

FAILED test/webapi/places/test_routes.py::PlacesRoutesTest::test_places - urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=33397): Max retries exceeded with url: /places (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6942170a10>: Failed to establish a new connection: [Errno 111] Connection refused'))

pont-us commented 1 year ago

And another from https://github.com/dcs4cop/xcube/actions/runs/6467629870/job/17562750121:

FAILED test/webapi/ows/stac/test_routes.py::StacRoutesTest::test_fetch_catalog_collections - urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=53383): Max retries exceeded with url: /ogc/collections (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f09cc4ae090>: Failed to establish a new connection: [Errno 111] Connection refused'))

pont-us commented 1 year ago

https://github.com/dcs4cop/xcube/actions/runs/6467629870/job/17571143347

FAILED test/webapi/datasets/test_routes.py::DatasetsRoutesTest::test_fetch_datasets - urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=39147): Max retries exceeded with url: /datasets (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb970811990>: Failed to establish a new connection: [Errno 111] Connection refused'))
FAILED test/webapi/s3/test_routes.py::S3RoutesNewTest::test_fetch_get_s3_object - urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=54325): Max retries exceeded with url: /s3/datasets/demo.zarr/.zattrs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb98c58f950>: Failed to establish a new connection: [Errno 111] Connection refused'))

pont-us commented 11 months ago

Not sure if this one is related, but a test run on AppVeyor produced:

ERROR test/core/zarrstore/test_generic.py::CommonS3ZarrStoreTest::test_it - TimeoutError: timed out

thomasstorm commented 10 months ago

Reopened as per @TonioF's comment. The PR addresses the current problems, but new related issues will probably appear, so the issue stays open.

https://github.com/dcs4cop/xcube/pull/918#pullrequestreview-1829460478