jupyterhub / binderhub

Run your code in the cloud, with technology so advanced, it feels like magic!
https://binderhub.readthedocs.io
BSD 3-Clause "New" or "Revised" License

ci: extend test timeout #1549

Closed by minrk 1 year ago

minrk commented 1 year ago

Tests are getting cancelled because the 10-minute limit is not enough.

The time limits were added in #1518.

betatim commented 1 year ago

Tests got cancelled after 15min :-/

From the logs of the helm tests:

============================= test session starts ==============================
platform linux -- Python 3.9.14, pytest-7.1.3, pluggy-1.0.0 -- /opt/hostedtoolcache/Python/3.9.14/x64/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/binderhub/binderhub
plugins: cov-4.0.0, asyncio-0.19.0
asyncio: mode=strict
collecting ... collected 129 items / 118 deselected / 11 selected

and then it got cancelled. This seems weird, no? I was expecting it to run some tests or do something, not spend 12 minutes finding tests to run.

consideRatio commented 1 year ago

No, they are stuck.

User pods are stuck pending because the user scheduler doesn't support k8s 1.25.

See #1544

betatim commented 1 year ago

Ah ok. I had expected to see some output about a particular test having started (and then getting stuck).

Looking at the PR you linked, I think we don't need to extend the timeout for the tests. Instead we should migrate BinderHub to be compatible with JupyterHub 2.

manics commented 1 year ago

> Ah ok. I had expected to see some output about a particular test having started (and then getting stuck).

Yes, that's why I originally added the timeout. I noticed that coding errors during development could make tests hang on an unexpected server response, which meant they stayed stuck for the default timeout of 6(?) hours.

Is there a way to set a default timeout for each pytest test (e.g. maybe 1 or 2 minutes?), since that would at least give us a bit more information on how extensive the failures are.

As a short term fix, will pinning the CI tests to K8s 1.24 solve the failures?

Edit: Also worth noting for anyone not aware, the Kubernetes namespace report section of the CI logs will show the state of the system, so you can check if BinderHub is running (or not). This report is included for failures: https://github.com/jupyterhub/binderhub/blob/969f2f63041b5175681064e938dd58c49f9fea63/.github/workflows/test.yml#L271-L275

minrk commented 1 year ago

> As a short term fix, will pinning the CI tests to K8s 1.24 solve the failures?

Yes, I think that's the right change. k8s version should be pinned and upgraded explicitly in CI anyway.

minrk commented 1 year ago

> Is there a way to set a default timeout for each pytest test (e.g. maybe 1 or 2 minutes?)

Yeah, pytest-timeout should work.
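
For readers unfamiliar with the idea: pytest-timeout applies a limit per test (via its `timeout` ini option, the `--timeout` command-line flag, or the `@pytest.mark.timeout(...)` marker), so one stuck test fails on its own instead of blocking the run until the CI job is cancelled. The sketch below is not pytest-timeout and is not taken from the BinderHub test suite; it only illustrates the same principle with the standard library's `asyncio.wait_for`. The names `hanging_request`, `quick_check`, `run_with_timeout` and `run_suite` are hypothetical, and the 0.1 second limit is chosen only to keep the demo fast.

```python
import asyncio


async def hanging_request():
    """Hypothetical stand-in for a test awaiting a server response that never arrives."""
    await asyncio.Event().wait()  # the event is never set, so this would block forever


async def quick_check():
    """Hypothetical stand-in for a test that finishes immediately."""
    await asyncio.sleep(0)


async def run_with_timeout(coro, seconds):
    """Run one coroutine and report a timeout instead of hanging."""
    try:
        await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return "timeout"
    return "passed"


def run_suite(timeout_seconds=0.1):
    """Run each illustrative 'test' under its own limit and collect the outcomes."""
    async def main():
        return {
            "quick_check": await run_with_timeout(quick_check(), timeout_seconds),
            "hanging_request": await run_with_timeout(hanging_request(), timeout_seconds),
        }

    return asyncio.run(main())


results = run_suite()
print(results)
```

With a per-test limit, the output names which test got stuck, which is the extra information asked for above.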

minrk commented 1 year ago

#1550 pins k3s to 1.24 to get tests working again (sorry @consideRatio for not seeing #1541 first). #1551 adds per-test timeouts, so we shouldn't hang the whole test suite anymore when similar bugs crop up.