teamhephy / controller

Hephy Workflow Controller (API)
https://teamhephy.com
MIT License

Only able to create 200 releases? #17

Closed: Cryptophobia closed this issue 6 years ago

Cryptophobia commented 6 years ago

From @nathansamson on July 28, 2017 15:2

Some of our apps stopped deploying after a while (see controller logs).

In at least 2 of the 3 cases, the failures started exactly after release v200 (v200 worked, v201 didn't).

We are using Workflow 2.11 with Kubernetes 1.5.2.

ERROR [bplpp-stef]: (app::deploy): 'NoneType' object is not subscriptable
ERROR:root:(app::deploy): 'NoneType' object is not subscriptable
Traceback (most recent call last):
  File "/app/api/models/app.py", line 526, in deploy
    async_run(tasks)
  File "/app/api/utils.py", line 169, in async_run
    raise error
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/app/api/utils.py", line 181, in async_task
    await loop.run_in_executor(None, params)
  File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/scheduler/__init__.py", line 270, in deploy
    namespace, name, image, entrypoint, command, **kwargs
  File "/app/scheduler/resources/deployment.py", line 138, in update
    self.wait_until_ready(namespace, name, **kwargs)
  File "/app/scheduler/resources/deployment.py", line 336, in wait_until_ready
    self._check_for_failed_events(namespace, labels=labels)
  File "/app/scheduler/resources/deployment.py", line 373, in _check_for_failed_events
    'involvedObject.name': data['items'][0]['metadata']['name'],
TypeError: 'NoneType' object is not subscriptable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/build.py", line 65, in create
    self.app.deploy(new_release)
  File "/app/api/models/app.py", line 545, in deploy
    raise ServiceUnavailable(err) from e
api.exceptions.ServiceUnavailable: (app::deploy): 'NoneType' object is not subscriptable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 480, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 527, in create
    super(BuildHookViewSet, self).create(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 533, in post_save
    build.create(self.user)
  File "/app/api/models/build.py", line 79, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: (app::deploy): 'NoneType' object is not subscriptable
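
The innermost frame fails on `data['items'][0]['metadata']['name']`, so one of `data`, `data['items']`, the first item, or its `metadata` was `None`; the trace alone does not say which. Below is a minimal sketch of a defensive lookup, assuming `data` is the parsed JSON body of a list response. The helper name `first_item_name` is hypothetical and is not part of the controller code.

```python
from typing import Optional


def first_item_name(data: Optional[dict]) -> Optional[str]:
    """Return metadata.name of the first item, or None if any level is missing.

    Hypothetical helper for illustration; not taken from the controller.
    """
    if not isinstance(data, dict):
        return None
    # A null or missing 'items' key is treated the same as an empty list.
    items = data.get('items') or []
    if not items or not isinstance(items[0], dict):
        return None
    metadata = items[0].get('metadata') or {}
    return metadata.get('name')
```

A caller could skip the failed-events check when this returns `None` instead of raising, though whether that is the right behaviour for the controller is a design decision this sketch does not settle.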

Copied from original issue: deis/controller#1315

nathansamson commented 6 years ago

I am not sure (the cluster has been deleted by now), but I think later on we were able to have more than 200 releases per app.

Although the issues occurred over 2 or 3 days back then, it might have been a coincidence...

Cryptophobia commented 6 years ago

OK, it looks like this is not really an issue, because we have a cluster with applications that have over 200 releases.

I will go ahead and close this for now and reopen if it occurs again.