whole-tale / terraform_deployment

Terraform deployment setup for WT prod
BSD 3-Clause "New" or "Revised" License

Problems identified testing v0.3 #25

Open craig-willis opened 6 years ago

craig-willis commented 6 years ago

The timeout error was caused by specifying the wrong queue when I manually restarted celery_worker: I used worker, but it needed to be celery. Also, the actual error message was in the internal log file in the girder container (/root/.girder/logs).

Traceback (most recent call last):
  File "/girder/girder/api/rest.py", line 620, in endpointDecorator
    val = fun(self, args, kwargs)
  File "/girder/girder/api/rest.py", line 1204, in POST
    return self.handleRoute(method, path, params)
  File "/girder/girder/api/rest.py", line 947, in handleRoute
    val = handler(**kwargs)
  File "/girder/girder/api/access.py", line 63, in wrapped
    return fun(*args, **kwargs)
  File "/girder/girder/api/describe.py", line 702, in wrapped
    return fun(*args, **kwargs)
  File "/girder/plugins/wholetale/server/rest/instance.py", line 166, in createInstance
    save=True)
  File "/girder/plugins/wholetale/server/models/instance.py", line 147, in createInstance
    volume = volumeTask.get(timeout=TASK_TIMEOUT)
  File "/usr/local/lib/python3.5/dist-packages/celery/result.py", line 191, in get
    on_message=on_message,
  File "/usr/local/lib/python3.5/dist-packages/celery/backends/async.py", line 188, in wait_for_pending
    for _ in self._wait_for_pending(result, **kwargs):
  File "/usr/local/lib/python3.5/dist-packages/celery/backends/async.py", line 259, in _wait_for_pending
    raise TimeoutError('The operation timed out.')
celery.exceptions.TimeoutError: The operation timed out.

After restarting the workers with the correct queue, things are working.
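
For anyone hitting the same thing: a wrong queue name shows up as a timeout rather than an explicit error because the task is published fine but no worker ever consumes it. Below is a minimal stdlib-only sketch of that behaviour. It does not use the Celery API; the broker dict, start_worker and submit_and_wait are hypothetical names made up for illustration, and only the queue names celery and worker come from the comment above.

```python
import queue
import threading

# Named queues standing in for the message broker (illustrative only).
# Tasks are always published to "celery", as in the situation described above.
broker = {"celery": queue.Queue(), "worker": queue.Queue()}


def start_worker(queue_name):
    """Consume (func, reply) pairs from one named queue on a daemon thread."""
    def loop():
        while True:
            func, reply = broker[queue_name].get()
            reply.put(func())

    threading.Thread(target=loop, daemon=True).start()


def submit_and_wait(func, timeout):
    """Publish a task to the "celery" queue and block until its result arrives.

    Loosely mirrors waiting on a task result with a timeout: if no worker is
    listening on the queue, nothing fails until the timeout expires.
    """
    reply = queue.Queue()
    broker["celery"].put((func, reply))
    try:
        return reply.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError("The operation timed out.")
```

With only start_worker("worker") running, submit_and_wait raises TimeoutError because the task sits unread on the celery queue. Once start_worker("celery") is running, the same call returns the task's result.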

craig-willis commented 6 years ago

Separate issue now after launching the tale:

 docker service ps tmp-k8evq397tylo --no-trunc
ID                          NAME                     IMAGE                                                   NODE                DESIRED STATE       CURRENT STATE                 ERROR                                                                           PORTS
gx9lt9w7jvch838qiw7hsedbv   tmp-k8evq397tylo.1       registry.stage.wholetale.org/5964d96e1801c10001061e49   wt-stage-01         Ready               Rejected 2 seconds ago        "No such image: registry.stage.wholetale.org/5964d96e1801c10001061e49:latest"

Do we need to migrate the registry?

Xarthisius commented 6 years ago

Migrate, or trigger a build for all images.

Xarthisius commented 6 years ago

...or make the plugin do it if the image is not there, although that will significantly increase the time of each deployment to staging
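
A rough sketch of that last option, assuming the plugin can check the registry and trigger a build. The helpers image_exists and build_image are hypothetical callables passed in by the caller, not existing functions of the wholetale plugin or of Docker:

```python
def ensure_image(image, image_exists, build_image):
    """Build `image` only when the registry does not already have it.

    `image_exists` and `build_image` are hypothetical callables supplied by
    the caller (assumptions for this sketch, not real plugin functions).
    Returns True if a build was triggered, False if the image was present.
    """
    if image_exists(image):
        return False
    build_image(image)
    return True
```

The build cost is only paid for images that are missing, but on a fresh registry every image is missing, which is where the extra deployment time would come from.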