dependency failed to start: container for service "web" is unhealthy

clone3448 commented 1 year ago

Good day, I tried to deploy the production docker compose image in Container Manager on my Synology DS923+ but got the error: dependency failed to start: container for service "web" is unhealthy. I have altered the compose file provided on GitHub a bit (mainly the volumes) to this:

services:
  web:
    image: wger/server:latest
    container_name: wger_server
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy
    env_file:
      - /volume1/docker/wger/config/prod.env
    volumes:
      - static:/home/wger/static
      - media:/home/wger/media
    expose:
      - 8000
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8000
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  nginx:
    image: nginx:stable
    container_name: wger_nginx
    depends_on:
      - web
    volumes:
      - /volume1/docker/wger/config/nginx.conf:/etc/nginx/conf.d/default.conf
      - static:/wger/static:ro
      - media:/wger/media:ro
    ports:
      - "8001:80"
    healthcheck:
      test: service nginx status
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  db:
    image: postgres:15-alpine
    container_name: wger_db
    environment:
      - POSTGRES_USER=wger
      - POSTGRES_PASSWORD=wger
      - POSTGRES_DB=wger
    volumes:
      - postgres-data:/var/lib/postgresql/data/
    expose:
      - 5432
    healthcheck:
      test: pg_isready -U wger
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  cache:
    image: redis
    container_name: wger_cache
    expose:
      - 6379
    volumes:
      - redis-data:/data
    healthcheck:
      test: redis-cli ping
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  celery_worker:
    image: wger/server:latest
    container_name: wger_celery_worker
    command: /start-worker
    env_file:
      - /volume1/docker/wger/config/prod.env
    volumes:
      - media:/home/wger/media
    depends_on:
      web:
        condition: service_healthy
    healthcheck:
      test: celery -A wger inspect ping
      interval: 10s
      timeout: 5s
      retries: 5

  celery_beat:
    image: wger/server:latest
    container_name: wger_celery_beat
    command: /start-beat
    volumes:
      - celery-beat:/home/wger/beat/
    env_file:
      - /volume1/docker/wger/config/prod.env
    depends_on:
      celery_worker:
        condition: service_healthy

volumes:
  postgres-data:
  celery-beat:
  static:
  media:
  redis-data:

networks:
  default:
    name: wger_network

Furthermore, the nginx.conf is not altered, and the prod.env is only altered with the SECRET_KEY and SIGNING_KEY.

I can access the website, which looks like this:

The wger_server docker container seems to be not working correctly, looking in the log I see the following: After thousands of items being deleted, I get this:

The other docker containers seem to not have a lot of issues in the log, except wger_celery_worker

The console terminal of the entire stack looks like this:

What is going wrong in my configurations and how can I deal with it? First time I am using databases in a docker compose file.

bbkz commented 1 year ago

I don't know the docker-compose setup, but starting up the wger container takes a long time, especially on lower-end hardware. As I'm running it on Raspberry Pis and similar devices, I had to do some tweaks.

For gunicorn not to run into a timeout, you may need to add the following environment variable:

GUNICORN_CMD_ARGS="--timeout 240 --workers=2"
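If you would rather set this in the compose file than in prod.env, a minimal sketch could look like the one below. It assumes that the web service is the one running gunicorn and that the image passes GUNICORN_CMD_ARGS through untouched; the values are simply the ones suggested above, not tuned recommendations.

```yaml
services:
  web:
    environment:
      # Entries under "environment" take precedence over
      # same-named entries loaded through env_file.
      GUNICORN_CMD_ARGS: "--timeout 240 --workers=2"
```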

Another idea would be to also disable the healthchecks. I don't know how docker compose behaves, but Kubernetes will otherwise kill the container and start it again (in a loop).
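For reference, Docker Compose lets you switch a healthcheck off with disable: true. The fragment below is only a sketch that assumes the service names from the upstream compose file (web, celery_worker). As far as I know, a service that depends on web with condition: service_healthy can no longer start once the check is disabled, so that condition would have to be relaxed as well.

```yaml
services:
  web:
    healthcheck:
      # Turns off any healthcheck, including one defined in the image.
      disable: true

  celery_worker:
    depends_on:
      web:
        # service_healthy needs a healthcheck to wait on; with the
        # check disabled, only wait for the container to be started.
        condition: service_started
```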

rolandgeider commented 1 year ago

Hi! Do you get any errors in the logs (of the web service) when opening the application? I just started a new instance with the default compose and conf file and everything booted up nicely:

NAME                 IMAGE                COMMAND                  SERVICE         CREATED              STATUS                        PORTS
wger_cache           redis                "docker-entrypoint.s…"   cache           About a minute ago   Up About a minute (healthy)   0.0.0.0:6379->6379/tcp
wger_celery_beat     wger/server:latest   "/start-beat"            celery_beat     About a minute ago   Up About a minute             8000/tcp
wger_celery_flower   wger/server:latest   "/start-flower"          celery_flower   About a minute ago   Up About a minute (healthy)   0.0.0.0:5555->5555/tcp, 8000/tcp
wger_celery_worker   wger/server:latest   "/start-worker"          celery_worker   About a minute ago   Up About a minute (healthy)   8000/tcp
wger_db              postgres:15-alpine   "docker-entrypoint.s…"   db              About a minute ago   Up About a minute (healthy)   0.0.0.0:5432->5432/tcp
wger_nginx           nginx:stable         "/docker-entrypoint.…"   nginx           About a minute ago   Up About a minute (healthy)   0.0.0.0:80->80/tcp, 0.0.0.0:8080->80/tcp
wger_server          wger/server:latest   "/home/wger/entrypoi…"   web             About a minute ago   Up About a minute (healthy)   8000/tcp

Somebody else had the problem that the application tried to set up the database before it was ready, so some things were missing. What helped them was to drop the db volume, start the db service manually first and then all the rest (this is only needed for the first initial run, later it's not important).

clone3448 commented 1 year ago

First of all, thank you for responding. @rolandgeider When I open the application, no new log entries appear after the following ones, which were produced when I rebuilt the stack (no change):

When I deleted the volume entry of the db service in the compose file and started the db service manually, I had the same issue. Do you think I should disable the healthchecks under wger_service as proposed by bbkz? When I did, I still had the same issue. However, I then noticed that celery_worker and celery_beat do not start because of this healthcheck dependency.

@bbkz I don't think the problem is caused by lower-end hardware. Still, I tried adding the env entry GUNICORN_CMD_ARGS="--timeout 240 --workers=2" to the prod.env file, but it made no difference to the result.

clone3448 commented 1 year ago

When I removed the healthcheck dependency for the celery_worker, I produced a log for that container, maybe this might help troubleshooting:

But okay, I restored the first compose file and enabled debug mode in the prod.env with DJANGO_DEBUG=True. The webpage now shows the following:

Environment:

Request Method: GET
Request URL: http://workout.XXXXXXX.com/en/software/terms-of-service

Django Version: 4.1.9
Python Version: 3.10.6
Installed Applications:
('django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.messages', 'django.contrib.sessions', 'django.contrib.sites', 'django.contrib.staticfiles', 'django_extensions', 'storages', 'wger.config', 'wger.core', 'wger.mailer', 'wger.exercises', 'wger.gym', 'wger.manager', 'wger.nutrition', 'wger.software', 'wger.utils', 'wger.weight', 'wger.gallery', 'wger.measurements', 'captcha', 'django.contrib.sitemaps', 'easy_thumbnails', 'compressor', 'crispy_forms', 'crispy_bootstrap5', 'rest_framework', 'rest_framework.authtoken', 'django_filters', 'rest_framework_simplejwt', 'drf_spectacular', 'drf_spectacular_sidecar', 'django_bootstrap_breadcrumbs', 'corsheaders', 'axes', 'simple_history', 'django_email_verification', 'actstream', 'fontawesomefree')
Installed Middleware:
('corsheaders.middleware.CorsMiddleware', 'django.middleware.common.CommonMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'wger.utils.middleware.JavascriptAJAXRedirectionMiddleware', 'wger.utils.middleware.WgerAuthenticationMiddleware', 'wger.utils.middleware.RobotsExclusionMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', 'django.middleware.clickjacking.XFrameOptionsMiddleware', 'django.middleware.locale.LocaleMiddleware', 'simple_history.middleware.HistoryRequestMiddleware', 'axes.middleware.AxesMiddleware')

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/django/core/handlers/exception.py", line 56, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.10/dist-packages/django/core/handlers/base.py", line 220, in _get_response
    response = response.render()
  File "/usr/local/lib/python3.10/dist-packages/django/template/response.py", line 114, in render
    self.content = self.rendered_content
  File "/usr/local/lib/python3.10/dist-packages/django/template/response.py", line 92, in rendered_content
    return template.render(context, self._request)
  File "/usr/local/lib/python3.10/dist-packages/django/template/backends/django.py", line 61, in render
    return self.template.render(context)
  File "/usr/local/lib/python3.10/dist-packages/django/template/base.py", line 173, in render
    with context.bind_template(self):
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/django/template/context.py", line 254, in bind_template
    updates.update(processor(self.request))
  File "/home/wger/src/wger/utils/context_processor.py", line 85, in processor
    get_custom_header(request),
  File "/home/wger/src/wger/utils/context_processor.py", line 126, in get_custom_header
    global_gymconfig = GymConfig.objects.get(pk=1)
  File "/usr/local/lib/python3.10/dist-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/django/db/models/query.py", line 650, in get
    raise self.model.DoesNotExist(

Exception Type: DoesNotExist at /en/software/terms-of-service
Exception Value: GymConfig matching query does not exist.

The logs of the wger_server container talk about an internal server error:

rolandgeider commented 1 year ago

yes the gymconfig stuff, that definitely means that the database wasn't initialised properly.

sorry, I didn't mean that you should remove the volume from the compose file, just that you delete the volume itself and start the db service manually, like this:

docker compose down
docker volume rm docker_postgres-data
docker compose up db -d # wait some seconds
docker compose up

(also you should get a medal for all the logs you provide!)

clone3448 commented 1 year ago

Thank you! I expected that providing as much as possible would be best for troubleshooting :) When you talked about not deleting the volume from the compose file and just deleting the volume itself, I went looking for where the files were actually stored; they were not stored anywhere. So I changed the volume paths again to the correct folders, which I have created now, because the folders did not exist at first. I changed the compose file to the following:

services:
  web:
    image: wger/server:latest
    container_name: wger_server
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy
    env_file:
      - /volume1/docker/wger/config/prod.env
    volumes:
      - static:/home/wger/static
      - media:/home/wger/media
    expose:
      - 8000
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8000
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  nginx:
    image: nginx:stable
    container_name: wger_nginx
    depends_on:
      - web
    volumes:
      - /volume1/docker/wger/config/nginx.conf:/etc/nginx/conf.d/default.conf
      - static:/wger/static:ro
      - media:/wger/media:ro
    ports:
      - "8001:80"
    healthcheck:
      test: service nginx status
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  db:
    image: postgres:15-alpine
    container_name: wger_db
    environment:
      - POSTGRES_USER=wger
      - POSTGRES_PASSWORD=wger
      - POSTGRES_DB=wger
    volumes:
      - /volume1/docker/wger/postgres-data:/var/lib/postgresql/data/
    expose:
      - 5432
    healthcheck:
      test: pg_isready -U wger
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  cache:
    image: redis
    container_name: wger_cache
    expose:
      - 6379
    volumes:
      - redis-data:/data
    healthcheck:
      test: redis-cli ping
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  celery_worker:
    image: wger/server:latest
    container_name: wger_celery_worker
    command: /start-worker
    env_file:
      - /volume1/docker/wger/config/prod.env
    volumes:
      - media:/home/wger/media
    depends_on:
      web:
        condition: service_healthy
    healthcheck:
      test: celery -A wger inspect ping
      interval: 10s
      timeout: 5s
      retries: 5

  celery_beat:
    image: wger/server:latest
    container_name: wger_celery_beat
    command: /start-beat
    volumes:
      - celery-beat:/home/wger/beat/
    env_file:
      - /volume1/docker/wger/config/prod.env
    depends_on:
      celery_worker:
        condition: service_healthy

volumes:
  postgres-data:
  celery-beat:
  static:
  media:
  redis-data:

networks:
  default:
    name: wger_network

Now I can find the db volume, and it holds files and folders! So that is some progress. Now the website looks like this, and I think this is more familiar to you:

I will check whether all features work another time, maybe tonight and update you. However, where should I look to really know it all works according to plan?

rolandgeider commented 1 year ago

The volumes are handled by docker and are stored... somewhere, but they solve a lot of problems with things like permissions etc. You can inspect a volume with docker volume inspect <name> if you want to know where the actual files are stored. But mapping folders manually should work as well.

You can download the exercise images with docker compose exec web python3 manage.py download-exercise-images and see if they appear (I'm not sure if we fixed the issue with the cache, so they might need some time to show up). If you can see those and the rest seems to work, you should be good to go.

goodnewz commented 7 months ago

@clone3448 I had a similar problem. It turns out that the first time wger starts, it does some extra setup that requires a bit more time. If it does not finish within the healthcheck window of 5*10s (5 retries at a 10s interval), it fails the healthcheck and the container is marked unhealthy. Docker provides an option for this situation called start_period. All you do is add start_period: 300s to the healthcheck: section of the web container, and Bob is your uncle.
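Applied to the web service from the compose file posted earlier in this thread, that would look roughly like the fragment below. The 300s value is just the one suggested above and may be more (or less) than your hardware needs.

```yaml
services:
  web:
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8000
      interval: 10s
      timeout: 5s
      retries: 5
      # Failed checks during this window do not count towards "retries".
      # A check that succeeds within the window marks the container as
      # started right away, so a fast boot is not slowed down.
      start_period: 300s
```

Services that wait on web with condition: service_healthy (celery_worker in this setup) then only fail to start if web is still failing its checks after the start period has passed.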

rolandgeider commented 7 months ago

FYI the PR with the start period is merged, hopefully this fixes it

greenbagels commented 6 months ago

> @clone3448 I had a similar problem. It turns out the first time wger starts, it does some extra setup things that require a bit more time. If it does not finish within the healthcheck interval of 5*10s, it fails the healthcheck with state unhealthy. Docker provides an option for such a situation called start_period. All you do is add start_period: 300s to the healthcheck: section of the web container, and Bob is your uncle.

Hi, just curious: you mentioned you need to add that start_period option to the web container, but your PR doesn't (it adds it to the nginx container). Is this intentional?

goodnewz commented 5 months ago

Thanks @greenbagels. You are correct. We should also add a start_period timer for the web container. In my config, I didn't have the NGINX block so I copied the part into the wrong section. I will open a new PR to address this. Great catch.

rolandgeider commented 4 months ago

the second PR is also merged now, closing here

wger-project / docker

dependency failed to start: container for service "web" is unhealthy #67