revsys / django-health-check

a pluggable app that runs a full check on the deployment, using a number of plugins to check e.g. database, queue server, celery processes, etc.
https://readthedocs.org/projects/django-health-check/
MIT License
1.22k stars, 191 forks

Connection refused on Celery health checks #423

Open lvieirajr opened 6 months ago

lvieirajr commented 6 months ago

I'm getting a connection refused error when the health check tries to send either a ping or a task to Celery. A simple view that calls ping directly on my Celery app works fine.

Python 3.12.2 django==5.03 django-redis==5.4.0 celery==5.3.6 amqp==5.2.0 kombu==5.3.5 billiard==4.2.0 redis==5.0.3 hiredis==2.3.2

django | 2024-03-14T23:09:43.198232Z [info ] request_started [django_structlog.middlewares.request] ip=192.168.65.1 request=GET /test/ request_id=19693212-3ffe-49c7-a3ad-af8913676e64 user_agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 user_id=None
django | [{'celery@efa73fb42c24': {'ok': 'pong'}}]
django | 2024-03-14T23:09:44.255825Z [info ] request_finished [django_structlog.middlewares.request] code=200 ip=192.168.65.1 request=GET /test/ request_id=19693212-3ffe-49c7-a3ad-af8913676e64 user_id=None

unavailable: Unknown error
django | Traceback (most recent call last):
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 472, in _reraise_as_library_errors
django |     yield
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 459, in _ensure_connection
django |     return retry_over_time(
django |            ^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/utils/functional.py", line 318, in retry_over_time
django |     return fun(*args, **kwargs)
django |            ^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 934, in _connection_factory
django |     self._connection = self._establish_connection()
django |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 860, in _establish_connection
django |     conn = self.transport.establish_connection()
django |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/transport/pyamqp.py", line 203, in establish_connection
django |     conn.connect()
django |   File "/usr/local/lib/python3.12/site-packages/amqp/connection.py", line 324, in connect
django |     self.transport.connect()
django |   File "/usr/local/lib/python3.12/site-packages/amqp/transport.py", line 129, in connect
django |     self._connect(self.host, self.port, self.connect_timeout)
django |   File "/usr/local/lib/python3.12/site-packages/amqp/transport.py", line 184, in _connect
django |     self.sock.connect(sa)
django | ConnectionRefusedError: [Errno 111] Connection refused
django |
django | The above exception was the direct cause of the following exception:
django |
django | Traceback (most recent call last):
django |   File "/usr/local/lib/python3.12/site-packages/health_check/contrib/celery_ping/backends.py", line 15, in check_status
django |     ping_result = app.control.ping(timeout=timeout)
django |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/celery/app/control.py", line 563, in ping
django |     return self.broadcast(
django |            ^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/celery/app/control.py", line 776, in broadcast
django |     return self.mailbox(conn)._broadcast(
django |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/pidbox.py", line 330, in _broadcast
django |     chan = channel or self.connection.default_channel
django |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 953, in default_channel
django |     self._ensure_connection(**conn_opts)
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 458, in _ensure_connection
django |     with ctx():
django |   File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
django |     self.gen.throw(value)
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 476, in _reraise_as_library_errors
django |     raise ConnectionError(str(exc)) from exc
django | kombu.exceptions.OperationalError: [Errno 111] Connection refused
django | unavailable: Unknown error
django | Traceback (most recent call last):
django |   File "/usr/local/lib/python3.12/site-packages/kombu/utils/functional.py", line 32, in __call__
django |     return self.__value__
django |            ^^^^^^^^^^^^^^
django | AttributeError: 'ChannelPromise' object has no attribute '__value__'. Did you mean: '__call__'?
django |
django | During handling of the above exception, another exception occurred:
django |
django | Traceback (most recent call last):
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 472, in _reraise_as_library_errors
django |     yield
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 459, in _ensure_connection
django |     return retry_over_time(
django |            ^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/utils/functional.py", line 318, in retry_over_time
django |     return fun(*args, **kwargs)
django |            ^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 934, in _connection_factory
django |     self._connection = self._establish_connection()
django |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 860, in _establish_connection
django |     conn = self.transport.establish_connection()
django |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/transport/pyamqp.py", line 203, in establish_connection
django |     conn.connect()
django |   File "/usr/local/lib/python3.12/site-packages/amqp/connection.py", line 324, in connect
django |     self.transport.connect()
django |   File "/usr/local/lib/python3.12/site-packages/amqp/transport.py", line 129, in connect
django |     self._connect(self.host, self.port, self.connect_timeout)
django |   File "/usr/local/lib/python3.12/site-packages/amqp/transport.py", line 184, in _connect
django |     self.sock.connect(sa)
django | ConnectionRefusedError: [Errno 111] Connection refused
django |
django | The above exception was the direct cause of the following exception:
django |
django | Traceback (most recent call last):
django |   File "/usr/local/lib/python3.12/site-packages/health_check/contrib/celery/backends.py", line 17, in check_status
django |     result = add.apply_async(
django |              ^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/sentry_sdk/integrations/celery.py", line 228, in apply_async
django |     return f(*args, **kwargs)
django |            ^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/celery/app/task.py", line 594, in apply_async
django |     return app.send_task(
django |            ^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/celery/app/base.py", line 799, in send_task
django |     amqp.send_task_message(P, name, message, **options)
django |   File "/usr/local/lib/python3.12/site-packages/celery/app/amqp.py", line 518, in send_task_message
django |     ret = producer.publish(
django |           ^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/messaging.py", line 186, in publish
django |     return _publish(
django |            ^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 556, in _ensured
django |     return fun(*args, **kwargs)
django |            ^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/messaging.py", line 195, in _publish
django |     channel = self.channel
django |               ^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/messaging.py", line 218, in _get_channel
django |     channel = self._channel = channel()
django |                               ^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/utils/functional.py", line 34, in __call__
django |     value = self.__value__ = self.__contract__()
django |                              ^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/messaging.py", line 234, in <lambda>
django |     channel = ChannelPromise(lambda: connection.default_channel)
django |                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 953, in default_channel
django |     self._ensure_connection(**conn_opts)
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 458, in _ensure_connection
django |     with ctx():
django |   File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
django |     self.gen.throw(value)
django |   File "/usr/local/lib/python3.12/site-packages/kombu/connection.py", line 476, in _reraise_as_library_errors
django |     raise ConnectionError(str(exc)) from exc
django | kombu.exceptions.OperationalError: [Errno 111] Connection refused
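
One detail in the traceback may be worth checking: the failing frames pass through kombu/transport/pyamqp.py, which is kombu's AMQP transport, while the broker URL quoted later in this thread uses the redis:// scheme. That would be consistent with the health check talking to a Celery app that never picked up the project's settings and so fell back to Celery's default AMQP broker on localhost. This is an assumption, not something confirmed in the thread. The sketch below only shows how to spot such a mismatch in log text; the helper names and the scheme table are hypothetical and cover just the two transports discussed here.

```python
import re
from urllib.parse import urlsplit

# Illustrative subset: kombu transport module name -> URL schemes it serves.
TRANSPORT_SCHEMES = {
    "pyamqp": {"amqp", "amqps", "pyamqp"},
    "redis": {"redis", "rediss"},
}


def transport_in_traceback(text):
    """Return the first kombu transport module named in a traceback, or None."""
    match = re.search(r"kombu/transport/(\w+)\.py", text)
    return match.group(1) if match else None


def matches_broker(text, broker_url):
    """True if the transport seen in the traceback serves the broker URL's scheme."""
    transport = transport_in_traceback(text)
    scheme = urlsplit(broker_url).scheme
    return transport is not None and scheme in TRANSPORT_SCHEMES.get(transport, set())


# One frame copied from the traceback above.
frame = (
    'File "/usr/local/lib/python3.12/site-packages/kombu/transport/pyamqp.py", '
    "line 203, in establish_connection"
)
print(transport_in_traceback(frame))                   # pyamqp
print(matches_broker(frame, "redis://redis:6379/0"))   # False
```

If the mismatch is real, the usual thing to verify is that the project's Celery app is imported when Django starts, so that the process serving the health check uses the same configured app as the working test view.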

jacklinke commented 5 months ago

Does RabbitMQ or Redis use credentials in your installation? If so, did you apply the credentials via BROKER_URL or REDIS_URL?

The test view you created uses Django's settings, which already have the password applied (since your test works), and app.celery has access to it because it reads those same settings. django-health-check, however, does not use the cache setting by default. You need to add the password explicitly to the appropriate URL setting, as noted in the readme.

For Redis, that looks something like: "redis://:S3cr3tP4$$W0rd@100.123.123.100:6370/0"
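
As a minimal sketch of assembling such a URL, the snippet below percent-encodes the password before placing it in the URL, which matters when it contains characters such as @, : or /. The redis_url helper is hypothetical and not part of django-health-check; whether a given client decodes the percent-encoded form is worth verifying for your versions.

```python
from urllib.parse import quote, unquote, urlsplit


def redis_url(host, port, db=0, password=None):
    """Build a redis:// URL, percent-encoding the password if one is given."""
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"


url = redis_url("100.123.123.100", 6370, password="S3cr3tP4$$W0rd")
parts = urlsplit(url)
print(url)
# Round-trip check: the decoded password matches what was passed in.
print(unquote(parts.password), parts.hostname, parts.port)
```

Without a password the helper returns the plain form, for example redis://redis:6379/0.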

lvieirajr commented 5 months ago

No, I'm testing with Redis running locally in my Docker Compose environment, and Redis has no password.

REDIS_URL = env.str("REDIS_URL")
CELERY_BROKER_URL = REDIS_URL

REDIS_URL="redis://redis:6379/0"

jacklinke commented 5 months ago

Bummer. Still, you did set the REDIS_URL setting, correct?

(Looking to see if there's anything else I had to do in my installation of this app...)

lvieirajr commented 5 months ago

Yeah, the URL is set correctly. The broker URL just references the Redis URL, as you can see in my previous comment.
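
To separate "the setting is right" from "the broker is reachable from this process", a plain TCP probe can help. This is a minimal sketch using only the standard library; broker_reachable and the default-port table are hypothetical names for illustration, and the probe says nothing about authentication or about which URL Celery actually uses.

```python
import socket
from urllib.parse import urlsplit

# Conventional default ports, used only when the URL omits one.
DEFAULT_PORTS = {"redis": 6379, "rediss": 6379, "amqp": 5672, "amqps": 5671}


def broker_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    host = parts.hostname or "localhost"
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"No port given and no default known for {parts.scheme!r}")
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling broker_reachable("redis://redis:6379/0") from a shell inside the django container, and then the same call with an amqp:// URL pointing at localhost, would show which of the two endpoints refuses connections.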

jacklinke commented 5 months ago

I just looked through the commit where I added this to my system. I didn't have to do anything other than what's in the readme and what you already mentioned you have done.

Do you have the latest celery and kombu?


Not ideal, but a workaround might be to turn the ping test view you provided above into a custom check. Adding a custom check turns out to be very simple. I added the one below to check that at least one Celery task has completed in the past hour (it requires django-celery-results). Again, this is not as good as activating the package's built-in Celery check, but hopefully it shows that turning your test into an equivalent of the built-in check is almost trivial.


health_checks.py

from datetime import timedelta

from celery import states
from django.utils import timezone
from django.utils.translation import gettext as _
from django_celery_results.models import TaskResult
from health_check.backends import BaseHealthCheckBackend
from health_check.exceptions import HealthCheckException


class RecentCeleryStatusHealthCheckBackend(BaseHealthCheckBackend):
    # Treat a failure of this check as critical.
    critical_service = True

    def check_status(self):
        # Fail unless at least one TaskResult has status SUCCESS and a
        # date_done within the past hour.
        one_hour_ago = timezone.now() - timedelta(hours=1)
        if not TaskResult.objects.filter(
            status=states.SUCCESS, date_done__gte=one_hour_ago
        ).exists():
            raise HealthCheckException("No successful Celery tasks in the past hour.")

    def identifier(self):
        # Name shown for this check in the health check output.
        return _("Recent Celery Status")

apps.py

from django.apps import AppConfig
from health_check.plugins import plugin_dir

class CommonAppConfig(AppConfig):
    name = "apps.common"

    def ready(self):
        from apps.common.health_checks import RecentCeleryStatusHealthCheckBackend

        plugin_dir.register(RecentCeleryStatusHealthCheckBackend)
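
The same pattern could wrap the ping that already works in the test view. The sketch below keeps the ping logic in a plain function so it can be exercised without a broker: check_workers and WorkerPingError are hypothetical names, and the only Celery call used is app.control.ping(timeout=...), which appears in the traceback earlier in this thread. Inside a backend's check_status, one would pass the project's own configured Celery app and re-raise a failure as HealthCheckException.

```python
from types import SimpleNamespace


class WorkerPingError(Exception):
    """Raised when no worker replies, or a worker replies unexpectedly."""


def check_workers(app, timeout=1.0):
    """Ping workers through `app` and return the names that answered 'pong'."""
    replies = app.control.ping(timeout=timeout)
    if not replies:
        raise WorkerPingError("No Celery workers replied to ping.")
    names = []
    for reply in replies:
        # Each reply maps a worker name to its response body.
        for worker, body in reply.items():
            if body.get("ok") != "pong":
                raise WorkerPingError(f"Unexpected reply from {worker}: {body!r}")
            names.append(worker)
    return names


# Stand-in for a configured Celery app, returning the reply from the log above.
fake_app = SimpleNamespace(
    control=SimpleNamespace(
        ping=lambda timeout: [{"celery@efa73fb42c24": {"ok": "pong"}}]
    )
)
print(check_workers(fake_app))  # ['celery@efa73fb42c24']
```

Passing the app in explicitly is a deliberate choice here: it makes it obvious which Celery app is being pinged, which is the open question in this issue.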

jacklinke commented 5 months ago

Here are three issues on kombu that seem quite similar to yours. Maybe they will provide some insight.

Wish I had more to offer.