overhangio / tutor

The Docker-based Open edX distribution designed for peace of mind
https://docs.tutor.overhang.io/
GNU Affero General Public License v3.0
914 stars 434 forks source link

Add support for running multiple Celery queues #1130

Open Ian2012 opened 1 day ago

Ian2012 commented 1 day ago

Is your feature request related to a problem? Please describe.

Open edX already has implemented solutions for running multiple celery queues such as the old known high and high_mem default queues however, tutor operates by using a single worker deployment with a default pool (prefork) and this is not always as performant as desired as that pool is designed for CPU intensive tasks such as the ones related to grading, while I/O bound tasks such as the ones used in Aspects would benefit from having a gevent pool which used green threads aka virtual threads to accomplish high levels of concurrency. This has already been tested and implemented in tutor-contrib-celery and the benefits have been notorious as the resources are better handled and I/O bound tasks are resolved pretty fast.

Describe the solution you'd like

Allow tutor users to configure multiple Celery deployments with specialized settings for each queue's tasks. With defaults matching what's currently expected to run on each queue.

Describe alternatives you've considered

Developing a separate plugin for Celery: tutor-contrib-celery however we think the community would benefit from having a standard way to set Celery and also don't need to build a custom openedx images with those requirements: https://github.com/openedx/edx-platform/pull/35591

Additional context

https://celery.school/celery-worker-pools

regisb commented 13 hours ago

Hey @Ian2012! I'm glad you are opening this issue today as there are multiple conversations going on right now about Celery:

https://github.com/overhangio/tutor/issues/1126 https://github.com/overhangio/tutor/pull/1010 https://github.com/overhangio/tutor/pull/1128

As usual, I'd like to resolve these issues by:

  1. Improving the default settings for end users.
  2. Making it possible to easily customize these default settings.

Let's address these two items in reverse order:

Customization

Let's assume that we provide a mechanism in Tutor core that makes it possible to customize the celery command exactly as you need. You are then also able to run any celery worker by adding containers to the docker-compose/k8s override files. I don't have a perfect solution to this problem just yet, but I'm working on it. Assuming this solution exists, it should enable you to customize the celery workers as you see fit, right?

There would be one remaining problem, which is the unavailability of gevent in the openedx Docker image. You propose to address this issue in your edx-platform PR. An alternative would be to add gevent to the additional packages installed by Tutor. Which solution do you like best?

Good defaults

My second concern is having good defaults for users who don't know how to tune celery workers. In your opinion, what changes should we make to the default celery command or settings to improve those defaults? For instance, in your tutor-contrib-celery plugin, I see that you set CELERY_ACKS_LATE = True (here). Do we want to add this change to tutor core? Are there other changes that we should be adding?

Ian2012 commented 10 hours ago

Customization

I've seen some comments around having a celeryconfig.py file with a patch on it for general settings, and for the custom workers we could use environment variables.

For 'gevent', both options are available, but if it's okay to have it on the tutor, then let's proceed with that.

Good defaults

Besides the CELERY_ACKS_LATE Python setting, we have explored no other Python settings, but that's something we can test and measure when the PR is open. Here are some settings we have explored:

I can open a PR with the following changes to be considered:

regisb commented 9 hours ago

I've seen some comments around having a celeryconfig.py file with a patch on it for general settings, and for the custom workers we could use environment variables.

Yes, I suggested that. But after investigating I found no way to configure a celery worker using such a settings file. As far as I know the celery worker command does not support a --config=... option: https://docs.celeryq.dev/en/stable/userguide/workers.html

Do you know a way to achieve that?

Ian2012 commented 8 hours ago

We need to use the celery app config_from_object method on the production.py settings files:

default_config = 'myproj.celeryconfig'
app.config_from_object(default_config)
Ian2012 commented 8 hours ago

It may require changes to edx-platform celery's app but I don't think we need it. We also have the option to use environment variables but python files provides more prefxility

regisb commented 8 hours ago

You are referring to this? https://docs.celeryq.dev/en/stable/reference/celery.html#celery.Celery.config_from_object This is the method that is being called in lms.celery: https://github.com/openedx/edx-platform/blob/master/lms/celery.py

As far as I know, this method is used to configure the Celery application, which is not the same thing as the Celery worker. For instance, were you able to configure the worker concurrency using this method? If yes, how?

Ian2012 commented 5 hours ago

Yes, here is the code snippet with some other settings:

image

CELERYD_CONCURRENCY = 2 # worker_concurrency
CELERYD_MAX_TASKS_PER_CHILD = 100 # worker_max_tasks_per_child
CELERYD_POOL = 'threads' # worker_pool

See https://celery-safwan.readthedocs.io/en/latest/userguide/configuration.html#new-lowercase-settings for more information on the available settings, users can also use the APP variable to directly modify the variables on the celery object.

We are still missing having separate settings for each lms-worker queue but that can be solved by injecting an environment variable and dynamically resolve the settings on a dictionary:

CELERY_WORKER_TYPE = os.environ.get("CELERY_WORKER_TYPE", "default")

worker_settings = {
    "lms": {
        "default": {
            "parameters": {
                "worker_concurrency": 4,
                "worker_pool": "threads"
            }
        },
        "high": {},
        "high_mem": {},
    },
    "cms": {
        "default": {},
        "low": {},
    },
}
from openedx.core.lib.celery import APP

worker_variants = worker_settings.get(SERVICE_VARIANT)
for variant, config in worker_variants.items():
    if CELERY_WORKER_TYPE == variant:
        for parameter, value in config.get("parameters", {}).items():
            conf = APP.conf
            setattr(conf, parameter, value)

This is working locally

Ian2012 commented 5 hours ago

Some parameters cannot be added this way, like the queues but can be added to the command line args. But if we are going to add command line arguments and everything is configurable using the pattern above then we don't need this script, just inject every setting via the filter and allow operators to override it

Ian2012 commented 39 minutes ago

This is the POC: https://github.com/overhangio/tutor/pull/1131