openedx / edx-platform

The Open edX LMS & Studio, powering education sites around the world!
https://openedx.org
GNU Affero General Public License v3.0
7.1k stars 3.79k forks source link

Reconcile Tutor and OEP-45 #34446

Open kdmccormick opened 2 years ago

kdmccormick commented 2 years ago

Context

OEP-45 makes declarations about how Open edX services should be structured & packaged as to simplify configuration and deployment. Tutor is aligned with some of these guidelines, such as:

but is at odds with others, such as:

As the official deployment strategy and soon-to-be official developer stack, Tutor ought to be a faithful implementation of any framework that the Open edX community sets around configuration and deployment.

Acceptance

Either:

  1. change Tutor to follow OEP-45,
  2. change OEP-45 to encompass Tutor,
  3. do a little bit of both 1 and 2, or
  4. revoke OEP-45.
regisb commented 2 years ago

I think that OEP-45 makes sense for the most part, but I disagree with some of it. More specifically, defining settings in YAML files: I'm just not sure what we gain by moving values from .py modules to .yaml files. What we lose is the ability to inherit/import values from one base configuration to another. This makes it more difficult for applications to override existing behaviour. At the end of the day, we will always need python modules to override complex settings, such as LOGGING handlers, MIDDLEWARES and INSTALLED_APPS. If we need to define different yaml files for development, production and testing, then we will end up with a lot of duplication.

I would love to have better build and management scripts in the upstream repositories. For instance, I tried multiple times to get rid of the custom openedx-assets script. But sometimes the lack of convenient management scripts in the upstream repos makes it difficult to do thinks in a user-friendly way.

It would be worth it to make Tutor comply more with OEP-45, but to achieve that we will also have to make quite a few changes in the upstream repositories. Here is what the process would look like:

  1. Identify some complex/weird piece of the tutor codebase. For instance: the jobs.py module, the waffle toggle command, the ad-hoc MinIO storage class for user tasks.
  2. Push changes upstream that make these pieces of code irrelevant.
  3. Make the new feature work on Tutor nightly.

OEP-45 is based on good common sense. By making Tutor comply more with common sense, we will get closer to reconciliation with OEP-45.

jmbowman commented 2 years ago

We discussed OEP-45 again in Arch Hour this morning, as we're considering having Arbi-BOM dive into implementation soon if we still agree it's a good path forward. The topic of YAML vs. Python files came up, the stated advantages of YAML were:

That said, we also said that we should consider Python files instead if we find other reasons that make them work better in practice (which is entirely possible given that it's the Django standard). I will say that in your example of complex settings, edX/2U explicitly moved away from layered Python overrides to YAML files with some duplication because even 1-2 layers of overrides rapidly overwhelmed the ability of even senior developers to reason about how the layering worked and which settings were actually in effect.

And regarding changes in the upstream repositories...I think we're ready to start doing that now. If there are specific changes that would help simplify things immediately, feel free to suggest them for tackling early in the process.

regisb commented 2 years ago

I always say that there are only two things that are difficult about Open edX: one is static assets, and the other is settings.

While I think it's great that 2U is thinking about improving the design of LMS/CMS settings, the choice of YAML for storing settings should be just an implementation detail that is specific to 2U. If people want to store settings in YAML, JSON, TOML, environment variables, Ansible Vault or Consul, then that's great; it's not my concern, and it shouldn't be 2U's either.

The current implementation of the production.py settings forces Tutor to make use of lms.yml/cms.yml files. The alternative would be to bypass the default production setting files and to re-write production.py settings from scratch -- but that seemed too difficult to me when I started work on Tutor. So we went with minimal lms/cms.yml files.

What we should be focusing on is to provide better default settings, both in development and production. Better defaults would make everyone's lives much easier, both at 2U and outside. To understand the difficulties we have with settings, you just need to look at the implementation of the Tutor settings and wonder: why do we have to override so many values?

Let's have a look at a few examples:

  1. Source: in development, we need to override 9 LMS settings when we modify the LMS host.
  2. Source: in production, we need to modify 3 standard settings depending on whether we use SSL/TLS.
  3. Source: the default values of these common LMS settings could probably be modified for everyone.
  4. Source: when we modify the Mongodb host, we need to modify it in many different places.

In some of these examples (though not all) the common thread is that we need to load "base/required" settings and use them to infer other values. You want to load these base settings from YAML, and I'm fine with that -- but don't force this choice on everyone. Instead, provide a generic mechanism to allow anyone to load settings from whichever source they like. Here's what an actual implementation would look like in lms/common.py:

# Provide good defaults
ENABLE_SSL = False
MONGODB_HOST = "mongodb"
...

try:
    # 2U's common_private.py would load values from yaml files
    from .common_private import *
except ImportError:
    logger.warning("Private setting module could not be loaded")

# Infer other settings
if ENABLE_SSL:
    SESSION_COOKIE_SAMESITE = "None"
...
jmbowman commented 2 years ago

How about we just modify the OEP to accept more formats for the config file? It would still be specified by an <APPNAME>_CFG_PATH environment variable but could be a YAML file, a Python file on the PYTHONPATH, or perhaps even JSON, TOML, dotenv, etc. There are settings loaders in https://github.com/rochacbruno/dynaconf/tree/master/dynaconf/loaders that might be useful for this. The built-in settings file would then just look at the file extension (or lack thereof) and pick the appropriate loader for the specified file.

Most of the examples you gave feel like demonstrations of "we aren't handling derived settings well". There's https://github.com/openedx/edx-platform/blob/master/openedx/core/lib/derived.py , but not many people know about it and the resulting settings declarations look a little odd. I think the approach of loading the custom settings file twice (that I mentioned in my last comment) would probably work better. The file would only need to be parsed once, but the parsed values would be used to set/override settings both near the beginning of the core settings file (to specify any required settings like host and ENABLE_SSL which other settings depend on) and again near the end (in case you want to specify a value for a derived setting that doesn't match the default formula).

I'm not sure how you'd handle mutation of complex settings like LOGGING, though (vs. just re-declaring the whole thing). It feels like that would require a specific provision of an optional Python file to load at the end of settings parsing which would be passed the existing settings module to mutate (this is roughly what that derived.py module I mentioned above does). We could add that, but experience would lead me to slap a big "you probably don't really want to do this" disclaimer on that feature.

regisb commented 2 years ago

How about we just modify the OEP to accept more formats for the config file?

My suggestion is that OEP-45 should completely ignore the format of the config file. Django is great for loading settings coming from different sources (proof being that a tool like dynaconf exists) and Open edX should not even try to improve on that.

It would still be specified by an _CFG_PATH environment variable

To require the use of a single config file is an opinionated choice that is not just unnecessary: it's also extra work for people who want to do things differently.

There are settings loaders in dynaconf that might be useful for this.

Dynaconf should not be a mandatory tool to run any Open edX IDA. In many cases it would be overkill.

Most of the examples you gave feel like demonstrations of "we aren't handling derived settings well".

It's not so much that we don't handle derived settings well, but more that we are not providing good defaults.

We should really be focusing on better defaults. All the rest is implementation-specific.