Dockerize legacy projects

@jmurty @sam-mi Could you take a quick look at the changes in this branch? Notably:

The use of django-split-settings and refactoring settings into many smaller modules that we can optionally combine as required by each project. Is this easier to configure and understand than the current multi-layered import setup?
The updated docs, explaining how to break down and combine settings modules, and how to dockerize an existing project.

The default settings modules that get combined are intended to be 1) a safe default; 2) deal with the most common requirements arising in an ephemeral containerised environment.

Additional settings modules (not used by default) are commonly used, but not strictly required to avoid an issue we might encounter in an ephemeral containerised environment. This is to avoid having to make more changes than necessary when dockerizing existing projects, while still making it easy to opt-in if we choose.

There is a django-icekit feature branch that uses ixc-django-docker instead of the scripts and project settings in ICEkit itself, but it hasn't been updated to work with this branch of ixc-django-docker.

I had some thoughts but have not yet tested how we might also make ICEkit specific settings modular and configurable. Basically, for an ICEkit project, alter DJANGO_SETTINGS_MODULE to point to an ICEkit version of ixc_django_docker/settings/__init__.py which uses what we can from ixc-django-docker's base settings, and allows the project to configure additional ICEkit settings modules.

Hi @mrmachine I have some feedback about the split settings. I have been thinking about this in the abstract for a while, so there's a high risk of bike-shedding here, but here are my thoughts.

I think defining setting files and ordering with envars will be confusing and fragile. What if you get the ordering wrong?
Could we have an initial environ.py settings file that reads all/any available environment variables. It would also make a convenient place to put tech documentation and notes, and generally centralise potentially messy lookups in one place
I would prefer more explicit enabling/disabling of features than just including settings module names (or not)
Can we consolidate the base and calculated modules into one?
I'd like site environment settings, and for optional features, to have their own subdirectories

Here is what I envisage (possibly naively)

In settings/init.py:

# Start by loading all known environment variables in one place
# (base, then optional project/local ones).
include(
    'envvars.py',
    optional(join(PROJECT_SETTINGS_DIR, 'envvars.py')),
    # Local version would probably only ever forcibly override actual envvars
    optional(join(PROJECT_SETTINGS_DIR, 'local', 'envvars.py')),
)

# Define or calculate basic Django settings (replaces 'base.py')
include(
    'django.py',
    optional(join(PROJECT_SETTINGS_DIR, 'django.py')),
    optional(join(PROJECT_SETTINGS_DIR, 'local', 'django.py')),
)

# Load site environment settings next (base, then overrides)
# by loading anything in a directory path named to match
# BASE_SETTINGS_MODULE. All are optional to permit arbitrary
# site environment names, and we use glob filename matching
# to permit project to split settings as desired.
include(
    optional(join('_' + BASE_SETTINGS_MODULE, '*.py')),
    optional(join(PROJECT_SETTINGS_DIR, '_' + BASE_SETTINGS_MODULE, '*.py')),
    # No need for a 'local' variant, use e.g. project/settings/develop/local.py
)

# Conditionally apply settings for optional features, where
# GK_ENABLE_xyz feature flags are set via envvars and/or site
# environment settings.
# Notes:
# - feature configs expect key settings to exist already, and
#   raise useful error messages if not
# - group settings into 'features' subdir in ixc-django-docker
# - possible to completely override settings in project with a
#   clean slate by setting GK_ENABLE_xyz=False and defining
#   everything in project-specific settings file e.g:
#   project/settings/develop/compressor.py

settings.GK_ENABLE_COMPRESSOR and include(
    'features/compressor.py',
    optional(join(PROJECT_SETTINGS_DIR, 'compressor.py')),
    optional(join(PROJECT_SETTINGS_DIR, 'local', 'compressor.py')),
)

settings.GK_ENABLE_WHITENOISE and include(
    'features/whitenoise.py',
    optional(join(PROJECT_SETTINGS_DIR, 'whitenoise.py')),
    optional(join(PROJECT_SETTINGS_DIR, 'local', 'whitenoise.py')),
)

This would lead to a _ixc_djangodocker/settings/ directory hierarchy something like this:

settings/__init__.py
settings/_develop/django.py
settings/_production/storages.py
settings/_staging/email.py
settings/_test/django.py
settings/envvars.py
settings/django.py
settings/features/compressor.py
settings/features/whitenoise.py

And a project settings directory hierarchy like:

project/settings/_production/email.py
project/settings/_production/storages.py
project/settings/_staging/email.py
project/settings/_test/dummy_services.py
project/settings/django.py
project/settings/envvars.py  (extra project-specific envvars)

@mrmachine Also, I made a management command to dump all a site's settings into a document that can be easily compared over time or between different environments, as a first sanity-retaining step toward changing ICEkit's settings. This command might make more sense in this project?

https://github.com/ic-labs/django-icekit/commit/a986477b40169cc9a2a0abcb0942c1a11aba41e0

@jmurty Thanks for this feedback. This looks very flexible and well structured, but I wonder if it's overkill, at least for ixc-django-docker?

The purpose of ixc-django-docker (in my mind) is to do just enough to run a Django project with or without Docker locally, or in Docker Cloud.

I want ICEkit to use it so we can improve the way we host ICEkit sites without having to upgrade the version of ICEkit they are running, but I also want it to be quite un-opinionated about what a project must do to use it, except for things that are really required when deployed to scaleable ephemeral infrastructure (e.g. compressor, whitenoise, remote storage).

Ideally, we should be able to start with safe django settings, plus required settings (compressor, whitenoise, remote storage, etc.), and then pull in whatever project settings already happen to exist, wherever they happen to exist, and just tweak those existing project settings a little to avoid conflicting with or undoing the safe default and other required base settings. It can be a significant effort to refactor existing legacy project settings (hundreds of lines) into a specific modular architecture.

I am worried that conflating a mechanism to merely load settings modules with a mechanism to enable features will more tightly couple the project settings to the architecture of and the features configured by ixc-django-docker. I think I could be persuaded that something like this should be in django-icekit, though, where we will see less variety in the projects.

Can you elaborate on the risks you see with specifying settings modules in an environment variable and loading them in that order?

What I like about this approach is that project's have absolute freedom to load whatever settings they need, from anywhere really, in whatever order they need. Legacy projects don't even need a project_settings module or a settings package in a pre-defined location. Individual projects could include a settings module that itself imposes a more rigid structure, like the one you are proposing. E.g. an ICEkit project could load join(ICEKIT_DIR, 'settings', '__init__.py') which behaves as you describe.

Can you also give an example of what you see going in envvars.py in ixc-django-docker and in projects that use it? I'm cautious about defining environment variables in a Python settings module, because I've found a few times now that actually, I need access to that environment variable in shell scripts, too. We already have .env.* files for populating the shell with environment variables?

I think that in many cases, having multiple subdirectories of settings as the prescribed architecture will be overkill and many projects will only really need one or two settings files to override. Those settings files should be able to optionally configure features based on environment variables as they do now (e.g. pull in credentials).

To answer your question about calculated.py, possibly. It currently only does one thing -- create directories that might not exist at runtime. It's a separate file that is loaded near the end because other settings modules may be loaded after the base settings which alter settings that specify paths that need to exist.

Hi @mrmachine here's my delayed follow-up:

I agree my proposal is probably overkill for just ixc-django-docker especially the highly customisable feature loading. It could be simplified in this project, though I still like the overall structure I proposed
I am very nervous about specifying settings module names in an environment variable, and am strongly in favour of using explicit and obvious truthy envvar names like GK_ENABLE_COMPRESSOR. Potential issues I can think of for allowing arbitrary setting module names set via an envvar:
- we have no control over the ordering the settings modules will be loaded, and ordering can matter a lot for these kinds of settings
- since the feature settings module names are optional, a typo in a module name in the envvar could cause a feature intended to be enabled to silently fail to be applied
- setting a list of things in an envvar can be fiddly and error prone in general
- unless we do some magic with the module names, settings modules must be top-level and cannot be nested (so we would actually be enforcing a flat settings module hierarchy on projects)
my intent with the envars.py module was only to load settings from the environment into Django settings, with perhaps some type-coercion (booleans, integers) and sanity checks but nothing more. Basically, I would like to have one place to look for all the environment variables that are actually used as settings, especially for those that get renamed or coerced etc. Having a bunch of envvar lookups strewn through multiple settings files is a bad situation I would love to avoid, by convention at least. I think doing this would give us a better chance of spotting mismatches between the contents of .env* files and the actual settings used by Django.
- I can also see the merit of arguing that environment variables should be loaded close to where they are applied, but I worry that we are going to end up repeatedly having to grep across multiple repository layers (ixc-django-docker, django-icekit, someproject) to debug issues where an envvar is not loaded as expected, or the name changes moving from the envvar to the setting, or similar minor bugs that will be more likely to occur and more difficult to debug unless we take steps to keep things sane.
overall, I would be happy to radically simplify the loading of feature-specific settings by dropping out the project- and local-overrides, but stand by the rest of what I proposed.

You may need to help us find our way out of the bike shed here @sam-mi

@mrmachine It would be great it we could move forward with this PR to make it easier to set up projects in general – GLAMkit or otherwise – with dockerised environments. I am still wary of the envvar-defined settings, but not enough to so further delay merging this

@jmurty thanks for this feedback. I've just reviewed the PR again to refresh my memory. Here are my thoughts:

While reviewing the PR, I personally found it refreshing and easy to review small single-purpose settings modules, as opposed to a section within one massive settings module fenced off with comments. You can also see at a glance what sort of optional settings are available, without reading through one long file or maintaining a separate list in documentation. With the latter (monolithic settings) approach, I have in the past found it far to easy to put settings required for one optional app anywhere in the file, which then makes it almost impossible for other developers to really know "are these all the settings that are required for this app?" This becomes even easier still as more and more people touch the file. I think separate settings app-specific modules will encourage us to keep things modular and properly grouped and to keep apps decoupled/optional, and it should be easier to spot a "compressor.py" file than to grep a several hundred line long settings module for a "compressor" comment.
You're right that ordering can matter a lot. But I think we're only talking about the ordering of optional settings modules provided by ixc-django-docker, and besides base.py and calculated.py, I don't think ordering does matter (off the top of my head -- maybe celery-email and post-office). IF ordering did matter for a particular settings module, perhaps we could implement that logic either in ixc_django_docker/settings/__init__.py (because the list of available settings modules provided by ixc-django-docker is fixed and known), or in the settings modules themselves. E.g. if whitenoise depended on compressor, just check for the existence of a compressor setting in the whitenoise module, or check that 'compressor' in INSTALLED_APPS, or explicitly set GK_ENABLE_COMPRESSOR = True in compressor.py and check for that in whitenoise.py? Outside of ixc-django-docker, individual projects (including ICEkit) will be free to include one monolithic settings module and use environ variables to enable features if they want.
We could avoid the scenario where an optional module name typo results in silent failure to apply settings by making all specified settings required. We could then iterate over the specified require settings and attempt to load a *_local version of each (with optional())?
Yes, it can be annoying to set a list of values in an env var. I don't have any better suggestion to work around that issue.
I don't think we are enforcing a flat hierarchy on projects. Can't we just specify settings/foo.py in PROJECT_SETTINGS_MODULES if a project provides a lot of optional settings modules and we don't want them all at the root level? This means we have to repeat settings/ prefix over and over in the envvar, but I suspect that most projects will actually not bother with a huge number (or any) modular settings. Only ixc-django-docker and perhaps django-icekit, as libraries, will generally need to break settings down into optional modules? And in those cases, we can hard code the prefix so that only the actual module name is required. At the end of the day, projects themselves can still do whatever they want, either by completely bypassing the ixc-django-docker settings module/loading, or by augmenting it once the project settings are loaded by ixc-django-docker.
I can see the appeal of loading env vars in one place and coercing their type as appropriate, but I think there are also some tradeoffs to consider. For nested dict settings (like DATABASES) we would need to define the nested dict structure in envvars.py to populate the env var setting, and then in the base settings module, update the various nested dicts to provide the rest of the settings. This can be messy and ugly. When adding a new group of settings and one/some coming from envvars, you need to add in two places. When viewing an app-specific settings module you might not see or realise that it has other settings that are being provided by the environment at all, so you'll always need to check both places. I suppose both of these issues could be resolved by ensuring that all env var settings are assigned to top level variables (like IDK_DATABASES_DEFAULT_PASSWORD) instead of the actual DATABASES setting, and then referencing that variable in the module where DATABASES is defined. But that seems like unnecessary double handling. I don't feel too strongly about this one either way, but if we do add envvars.py to the list, it should be required for ixc-django-docker settings and optional for project settings (so we can dockerize existing projects with as few changes as possible).

Does any of this alleviate your concerns?

@mrmachine Thanks for taking the time to weigh the pros and cons but at this stage I say just merge the PR – assuming we can do so without breaking projects that already use it like https://github.com/ixc/iwc-forum/ – and see if including optional settings modules turns out to be a problem in reality rather than just my imagination.

ixc / ixc-django-docker

Dockerize legacy projects #1