Development Environment Roadmap

I've written up past, ongoing and future Hypothesis devenv developments as a list of problems to solve, in roughly priority/sequence order, in this EPIC issue. This probably doesn't belong in product-backlog but I'm putting it here for now for lack of a better place. After writing this up I realised it would probably work better as a product board, but for now this will do.

Incrementalism

The development environment is a special kind of project in that it should never be the primary focus of the team, or even of one member of the team. We spend most of our time delivering features and other improvements to our users, and aim to spend just a little time per sprint improving our development environment. It's therefore important to design development environment improvements so that they can be delivered in small steps, and without disrupting ongoing development work.

Overall Goals

Some broad requirements that the task lists below aim to meet:

All our apps need to be running a recent version of Python.

And we need to be regularly updating all of our apps to newer versions of Python as they come out.

This is just part of the cost of doing business as a Python shop nowadays: we can't stay on Python 2.7 forever, as we have done with h and via until now, because Python 2.7 is no longer supported after 2019. Nor can lms and bouncer stay on Python 3.6: it's no longer supported after 2021.

All our apps need to be running recent versions of all Python dependencies.

And we need to keep those dependencies up to date over time.

We can't keep digging ourselves into a deeper and deeper hole by not updating dependencies, as we have done in the past. Otherwise, when a new version of some dependency comes out with an important security fix, bug fix, or new feature, or when we want to add a new dependency that requires newer versions of existing ones, we end up having to do a massive, multi-month project to upgrade multiple dependencies at once.

We want to break things up into lots of separate, small apps and projects.

This is a direction we've been going in for some time (we already have h, bouncer, lms, client and via), and one we all seem to agree on. Our devenv setup and processes are going to need to scale to managing many separate projects.

We need to be sharing Python code between apps.

We have four production Python apps now (h, lms, via and bouncer) as well as a couple of test/demo apps, and we're planning more. As the number of separate apps grows, the cost of duplicating code between them will become too high, so we're going to need to share code instead.

We need to move from being a shop that releases Python apps to being a shop that releases Python apps and publishes open source Python libs, and we need devenv and CI tooling and processes that enable that.

We want our devenv to become simpler and easier to use over time.

It's already pretty good, so this one isn't urgent and can be worked on bit by bit, but we always want to be making the developer experience better. Also, as the number of separate apps and libs grows, new challenges arise: manually setting up, installing and running each app in its own tab/window becomes less practical.

Overall Approach (so far)

This needn't remain set in stone, but for context, the tenets of the approach we've taken so far (which seems to be working well for us) have been:

Combine small "do one thing and do it well" tools

"Microservices" or "UNIX philosophy"-type tools, rather than monolithic tools that aim to do everything in one tool.

tox just manages venvs. pyenv just installs versions of Python. Honcho just multiplexes processes.

Use traditional tools

For example, we use standard Python virtual environments.
We use tox (which, while not part of the standard library, is a popular and venerable tool) to manage venvs.
We download and install Python versions from source, but use pyenv (again, very popular among Python developers) to automate this.
We use ancient UNIX things like GNU Make and a little (not too much) shell scripting.

Use lightweight tools that are easier to understand

In particular, we've so far mostly stayed away from virtual machines and containers (except for using Docker Compose to run services: you can't beat it for that).

It's simpler to just run our Python and Node processes locally on the host machine, without the extra complexity and potential for problems that comes from putting a virtualization or containerization layer (and its CLI) between developers and the processes they're hacking on or debugging.

To isolate things from the host we use pyenv and venvs, both of which are just copies of Python sitting in directories on your filesystem, and so are a lot easier to reason about.

Docker has so far seemed to be overkill for our purposes: by introducing an additional layer of conceptual and technical abstraction for our developers to learn, its cons may outweigh its pros.

So far this is working well for us. But using containers in development does have potential advantages (e.g. reducing differences between dev machines) so we may revisit this one day.

The Roadmap

Below are checklists of the specific problems we need to solve in roughly priority order.

We can create individual issues for each of these as we decide to begin work on them. For now I've just made a checklist.

Functional Requirements

There's some functionality that our devenv needs to have in order to be able to deliver what we intend to deliver.

[x] Problem: our projects all use different versions of Python, and some use old versions

Bouncer and lms use Python 3.6, whereas h and via use Python 2.7, and so on.

Solution (in progress): upgrade h to Python 3.6, and either upgrade via to Python 3.6 or replace it with something else. Put in place a good system for regularly upgrading all our projects to newer versions of Python in future (once they're all on 3.6, 3.7 will be the first test). This requires good Python version management in development (see below).
[x] Problem: out-of-date Python dependency versions, and different apps using different versions

Solution (in progress): we've added Dependabot for Python dependencies to bouncer and lms, and we're close to enabling it for h too (just a few difficult big dependency updates to do first). Dependabot is a scalable way to keep dependencies up to date across apps.

There's no Dependabot for via yet, and we don't know what we're going to do about via.
[x] Problem: duplicate Python code between apps

We've ended up duplicating the same Python code in multiple apps: for example, code to implement Sentry filtering, feature flags, and much else. Duplicate code means duplicate maintenance and development work.

Solution: let's start extracting code like this into standalone, open source Python libraries (many of them likely Pyramid extensions) that are shared between our apps. These libraries will use semantic versioning and publish releases to PyPI, and Dependabot will send PRs to all our apps to update them to new versions. These libs will also have to support multiple versions of Python at once (so we'll need to run their tests against each version), and they may sometimes even have to support multiple versions of certain Python dependencies (so we may need to use tox's generative envlists).

This is a new thing for us: we haven't published a Python library (as opposed to an app) before. But I think our devenv can adapt to it and remain consistent across app and lib projects.

Let's check off this item once we've published our first library.
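
For a shared library, a generative envlist is what turns a list of Python versions and a list of dependency versions into a concrete test matrix: tox expands each brace group in an envlist entry into every combination. The sketch below mimics that expansion in plain Python, purely to show what a single entry generates. It handles only flat, comma-separated brace groups and is not tox's own implementation; the factor names (py36, pyramid19, ...) are hypothetical.

```python
import itertools
import re


def expand_envlist_entry(entry: str) -> list[str]:
    """Expand flat brace groups, e.g. "py{36,37}-pyramid{19,110}".

    Simplified illustration of tox's generative envlist syntax: nested
    braces and other tox features are not supported.
    """
    # re.split with a capturing group alternates literal text and brace contents:
    # "py{36,37}-pyramid{19,110}" -> ["py", "36,37", "-pyramid", "19,110", ""]
    pieces = re.split(r"\{([^}]*)\}", entry)
    choices = [
        [piece] if index % 2 == 0 else [option.strip() for option in piece.split(",")]
        for index, piece in enumerate(pieces)
    ]
    # itertools.product yields every combination, varying the rightmost group fastest.
    return ["".join(combination) for combination in itertools.product(*choices)]


if __name__ == "__main__":
    for env in expand_envlist_entry("py{36,37}-pyramid{19,110}"):
        print(env)
```

Running it prints py36-pyramid19, py36-pyramid110, py37-pyramid19 and py37-pyramid110: four environments from one line. In tox.ini, factor-conditional settings under `deps` (for example `pyramid19: pyramid>=1.9,<1.10`) would then pick the dependency version for each environment.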
[ ] Problem: duplicate devenv tooling

A lot of tooling is duplicated between our apps, and maintaining the duplicate copies leads to duplicated maintenance work and to inconsistencies between the devenvs of different projects.

For example: linter config files, Jenkinsfile, Dockerfile, Makefile, tox.ini, and much else.

Many of these files can't be exactly the same in different projects but they should be very similar and follow the same format. So some form of templating may be required in the solution.

Cookiecutter might work?
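
To make the templating idea concrete: Cookiecutter renders Jinja2 templates, but the core idea of one shared file with a few per-project placeholders can be shown with nothing more than the standard library's `string.Template`. This is a minimal sketch; the template content and the per-project values are hypothetical, not our actual tox.ini files.

```python
from string import Template

# Hypothetical shared template. Everything except the ${...} placeholders is
# identical across projects, so fixing it once fixes it everywhere.
TOX_INI = Template("""\
[tox]
envlist = ${envlist}
skipsdist = true

[testenv]
description = Run the ${project} tests
""")


def render(template: Template, **settings: str) -> str:
    # substitute() raises KeyError if a project forgets to supply a value,
    # which is safer than silently leaving a placeholder in the output.
    return template.substitute(**settings)


if __name__ == "__main__":
    for project, envlist in [("h", "py36,lint"), ("lms", "py36,lint,format")]:
        print(f"# --- {project}/tox.ini ---")
        print(render(TOX_INI, project=project, envlist=envlist))
```

Because `substitute()` raises `KeyError` for any missing placeholder, a project that forgets a value fails loudly instead of ending up with a half-rendered file.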

Usability

This is a checklist of problems with the developer usability of our devenv: basically, problems with the process of installing, running, and hacking on our apps in development. Compare the h install instructions circa March 2017 with today's.

[x] Problem: developers find creating and activating venvs difficult

The devenv used to require developers to manually create and activate Python virtualenvs for each project. This was error-prone, confused developers, and led to broken devenvs.

Solution: creating and activating venvs was automated by using tox.
[x] Problem: developers find installing and upgrading dependencies difficult

The devenv used to require developers to run pip in order to install the app's requirements into its venv, and to re-do this whenever the requirements changed. This led to problems such as broken devenvs due to out-of-date requirements.

Solution: tox now automatically installs the requirements, and updates them whenever the requirements change. We use the tox-pip-extensions plugin so that tox uses venv-update to make updating installs as fast as possible: it only installs, updates and removes the packages that it needs to, it caches package downloads, it uses binary packages, etc.

npm and Makefile similarly automate installing and upgrading JavaScript deps.
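
The "only touch what changed" behaviour is what makes venv-update fast. As a rough illustration of that idea (not venv-update's actual code, which also handles download caching and binary packages), here's a sketch that diffs the installed packages against the required ones, assuming both are simple name-to-pinned-version mappings.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 name normalization, so "Old_Thing" and "old-thing" compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def plan_sync(installed: dict[str, str], required: dict[str, str]) -> dict[str, list[str]]:
    """Return the minimal install/update/remove steps to turn installed into required."""
    have = {normalize(name): version for name, version in installed.items()}
    want = {normalize(name): version for name, version in required.items()}
    return {
        # Required but absent.
        "install": sorted(name for name in want if name not in have),
        # Present but pinned to a different version.
        "update": sorted(name for name in want if name in have and have[name] != want[name]),
        # Present but no longer required.
        "remove": sorted(name for name in have if name not in want),
    }


if __name__ == "__main__":
    installed = {"pyramid": "1.9.4", "requests": "2.22.0", "Old_Thing": "0.1"}
    required = {"pyramid": "1.10.4", "requests": "2.22.0", "SQLAlchemy": "1.3.8"}
    print(plan_sync(installed, required))
```

Here only pyramid is updated, SQLAlchemy is installed and the stale package is removed; requests, already at the right version, is left alone.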
[x] Problem: isolate devenvs from developer's systems

Differences between developers' systems cause confusion and broken devenvs.

Solution: tox has been used to isolate our projects from the system's Python packages, envvars, and PATH. We're in the process of rolling out pyenv to all projects to isolate them from the system's copies of Python as well.
[x] Problem: dependency and Python version conflicts between different devenv commands within a single project

We ran into a number of issues where, for example, PyLint or Sphinx had a dependency requirement that conflicted with a requirement of our app and so the linter or docs build was broken because it couldn't be installed in the development venv. Similarly, h and via had a problem that Black required Python 3 but h and via were still on Python 2 so you couldn't install Black in their devenvs.

Solution: tox was used to isolate different devenv commands within the same project from each other. The linter and its requirements are installed in one venv, the code formatter and its requirements in another, and so on. These venvs needn't even use the same version of Python.
[x] Problem: inconsistent devenv commands

Lots of different commands are needed to do various things in devenvs. These commands are all different from each other, and also commands to do the same thing vary from one project to the next: docker and docker-compose commands, the bin/hypothesis command, alembic commands, pip and pip-compile commands, linters, test runners, coverage, building the docs.

Solution: shortcuts for all common dev commands were added to the Makefile, this Makefile was made consistent across all projects, and a make help command was added to document all of the commands.
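
One common way to keep a make help command from drifting out of sync with the Makefile is to annotate each target with a `##` comment and generate the help text from those annotations. The sketch below assumes that convention, which may not be how our Makefiles actually do it, and the targets in the example are hypothetical.

```python
import re

# Matches "target: [prerequisites] ## help text", but not "VAR := value".
HELP_LINE = re.compile(r"^([A-Za-z0-9_.-]+):(?!=).*?##\s*(.+)$")


def make_help(makefile_text: str) -> str:
    """Build aligned help text from ##-annotated Makefile targets."""
    entries = [
        (match.group(1), match.group(2).strip())
        for match in map(HELP_LINE.match, makefile_text.splitlines())
        if match
    ]
    width = max((len(target) for target, _ in entries), default=0)
    return "\n".join(f"{target.ljust(width)}  {text}" for target, text in entries)


EXAMPLE_MAKEFILE = """\
.PHONY: dev
dev: python ## Run the app in the development server
test: python ## Run the unit tests
VERSION := 1.0 ## Not a target, so not listed
lint: python
"""

if __name__ == "__main__":
    print(make_help(EXAMPLE_MAKEFILE))
```

Only the annotated dev and test targets are listed; unannotated targets and variable assignments are skipped. In a real Makefile the help target would more likely do this with a grep/awk one-liner; Python is used here only to keep the examples in one language.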
[x] Problem: installing and launching services

Developers have to install several services (PostgreSQL, Elasticsearch, and RabbitMQ) (for example by installing each in Docker) and then run each service (for example each service in its own shell) before running an app like h. This requires running a lot of commands and opening a lot of shells, and the documented method for installing and running services differs between projects.

Solution: all services were moved into Docker Compose, and this was applied consistently across all apps that require services. The docker-compose command was also wrapped in make services. So now, for each app that requires services, you just need to run make services to start them all in the background (and even that could be automated, see below).
[x] Problem: CI is too slow

It takes too long for both Travis and Jenkins to finish running. This means developers are left waiting around, picking their noses.

Solution (in progress): move everything from Travis to Jenkins and stop using Travis. Run all tasks (tests, lint, formatting, ...) in parallel on Jenkins. Use high-performance Jenkins build slaves.
[x] Problem: weak Python linting, and inconsistent across projects

Some of our projects use Flake8, which is a very weak linter that doesn't detect much. Other projects use Prospector which is much stronger but too slow.

Solution (in progress): change all our projects to run PyLint (directly, not via Prospector) which is the best Python linter. Enforce no (new) PyLint errors on CI, for all projects. Consistent PyLint config across all projects.
[x] Problem: installing prerequisites

Too many tools and prerequisites (apt packages, brew packages, Docker and Docker Compose, gulp) need to be installed manually, making setting everything up time-consuming and error-prone.

Solution (in progress): the prerequisites for h were reduced to the four that were actually needed (Git, Node, Docker, and pyenv). The dev install docs for the other apps still need to be updated: they shouldn't require anything outside of these four.
[x] Problem: having to install Docker Compose manually

Docker Compose is a Python package and can be installed automatically.

Solution (in progress): have tox install Docker Compose, and have the Makefile and everything else run tox's copy. Done for h but not yet for lms.
[x] Problem: having to install Gulp manually

Gulp is an npm package and can be installed automatically.

Solution (in progress): have npm install Gulp in node_modules/ and have Makefile and everything else run npm's copy.
[x] Problem: having to create test DBs manually

Both h and lms require you to run a command to create a test DB manually before the tests will work. This is silly: you shouldn't have to do this manually.

Solution (in progress): have make test and other commands that need the test DB call a script that creates it if it doesn't exist yet. Done for h but not yet for lms.
[x] Problem: diverse install instructions

Each app's install instructions are different, which makes installing all of the apps more difficult and error prone than necessary.

Solution (in progress): the h install docs have been rewritten to be as short and simple as possible. The other apps' docs should be rewritten to match h's.

We'd like installing the full suite of apps to be automated (see below). Once the install procedures for each of the individual apps have been made as simple and consistent as possible, a script to automate installing the whole suite will be easier to write. Getting the documented install procedures as simple and consistent as possible is a first step.
[x] Problem: Python version management

Developers should use the exact versions of Python that each project requires, but they're left on their own to figure out how to install multiple versions of Python on their particular OS. This leads to developers coming up with their own solutions (https://stackoverflow.com/c/hypothesis/questions/88, https://stackoverflow.com/c/hypothesis/questions/171), which leads to broken Python installs and to not using exactly the right versions of Python. And it's not going to scale well as we add more projects that require more versions of Python, and start regularly upgrading to new versions of Python.

Solution (in progress): pyenv is a Python version manager that automates installing and activating exactly the versions of Python that each project requires. It works on Linux and macOS (on Windows it needs WSL or the separate pyenv-win port). Crucially, pyenv enables a project to use multiple versions of Python at once, which we need for various reasons. pyenv support has been added to h with success, and now needs to be rolled out to all our projects. Our pyenv setup is completely automated and never requires developers to run any pyenv commands.
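
pyenv reads a project's required versions from a `.python-version` file, one version per line, with the first taking precedence as the default `python`. A fully automated setup therefore boils down to comparing that file with what `pyenv versions --bare` reports and running `pyenv install` for anything missing. The sketch below shows just that comparison and leaves out the subprocess calls; the version numbers are hypothetical, and this isn't necessarily how h's setup actually does it.

```python
def required_versions(python_version_text: str) -> list[str]:
    """Parse the contents of a .python-version file (one version per line)."""
    return [line.strip() for line in python_version_text.splitlines() if line.strip()]


def missing_versions(required: list[str], installed_bare_output: str) -> list[str]:
    """Return the required versions absent from `pyenv versions --bare` output."""
    installed = {line.strip() for line in installed_bare_output.splitlines()}
    return [version for version in required if version not in installed]


def install_commands(missing: list[str]) -> list[list[str]]:
    # Argument lists suitable for subprocess.run(); nothing is executed here.
    return [["pyenv", "install", version] for version in missing]


if __name__ == "__main__":
    required = required_versions("3.6.9\n2.7.16\n")
    missing = missing_versions(required, "2.7.16\n3.7.4\n")
    for command in install_commands(missing):
        print(" ".join(command))
```

With these inputs only 3.6.9 is missing, so the script would print a single `pyenv install 3.6.9` command.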
[ ] Problem: you shouldn't have to run make services manually

Some of our projects require you to run make services before you run make dev. It'd be simpler if you only had to run one command, make dev. This would also make it easier to write a script that boots up all our apps at once.

Solution: it's trivial to get make dev to run make services first automatically, but that introduces a race condition: if the app starts after the Postgres Docker container has started but before Postgres has begun responding on its port, the app crashes. This race condition could be fixed either by making the app not require Postgres at startup (more correct app behaviour, but maybe difficult to implement) or with a simple (~3 line) shell script that waits for Postgres to start responding and then starts the app (probably using nc, which is installed by default on Linux and macOS).

We decided not to automate this. See https://github.com/hypothesis/lms/pull/1834#issuecomment-647593594
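
Although we decided not to automate this, the waiting step the proposed solution describes is small. Here it is sketched in Python rather than as the nc-based shell script the text suggests: poll a TCP port until it accepts connections or a timeout expires. The host, port and timeout values are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until something accepts TCP connections on host:port.

    Returns True as soon as a connection succeeds, or False once `timeout`
    seconds have passed without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, a dev script could call `wait_for_port("localhost", 5432)` before launching the app. One caveat: with Docker's userland proxy, a published port can accept TCP connections before Postgres inside the container is actually listening, so a port check alone may pass too early; Postgres's `pg_isready` utility, or running a trivial query, would be a stricter check.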
[x] Problem: manually creating database contents

https://github.com/hypothesis/product-backlog/issues/1045

In order to get all of our apps to work together, a lot of users, groups, authclients, etc. need to be created manually in the h and lms DBs. Creating all of these is time-consuming and error-prone.

Solution: much dev data is not sensitive and should just have hard-coded dev defaults in a script that inserts the dev data into the DB. Some dev data is secret-ish; for that, a simple solution might be a script that pulls it from a private GitHub repo. The end goal is that a developer shouldn't have to create any dev data manually to get it all working: just running make dev should work.

(Maybe a new make setup command that make dev calls.)
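
Whatever form the script takes, the property that makes it safe for make dev to call on every run is idempotence: inserting rows that already exist must be a no-op. A minimal sketch of that, using the standard library's sqlite3 in place of PostgreSQL so it's self-contained; the table and the dev users are hypothetical, and in h and lms the data would go into the apps' own DBs.

```python
import sqlite3

# Hypothetical, non-secret dev defaults.
DEV_USERS = [
    ("devdata_admin", "admin@example.com"),
    ("devdata_user", "user@example.com"),
]


def seed_dev_data(conn: sqlite3.Connection) -> None:
    """Insert the dev defaults, skipping any rows that already exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (username TEXT PRIMARY KEY, email TEXT NOT NULL)"
    )
    # INSERT OR IGNORE skips rows whose primary key already exists, so
    # running this on every `make dev` never creates duplicates.
    conn.executemany("INSERT OR IGNORE INTO users (username, email) VALUES (?, ?)", DEV_USERS)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    seed_dev_data(conn)
    seed_dev_data(conn)  # The second run is a no-op.
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

In PostgreSQL the equivalent of SQLite's `INSERT OR IGNORE` is `INSERT ... ON CONFLICT DO NOTHING`.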
[x] Problem: manually setting config settings (envvars)

In order to get all of our apps to work together correctly, a large number of envvars need to be set correctly for each app. Setting all of these is time-consuming and error-prone.

Solution: many envvars are not sensitive and should just have hard-coded dev defaults in tox.ini. Some envvars are secret-ish; for these, a simple solution might be a script that pulls them from a private GitHub repo. The end goal is that a developer shouldn't have to set any envvars manually to get it all working: just running make dev should work.

(Maybe a new make setup command that make dev calls.)
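
The proposal keeps non-secret defaults in tox.ini and pulls secret-ish values from a private repo. The precedence this implies (defaults first, then secrets, then anything the developer has exported, which should always win) can be sketched in Python. The variable names, values and the KEY=VALUE secrets format are all hypothetical.

```python
import os
from typing import Mapping, Optional

# Hypothetical non-secret defaults; the names and values are illustrative only.
DEV_DEFAULTS = {
    "DATABASE_URL": "postgresql://postgres@localhost:5432/postgres",
    "H_API_URL": "http://localhost:5000/api/",
}


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, e.g. from a checkout of a private repo."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # Skip blanks, comments and malformed lines.
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def dev_environment(secrets_text: str = "", environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Layer defaults, then secrets, then the developer's own environment."""
    environ = os.environ if environ is None else environ
    # Later mappings win, so anything a developer exports overrides both.
    return {**DEV_DEFAULTS, **parse_env_file(secrets_text), **environ}
```

Within tox.ini itself, tox's `{env:NAME:default}` substitution in `setenv` gives the same "environment wins, otherwise use the default" behaviour for individual variables.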
[ ] Problem: manually installing several different apps

Even once the apps have all been made easy to install, and their install processes and docs all consistent, you still need to install several apps to get going, and the number may increase in time.

Solution: we're close to having all of our apps depend on the same handful of prerequisites, with the install process for each app being just git clone; make dev (this requires a few more of the checkboxes above to be checked, for example automating the setting of envvars and the creation of dev data). Once we have that, it should be simple to write a script that installs all of the apps for you. This script could also walk you through installing the prerequisites first: telling you to install each prerequisite and then press Enter, and checking that each one is installed correctly (e.g. by running docker run hello-world).
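
As a starting point for the install script's prerequisite step, here's a sketch that checks which of the four prerequisites (Git, Node, Docker and pyenv) can be found on the PATH, assuming their executables are named git, node, docker and pyenv. Being on the PATH doesn't prove a tool works; a deeper check such as running docker run hello-world could be added with `subprocess.run`.

```python
import shutil
from typing import Callable, Optional

# The four prerequisites listed for h; executable names are assumed.
PREREQUISITES = ["git", "node", "docker", "pyenv"]


def missing_prerequisites(
    names: list[str], which: Callable[[str], Optional[str]] = shutil.which
) -> list[str]:
    """Return the prerequisites that can't be found on the PATH."""
    # `which` is injectable so the check can be tested without real tools.
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites(PREREQUISITES)
    for name in missing:
        print(f"Please install {name}, then re-run this script.")
    if not missing:
        print("All prerequisites found.")
```

An interactive version would loop, prompting the developer to install each missing tool and press Enter, until the list comes back empty.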
[ ] Problem: having to manually launch several different apps in different windows

To get everything going you have to manually launch each app. You don't always need every app running but for example lms does need h, via and client to be running in order to work. This requires just one command per app (make dev) but you do have to open four windows/tabs and run this command four times, and then switch between all these open tabs to see what each app is doing.

Solution: we already use Honcho in h and lms to run all of the app's processes at once in one shell, and plan to extend this to the rest of our apps, but this only multiplexes the processes of one app. Once all our apps have a consistent Honcho / Procfile setup, though, it becomes trivial to write a script that reads all the Procfiles and runs them all at once in one shell. The aim is that you just have to open one window, and run one make dev command, and you have all the apps running together.
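
A sketch of the Procfile-combining script described above, assuming each app is checked out in a sibling directory named after it (h, lms, ...) and that process names use only letters, digits, underscores and hyphens. Each process is renamed `<app>_<process>` so names from different apps don't collide, and its command is prefixed with a `cd` into that app's directory.

```python
import re
import shlex

# Procfile lines look like "web: make web". Lines that don't match
# (blank lines, "#" comments) are ignored.
PROCFILE_LINE = re.compile(r"^([A-Za-z0-9_-]+):\s*(.+)$")


def parse_procfile(text: str) -> list[tuple[str, str]]:
    """Return (process name, command) pairs from a Procfile's text."""
    entries = []
    for line in text.splitlines():
        match = PROCFILE_LINE.match(line.strip())
        if match:
            entries.append((match.group(1), match.group(2)))
    return entries


def combine_procfiles(apps: dict[str, str]) -> str:
    """Merge several apps' Procfiles into one.

    `apps` maps each app's directory name to its Procfile text.
    """
    lines = []
    for app_dir, procfile_text in apps.items():
        for name, command in parse_procfile(procfile_text):
            lines.append(f"{app_dir}_{name}: cd {shlex.quote(app_dir)} && {command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(combine_procfiles({
        "h": "web: make web\nworker: make worker\n",
        "lms": "# LMS processes\nweb: make web\n",
    }), end="")
```

The output could be written to a file and handed to Honcho through its `-f` option. This assumes Honcho runs each command through a shell, which is what lets the `cd ... &&` prefix work.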
[ ] Problem: differences in behavior between dev and prod

Production and development are configured differently, and these differences lead to different behavior, such as different logging or, in some cases, different libraries being used, which makes it difficult to reproduce production issues locally.
