llimllib / limbo

A simple, clean, easy to modify Slack chatbot
MIT License

Update use of Docker #153

Closed — rstata closed this pull request 6 years ago

rstata commented 6 years ago

Limbo's use of Docker has become a bit stale. This pull request contains a refresh.

The goal of the use of Docker in this PR is two-fold:

  1. Minimize the dev environment required for Limbo. In particular, developers only need Git and Docker on their laptop: with this patch, everything needed to build, test, and run Limbo is done inside Docker containers. This minimized dev environment is great for teaching environments, where misconfigured laptops don't waste the time of students, professors or TAs. Also, developers new to Limbo might start by using the Dockerized build and testing, avoiding problems associated with setting up a dev environment until they've become somewhat proficient at Limbo development.

  2. Produce a Limbo Docker image that can be used for cloud deployments of Limbo (whether on Amazon, Google, Microsoft, or some private cloud).

This PR uses Docker Compose to drive Docker. Compose is a layer on top of Docker that moves what used to be a crazy number of command-line arguments to docker into Docker Compose configuration files. Docker Compose files have become a de facto standard input to container orchestration systems, as they are accepted by Kubernetes, Elastic Container Service, and Mesos, as well as by Docker Swarm.
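As a sketch of what this looks like in practice (the service name and variables here are illustrative, not the exact files in this PR), a single-service docker-compose.yml replaces a long `docker run` invocation:

```yaml
# docker-compose.yml (illustrative sketch)
version: "3"
services:
  limbo:
    build: .            # build the image from the Dockerfile in this repo
    image: limbo
    environment:
      - SLACK_TOKEN     # passed through from the host, instead of `docker run -e`
```

With this in place, `docker-compose up` stands in for the equivalent `docker build` followed by `docker run -e SLACK_TOKEN=... limbo`.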

This PR includes changes to .travis.yml for testing of the new Docker code. In particular, we use Travis build stages to run the Dockerized build-and-test if (and only if) the full Limbo "test matrix" successfully passes.
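The build-stages arrangement can be sketched as follows (stage and script names are illustrative; the .travis.yml in the PR is authoritative):

```yaml
# .travis.yml (sketch)
language: python
python:
  - "2.7"
  - "3.6"
script: make test          # the existing test matrix runs in the default "test" stage
jobs:
  include:
    - stage: docker        # runs only if every job in the previous stage passed
      services:
        - docker
      script: make docker-test
```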

llimllib commented 6 years ago

Some questions:

FROM python:3.6.2-alpine3.6

# needed for crypto
RUN apk add --update openssl-dev

ENV PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

ADD . /app
WORKDIR /app
RUN pip install -r requirements.txt
RUN python setup.py install
CMD ["bin/limbo"]

Is there any advantage to the larger dockerfile?

rstata commented 6 years ago

Why use docker compose for just one service?

The handling of environment variables makes the edit-compile-test loop much more convenient: I don't have to type -e arguments over and over. Also, the ability to set entry points in docker-compose.override.yml for things like testing (and, in my environment, pushing to ECS) is a benefit when pushing all the build, test, and deploy execution into Docker containers. This use of docker-compose.override.yml for developer commands, and for otherwise capturing environmental differences, is becoming idiomatic. More generally, if you remember the old Git "plumbing/porcelain" distinction, I'd say the docker CLI is becoming the "plumbing" of the Docker ecosystem while docker-compose is becoming the porcelain.
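For example, a developer can keep local entry-point and environment tweaks in docker-compose.override.yml, which Compose merges over docker-compose.yml automatically (the entrypoint shown is illustrative):

```yaml
# docker-compose.override.yml (developer-local; merged over docker-compose.yml)
version: "3"
services:
  limbo:
    entrypoint: ["make", "test"]   # run the tests instead of the bot
    env_file: .env                 # SLACK_TOKEN etc., instead of repeated -e flags
```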

Why not build the docker image off of the python base? And why switch to Ubuntu?

I found that the current Limbo requirements pull in a bunch of native code -- and thus package dependencies. For example, when I try to build a Docker image given the simple Dockerfile provided above, I get the following error:

    Package 'libffi', required by 'virtual:world', not found

        No working compiler found, or bogus compiler options
        passed to the compiler from Python's distutils module.
        See the error messages above.
        (If they are about -mno-fused-madd and you are on OS/X 10.8,
        see http://stackoverflow.com/questions/22313407/ .)

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-zdidsh8x/cffi/

When I first got started, I tried for a while to find all the right Alpine packages to get Limbo to build, but was having trouble. I decided to switch to Ubuntu -- which I know better -- and got it going pretty quick. As you can see, I needed to pull in a fair number of dependencies to get things to work.
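Concretely, the base-dependency stanza looks roughly like this (package names reconstructed from the libffi/compiler errors above; the actual Dockerfile in the PR is authoritative):

```dockerfile
FROM ubuntu:16.04

# cffi (pulled in by the crypto requirements) needs a C toolchain
# plus the libffi and OpenSSL headers
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip python3-dev \
        build-essential libffi-dev libssl-dev \
    && rm -rf /var/lib/apt/lists/*
```

(The Alpine equivalent would presumably be `apk add build-base libffi-dev openssl-dev`, but I haven't verified that combination.)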

Does someone else have a better set of base dependencies (base image plus native packages) for Limbo than what I found?

Is there any advantage to the larger dockerfile?

There are two parts to the Dockerfile: the base dependencies, which in my file are everything above ADD requirements.txt /app/, and the app-specific steps, which start after that. I addressed the base dependencies above, but you might also be wondering why I didn't stick with:

ADD . /app
WORKDIR /app
RUN pip install -r requirements.txt
RUN python setup.py install

I want to let developers use Docker for their inner "edit-compile-test" loop. This means the Docker building process needs to be fast. The problem with ADD . /app is that it blows the Docker cache every time you make a change to the Limbo source code (and support files like Makefile or Dockerfile), which causes Docker to re-run pip install -r requirements.txt, which takes a lot of time. By separating the ADD command needed for this requirements-installation from the ADD of other files, we don't blow the cache and the build runs faster.

I could've done something like:

ADD requirements.txt /app/
WORKDIR /app
RUN pip install -r requirements.txt
ADD . /app
RUN python setup.py install

In fact, this is what the file looked like for a while as I was developing. But this also blew the cache too many times for changes that didn't matter to the Limbo build (e.g., changes to Dockerfile), so I decided to be more surgical here as well.
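The more surgical version looks roughly like this (directory names are a sketch based on the repo layout, not the exact file in the PR):

```dockerfile
# Dependency layer: invalidated only when requirements.txt changes
ADD requirements.txt /app/
WORKDIR /app
RUN pip install -r requirements.txt

# App layers: ADD only what the install needs, so edits to Dockerfile,
# Makefile, README, etc. leave the expensive layer above cached
ADD setup.py /app/
ADD bin /app/bin
ADD limbo /app/limbo
RUN python setup.py install
```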

eSoares commented 6 years ago

I want to let developers use Docker for their inner "edit-compile-test" loop.

Being able to test in the production Docker container can be acceptable, but I think it should not be a requirement. Test and development requirements can differ from production requirements, so they should be separate images rather than adding development requirements to the production image.

...which causes Docker to re-run pip install -r requirements.txt...

While people change code, they can change the requirements too. Re-running this step should always be mandatory when there are changes. Also, in the development version, I think requirements-to-freeze.txt should be used instead of requirements.txt.

I don't have to type -e arguments over and over...

There is an ongoing pull request to support just using a config file with all the variables (see #146).

rstata commented 6 years ago

Test and development requirements could be different from production requirements. They should be separated images instead of adding development requirements to the production.

In Limbo, the contents of requirements.txt (derived from requirements-to-freeze.txt) determine the requirements of both development and production, i.e., those requirements haven't been separated. As a result, it doesn't make sense today to have separate images (multi-stage Docker builds would be an alternative to multiple images for separating dev and prod requirements).

Limbo is so small that I don't think it makes sense to separate its dev and prod requirements, but if that's the direction we want to go, that effort should be a separate task. (There should probably be a Makefile target to update requirements.txt from changes in requirements-to-freeze.txt, but that's also a separate task.)
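If we do want that target, a sketch might look like this (assuming it runs inside a container or virtualenv, so that `pip freeze` reflects only Limbo's dependencies):

```makefile
# Re-pin requirements.txt whenever requirements-to-freeze.txt changes
requirements.txt: requirements-to-freeze.txt
	pip install -r requirements-to-freeze.txt
	pip freeze > $@
```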

While people change code, they can change requirements also. Re-running [pip install -r requirements.txt] should always be mandatory if there are changes.

The Dockerfile I submitted takes care of this automatically: if requirements.txt changes, then Docker will re-run pip install -r requirements.txt to ensure new images are up to date. When requirements.txt is unchanged, Docker uses the cached result, which is safe and speeds up the build.

rstata commented 6 years ago

Closed in favor of #154