Mierdin commented 4 years ago

Earlier this year, I spent a lot of time creating a preview system so that contributors can preview their content as part of a pull request, rather than having to use selfmedicate. This improved the contribution process greatly. However, one major problem remains, one that has plagued contributors for a long time. I think solving it will put the contributor experience where I've wanted it to be since the beginning of the project.

First, some background.

Endpoint images are not automatically built or used until they've been merged to master in the NRE Labs repo. Once there, a nightly process on our build server will build each image with a Makefile, and push to the antidotelabs organization. Of course, for every release of the curriculum, a tagged version is also uploaded and left as-is.

When a contributor opens a Pull Request to the NRE Labs curriculum for some new content, some basic CI checks take place, like spelling and validation with the antidote tool. If these check out, a request is sent to our preview service, which builds a preview instance of the NRE Labs site with that branch checked out. A link to that preview is then left on the PR in the form of a bot comment.

With only content changes, this works really well. However, if the contributor is adding or changing any endpoint images, it all falls apart. Because their new or changed image is not yet in master, it has not been pushed to the Docker Hub org. At best, their changes are not reflected in the preview. At worst (in the case of adding new images), their lesson won't even start because the image doesn't exist yet. Even if they were to add the necessary files in the PR, when it came time to launch their new lesson in the preview instance, Antidote would rightly choke when trying to launch a Pod with the specified image ref, as it wouldn't exist at that point.

It is for this reason that we mandate, in several places in the documentation, that any image changes must be done first, ahead of any content contribution. This is obviously not ideal, but it is a requirement given the current way things are done. The preview service simply doesn't provide its intended value unless the images for a given chunk of content are sorted ahead of time.

I'd like to consider changing the process so that image changes and content changes can be addressed in a single PR.

Considerations

We should consider solving this problem so that image changes in a PR are available for the resulting preview instance to use. There are several considerations that should be taken into account, and I am really brainstorming here. I reserve the right to change the below at any time as I solidify my thoughts here.

Security

Should there be a check or approval step before autobuilding takes place, or should we just automatically build for all PRs just like previews today?

Efficiency

We will probably need to add an image build step prior to the preview step. We will also need to consider how to know which images to build. It is likely not efficient to build all images for each PR commit, but then how will we know which images a given PR needs?

We should probably look first at any lesson definitions that are being added or changed. From those, we can derive the image references. We can use that list, combined with any new or changed images, to determine which images to build. We will only build that subset. In addition, all images in this final list should still be tagged with some kind of unique ID, likely bound to the Pull Request ID. This will ensure that people don't step on each other's toes.
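To make the selection idea above concrete, here is a rough Python sketch. It is only an illustration, not how the build server works today. The directory layout (`images/<name>/...`), the `lesson_images` mapping, the function names, and the `pr-<number>` tag format are all assumptions for the sake of the example. It also reads "combined with" as a union of the two lists, which is one possible interpretation.

```python
from pathlib import PurePosixPath


def images_to_build(changed_files, lesson_images, images_dir="images"):
    """Return the sorted list of image names a PR needs built.

    changed_files: repo-relative paths touched by the PR.
    lesson_images: hypothetical mapping of lesson definition path ->
                   list of image names that lesson references.
    images_dir:    assumed top-level directory holding one folder per image.
    """
    selected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # A changed file under images/<name>/ means <name> was added or changed.
        if len(parts) >= 3 and parts[0] == images_dir:
            selected.add(parts[1])
        # A changed lesson definition pulls in every image it references.
        selected.update(lesson_images.get(path, []))
    return sorted(selected)


def preview_tag(pr_number):
    """Build a tag bound to the Pull Request ID (format is an assumption)."""
    return f"pr-{pr_number}"
```

For example, a PR that touches `images/utility/Dockerfile` and a lesson definition referencing `vqfx` and `utility` would yield only those two images, each tagged with the PR's own tag.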

Structure

A separate Docker Hub organization will likely be needed, to ensure we don't accidentally overwrite an official image.
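As a sketch of what that separation could look like, the snippet below builds a preview image ref and refuses to target the official organization. The preview organization name `antidotelabs-preview`, the function name, and the `pr-<number>` tag format are placeholders I made up for illustration. Only `antidotelabs` comes from the current setup.

```python
OFFICIAL_ORG = "antidotelabs"
# Hypothetical name for the separate preview organization.
PREVIEW_ORG = "antidotelabs-preview"


def preview_image_ref(image_name, pr_number, org=PREVIEW_ORG):
    """Return an image ref for a PR-built image, never in the official org."""
    if org == OFFICIAL_ORG:
        # Guard against accidentally overwriting an official image.
        raise ValueError("preview images must not target the official organization")
    return f"{org}/{image_name}:pr-{pr_number}"
```

Keeping the guard in one helper means any build script that constructs refs through it cannot push a PR build over an official image by mistake.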

smk4664 commented 4 years ago

For Security, I highly recommend a linter for Docker images, such as hadolint: https://github.com/hadolint/hadolint. I would be interested in helping get this integrated and set up for security best practices.

Mierdin commented 4 years ago

@smk4664 Honestly that can happen anytime, and probably putting this into place before I add a bunch of automation on top would be best. I'd still like to hear more about the kind of things you see us being able to lint with that tool, but that can happen in a PR.

nre-learning / proposals

Solving the Endpoint Image Build Problem #14
