Systematization of CD in our repos

consideRatio commented 5 years ago

I believe we have a lot to gain by making it easier and quicker to release with the help of Continuously Deployment (CD). If we make the complexity of making a release become git tag -a x.y.z && git push --tags then I'd be very happy :D

It is extra relevant if we end up with a chain of things that needs to be released for something to be fixed. From my own experience, it can become too scary to do it if it isn't quite simple.

Technical types of deployments

Publishing a pip package

To CD a pip package, do...

Publishing a Helm chart

We publish Helm charts in zero-to-jupyterhub-k8s and binderhub. This is done like this ...

???

Release guidelines

Should we have common release guidelines among repositories? If it is easy to make a release, we could encourage a practice of making beta releases for example.

I guess what I'm suggesting in this issue mainly is to support making releases using git only.

betatim commented 5 years ago

For repo2docker we have been using the "create a tag and travis does the rest" workflow and it is great!

The config for it is here https://github.com/jupyter/repo2docker/blob/11fc69f8d62a39f494109d2926cb763c022821c6/.travis.yml#L77-L86 and it works well. It si the same recipe I used for scikit-optimize previously. (side comment: pypi.org has been doing cool stuff wit h2FA and tokens and so on, basically better auth mechanisms, so maybe the recipe needs revamping soon/you can tell this is a snippet that I created many years ago and since have been copy&pasting).

It is really useful to have something like versioneer that figures out the version from the tag. A lot of people have a love-hate relationship with versioneer but I would recommend to "just use it" to get started. Then at a later point consider switching.

For BinderHub we don't have this, I'd say that is Ok as it is mostly a "helm package". It feels like Chartpress does everything we need there.

Release cadence: for BinderHub we release every time we merge something into master. I really like the continuous release practice and it seems to fit well with the ideas of CI, CD, small releases, continuous improvement from devops.

For repo2docker we try and cut a release every 3months. It is easy to do, keeps people who like releases happy, creates "fix points". Doing it regularly without "waiting for X" means it is easy to decide to not wait for X because it will be in the next release which is only a few months away.

choldgraf commented 5 years ago

The one thing that I think we miss with this "make a release for every commit" approach is that it makes it harder for us to communicate to the outside world what is being updated. Perhaps there is a way to make "checkpoints" that let us create a changelog but without making any kind of special commitment to stability, backwards-support, etc.

btw, I wrote a little CLI to quickly generate markdown that gives summaries / links of github activity in a repository. Perhaps it would be useful for us to use this in generating changelogs quickly? https://github.com/choldgraf/github-activity

For example, here was the output of running github-activity jupyterhub/team-compass 2019-09-01 (in a details tag so I don't clutter this issue)

# 2019-09-01...today ([full changelog](https://github.com/jupyterhub/team-compass/compare/e1478c93cfce5d7cd510bb1636250d4bff785dd5...d85e0464c481a5ad695006c9df6d82f04855c6ea)) ## Other closed PRs * September Meeting Notes [#212](https://github.com/jupyterhub/team-compass/pull/212) ([@Zsailer](https://github.com/Zsailer)) * fixing google groups link [#210](https://github.com/jupyterhub/team-compass/pull/210) ([@choldgraf](https://github.com/choldgraf)) * [WIP] adding team check-in process [#194](https://github.com/jupyterhub/team-compass/pull/194) ([@choldgraf](https://github.com/choldgraf)) ## Closed issues * September Team Meeting 2019 [#199](https://github.com/jupyterhub/team-compass/issues/199) ([@Zsailer](https://github.com/Zsailer)) * Moore foundation grant final report [#183](https://github.com/jupyterhub/team-compass/issues/183) ([@choldgraf](https://github.com/choldgraf)) * Some thoughts on the SRE guidelines: Upgrading k8s [#180](https://github.com/jupyterhub/team-compass/issues/180) ([@sgibson91](https://github.com/sgibson91)) * Discuss applying for a CZI "Essential OS for Science" grant [#156](https://github.com/jupyterhub/team-compass/issues/156) ([@choldgraf](https://github.com/choldgraf)) ## Opened PRs * September Meeting Notes [#212](https://github.com/jupyterhub/team-compass/pull/212) ([@Zsailer](https://github.com/Zsailer)) * fixing google groups link [#210](https://github.com/jupyterhub/team-compass/pull/210) ([@choldgraf](https://github.com/choldgraf)) ## Opened issues * A round of GKE credits for mybinder.org (2019-2020) [#214](https://github.com/jupyterhub/team-compass/issues/214) ([@choldgraf](https://github.com/choldgraf)) * Systematization of CD in our repos [#213](https://github.com/jupyterhub/team-compass/issues/213) ([@consideRatio](https://github.com/consideRatio)) * October Team Meeting 2019 [#211](https://github.com/jupyterhub/team-compass/issues/211) ([@Zsailer](https://github.com/Zsailer)) * @sgibson91 unavailability [#209](https://github.com/jupyterhub/team-compass/issues/209) ([@sgibson91](https://github.com/sgibson91)) * Software Sustainability Institute 2020 Fellowship application [#208](https://github.com/jupyterhub/team-compass/issues/208) ([@sgibson91](https://github.com/sgibson91)) * Adding @manics to the JupyterHub team [#207](https://github.com/jupyterhub/team-compass/issues/207) ([@consideRatio](https://github.com/consideRatio)) * JupyterHub / BinderHub team responsibilities [#206](https://github.com/jupyterhub/team-compass/issues/206) ([@consideRatio](https://github.com/consideRatio)) * mybinder.org subdomain: turing.mybinder.org [#205](https://github.com/jupyterhub/team-compass/issues/205) ([@sgibson91](https://github.com/sgibson91)) * Adding Gesis to the federation [#204](https://github.com/jupyterhub/team-compass/issues/204) ([@betatim](https://github.com/betatim)) * mybinder.org subdomain: gesis.mybinder.org [#203](https://github.com/jupyterhub/team-compass/issues/203) ([@betatim](https://github.com/betatim))

manics commented 5 years ago

For updating version strings I've used bump2version on a couple projects. You run it with the level of the release "major", "minor" or "patch", it figures out the version, updates any required files, and creates the tag. E.g. bumpversion minor resulted in https://github.com/ome/omero-signup/commit/7fe8d6b4a0f01aa9b3e1f9256adf1f0f9f3e4c15 No need to lookup the current version string, deal with numbers, or worry about making a typo in the tag.

betatim commented 5 years ago

I think a good way to combine "continuous releases" with "keep the world updated" is to have a blog post every N months that tells people what happened in the last month. Can call it "the quarterly X checkpoint" as well as blog posts when there is a big new feature. I see more and more companies moving towards decoupling technical releases from publicity pushes to reduce stress and coordination required. (If you pay close attention you will see new features appear and only a while later will there be an announcement about it. However most people don't pay close attention and for sure the "general public" doesn't so essentially no one notices.)

We already did this when we created the federation. We had it running for a few weeks to debug stuff and then announced it when we knew it was solid.

The markdown generator looks very cool for helping to create release notes!!

choldgraf commented 5 years ago

@betatim I like the idea of regular blog posts!

manics commented 5 years ago

I've just discovered https://github.com/toolmantim/release-drafter (via https://github.com/ansible/molecule/pull/2367) for automatically drafting release notes. Seems to operate using GitHub Actions.

willingc commented 5 years ago

@betatim I would prefer to pass on Versioneer. You can tell where I stand on the love-not_love debate :wink: It's a lot of overhead to a repo for minimal benefit IMHO.

manics commented 5 years ago

There are a couple of requests for new releases:

oauthenticator should be straightforward to automate using a Travis Pypi deployment, shall we aim to get that in place for the next release? Two questions:

Which pypi user account are the secrets associated with
Is it better to include encrypted secrets in the .travis.yml file (https://github.com/jupyterhub/team-compass/issues/213#issuecomment-537340789) or set a secret environment variable in the Travis repo settings e.g.: https://github.com/ome/scc/blob/v0.12.6/.travis.yml#L30-L36 (I find this easier than figuring out the travis command to create the encrypted string)

jupyter-server-proxy is a bit more complicated since it requires a pypi and npm release, though both are supported by Travis

betatim commented 5 years ago

I have a slight preference for storing the secrets in .travis.yml because that way we "for free" get a log of when and who changes them.

https://pypi.org/project/oauthenticator/ suggests that only @minrk has credentials for updating it. Maybe we can create a "jupyterhubteam" account that is a maintainer like we have a "mybinderteam" for https://pypi.org/project/binderhub/?

For jupyter-server-proxy I have no idea how to even get started on making a release :(

manics commented 5 years ago

A dedicated service account makes sense. Pypi recently added support for token authentication which can be scoped to a single package: https://pypi.org/help/#apitoken I've tested it on: https://github.com/manics/jupyter-notebookparams/blob/d0f663c9374bf9a7421912c86371e91e3c5c4572/.travis.yml#L10-L16

I'm happy to look into jupyter-server-proxy when everyone's agreed on this issue

betatim commented 5 years ago

The API token approach looks nice! I hadn't realised this had shipped yet (still dazzled by the 2FA support).

We could also add the mybinderteam account to the jupyter-server-proxy package as it is used a lot in Binder. I think I should have access to that account and can share a API token scoped to that package with you.

minrk commented 5 years ago

Let's make a jupyterhub-team bot account with token access for upload on PyPI. That sounds excellent.

manics commented 5 years ago

New Travis feature: importable configs: https://blog.travis-ci.com/2019-11-11-build-config-imports

Encrypted secrets, such as secure environment variables, can be shared across repositories owned by the same organization or user account (see below for restrictions).

If this works it means the token only has to be stored and updated in one place for all jupyterhub repos. Might be a reason to use a PyPi token with full access instead of a single repo?

consideRatio commented 4 years ago

I want to get actionable!

A dedicated service account makes sense. Pypi recently added support for token authentication which can be scoped to a single package: https://pypi.org/help/#apitoken

When using these tokens, we don't need a service account identity alongside them, only the token. The username will be set to __token__.

Let's make a jupyterhub-team bot account with token access for upload on PyPI. That sounds excellent.

Is it correct that the idea is to have a PyPI service account? If so, then I assume we let TravisCI we use this account's credentials to do deployments. I think there are some choices to be made.

Choice one - scope of PyPI credentials

We create one PyPI service account that has access to multiple projects.
We create project individual tokens that acts as passwords to dummy user __token__ on PyPI.

Choice two - scope of configuration

We create a central configuration using TravisCI build imports that would depending on the pervious choice either:
1. contain the deploy configuration and the encrypted credentials for the PyPI service account with access to multiple repositories.
2. contain the deploy configuration but require the encrypted credentials for PyPI to be passed as a project-by-project configured PYPI_PASSWORD environment variable.
We create project-by-project configuration for PyPI CD, and use either a central PyPI service account or a PyPI project deployment token.

My suggestion

We aim for option 2 in Choice one, in other words, to always use PyPI project scoped deployment tokens that doesn't require any PyPI identity and would be created by a PyPI project maintainer.
We try to utilize a central PyPI deployment configuration to configure PyPI deployment.
We use jupyterhub/team-compass as the source of a central .travis-deploy-pypi.yml

What do you all think?

consideRatio commented 4 years ago

@minrk can you give me access to kubespawner on pypi? I've have 2FA on PyPI as well as GitHub btw.

manics commented 4 years ago

I think a separate repo for shared Travis and other CI/CD configuration is clearer to others than reusing team-compass.

If there's an organisation wide pypi token stored as a secret in a shared Travis config (which also provides an audit log of changes) then adding CD to a repo should be two steps:

Reference the shared Travis config
Add the service user to the pypi project

If each project has its own token there are two additional steps:

Create the token
Add the token to the configuration

Both will work, though my preference is for the first as it's less manual work that needs to be done by an admin.

betatim commented 4 years ago

Which repos do we want to add this to?

I like the idea of having a central place to control the token, i am a bit hesitant towards creating a central recipe that has to work for "all the repos" and is based on a feature that is new (and only works for travis, not also circleci).

Besides credentials copy&pasting build config is a bit tedious when you do it but over the life of a repo (many years) I have hardly had the need to adjust the "push to pypi" part of CI setups. This means copy&paste is Ok and has the advantage of letting each repo slightly change stuff that they need to change vs having to find a way to configure the central recipe.

There are also repos (like repo2docker and JupyterHub) that have a working CI/CD setup that I think we should keep as-is (e.g. r2d has tried to move to Azure builds but we haven't even had the resources to complete that move, so changing other stuff "because we can"). This means we need a way to run several setups in parallel for a while. Finding a way that reduces the effort for new repos and works well together with existing ones is what I think we should aim for.

manics commented 4 years ago

I think having a central configuration for only the deploy section is good enough, it doesn't need to be for the whole CI config. My thinking is to reduce the number of infrequent manual admin steps that have to be done i.e. dealing with credentials.

minrk commented 4 years ago

Since tokens can be scoped to packages, I think it's a good idea to have tokens allocated per package, scoped only to that package, not re-using tokens across repos.

I think the degree to which we have uniformity should probably be at the level of a "suggested template" that we can host here in team-compass. I don't think inheritance is worth the challenges involved in actually making something work everywhere, but documentation for "it's a good idea to start here" is probably the most useful level of sharing.

consideRatio commented 4 years ago

In my mind then, the action plan is:

We setup CD of PyPI package on one repo, like @manics have done already in https://github.com/jupyterhub/oauthenticator/pull/301.
We try this setup by pushing a git tag and iterate to get it working if needed.
We document how we did it within team-compass
We spread the practice to other repositories over time.

Then, we repeat for:

npm packages on npmjs
Dockerfiles to be published as images on dockerhub
Helm charts to be published on jupyterhub/helm-chart
...?

consideRatio commented 4 years ago

Action point

In the Team compass resources section, we add another subsection about repository building blocks where we describe various common patterns, such as for example automating PyPI package uploads on git tag pushes using TravisCI.

consideRatio commented 4 years ago

To use a lockfile or similar could be useful with regards to releases, to pinpoint the state. I consider both using a lockfile in repo, or producing a build artificat that we store in association with the release.

https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1221

manics commented 4 years ago

Does anyone have experience with https://pypi.org/project/setuptools-scm/ ?

I've just tried it. Instead of putting the version in setup.py or another file setuptools_scm calculates it from the git version (automatically adding a .dev... suffix for non-tagged commits) and writes out the version to a file such as version.py. Seems quite neat and avoids explicitly installing another tool, but I'm wondering if there are any disadvantages?

manics commented 4 years ago

One disadvantage is you can't install an archive directly from GitHub

$ pip install https://github.com/manics/jupyter-pyfilesystem/archive/setuptoolsscm.zip
...
    ERROR: Command errored out with exit status 1:
...
    Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.

    For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj

As indicated by the error message you need to use a git+ URL instead:

$ pip install git+https://github.com/manics/jupyter-pyfilesystem.git@setuptoolsscm
...
$ pip list|grep jupyter-pyfilesystem
jupyter-pyfilesystem        0.0.6.dev1+g1406aee

but having an automatic version that includes the commit hash is nice.

Example of the changes to use setuptools_scm https://github.com/manics/jupyter-pyfilesystem/commit/1406aeebe68eb435e92f39e5aee01cc319bd5cb3

willingc commented 4 years ago

@manics I have not used it. I'm not against using it as it under PyPA so will have some maintainership of the code. One area to investigate before adopting would be its usage with Sphinx and any extensions to see if additional configuration is needed to make it work there and on RTD.

@takluyver Happy New Year. Have you used setuptools_scm or have any info about it?

minrk commented 4 years ago

Yeah, setuptools-scm and versioneer and the like are nice for automation, but do tend to eliminate support for installing from git archives (which necessarily lack both the scm data and generated version.py) and can cause issues with installing from forks, which often don't have up-to-date tags.

betatim commented 4 years ago

Installing straight from GitHub via pip https://github.com/foo/bar/archive/master.zip works with versioneer. Haven't tried forks.

minrk commented 4 years ago

Installing straight from GitHub via pip https://github.com/foo/bar/archive/master.zip works with versioneer. Haven't tried forks.

It 'works' to install from a git archive in that the code is installed, but doesn't install with the right version number. So the package itself works, but versioneer fails to do its thing. This is fine if nothing is going to check the version of the package, but can cause surprising issues if you install another package that depends on a specific version of the first, either with runtime version checks or an install-time dependency on 'my-package >= 1.0' which won't be satisfied because the version reported is just 0:

$ pip install https://github.com/jupyterhub/traefik-proxy/archive/master.zip
$ pip list | grep traefik
jupyterhub-traefik-proxy 0+unknown

Installing from forks with git has a similar issue in that if the tags are out of date, installing from a git url (pip install git+https://github.com/jupyterhub/traefik-proxy) will also 'work' but will report the most recent tag, which is typically the last before the creation of the fork, with how most folks seem to use git. So 2.0 can be reported as e.g. 1.0+999, depending on how old the fork is. For the above package, to use a real-world example installing the same commit from two forks gives two different version numbers:

$ pip install git+https://github.com/jupyterhub/traefik-proxy
...
Successfully installed jupyterhub-traefik-proxy-0.1.4+4.ga96d0eb
$ pip install git+https://github.com/minrk/traefik-proxy
...
Successfully installed jupyterhub-traefik-proxy-0.1.2+64.ga96d0eb

To summarize, it's no problem if your installed-from-a-branch package has no dependencies, but can be for dependencies of the package.

minrk commented 4 years ago

Responding to questions that I missed along the way:

Should we have common release guidelines among repositories? If it is easy to make a release, we could encourage a practice of making beta releases for example.

I'm not sure this belongs in a central place, since it will vary by repo. Most of our repos are lightweight and small with frequent mostly bugfix/new feature releases, where betas, etc. don't add anything but time to the release process. A few, such as jupyterhub and zero-to-jupyterhub are large, production projects that can easily contain changes that break stuff in very particular unanticipated user configurations not covered by our test suite, where the release really benefits from testing by the early-adopter user community. These are the cases where betas and release candidates really help. For most of our Authenticators, Spawners, etc., it doesn't benefit either side to add this to the process.

If there is a guideline, I would probably say that it would be to ask this question: do we need to solicit feedback from the wider user community "testing in the wild" before we know if we are ready to make this release? And can we expect to get this testing and feedback during this time? If the answer is "yes," then a prerelease (and announcement to ask folks to test, etc.) is warranted. If the answer is "no," (i.e. that we are confident in the changes and/or we wouldn't get enough user testing and feedback on the changes to be worthwhile), then publishing the release immediately is the thing to do.

Factors that contribute to "yes, we need a beta"

lots of folks are affected
there are breaking changes
there are untested changes or use cases
there are users who will test a prerelease who do not test with master

Factors that contribute to "no, let's publish without a beta"

only contains bugfixes
changes are small
using master is common, so is already well tested "in the wild" (e.g. repo2docker)
it's hard or unlikely for folks to deploy the beta, so feedback is unlikely within the prerelease window

The result is that for most repos, the answer is usually no, but certain releases could use a beta round to solicit testing and feedback, especially for specific new features or changes. It's also the other way around with the big repos - we usually do a beta for jupyterhub and zero-to-jupyterhub, but sometimes there's a small bugfix we want to push out and there's no need for a beta round in those cases. In all cases, it's really up to the judgement of the team for each release of each repo and if there is a process, it's a per-repo per-release decision, not a project-wide one.

consideRatio commented 4 years ago

@minrk thanks for an excellent elaboration on when to have a beta and not etc, it makes a lot of sense to me!

betatim commented 4 years ago

I had not bumped into the situation of a fork and that resulting in the wrong version getting reported :-/ How often do people run into these problems?

I'd argue that most people install from PyPI, the next largest group installs from a tag/branch of the upstream repo, and then people who install from their fork. Does that seem like a reasonable sorting? I don't know how big each group is though :-/

manics commented 4 years ago

@betatim It'll occur if you:

Fork a repo
The parent repo master gets updated with new commits and a new tag
You pull master from the parent repo into your fork, but you don't pull all tags
You use versioneer /setuptools-scm in your fork. Those tools look at the last git tag to calculate the version, so if you haven't pulled the latest tags into your repo the tool doesn't know anything about it so will calculate the version based on whatever tag you've got.

It's annoying and may leak into other packaging tools (e.g. conda may need the tool added a dependency), and that's balanced against the convenience of having versions just work based on pushing a tag with no need to faff with running a separate command.

Would it help to have a demo session (e.g. zoom call) or some simple example repos showing how each tool works? One of the difficulties in evaluating versioneer in https://github.com/jupyterhub/traefik-proxy/pull/89 is many people were unfamiliar with it, and it's difficult to tell how much of the versioneer code is required and how much was additions/customisation built on top of versioneer but would be required for any other versioning tool.

Tools mentioned so far are:

bumpversion: Example https://github.com/jupyterhub/chartpress/pull/74/files
versioneer: Example in https://github.com/jupyterhub/traefik-proxy, does anyone have a simpler example?
setuptools-scm: I've tried it in https://github.com/manics/jupyter-pyfilesystem/pull/1

betatim commented 4 years ago

What I meant with "what happens"/"who does this happen to" was more a question in the direction of who experiences this problem while doing what? If we want to weight this up against the convenience of the package maintainers it matters if this frequently/rarely effects newcomers/seasoned devs or the odd person doing weird things.

Thinking back in my life history I can't remember ever being caught out by this. So either I don't ever have the use-case that would put me in the position of suffering from this or the use-case is exceedingly rare so I've forgotten already or it happens frequently but I never notice or yet another thing.

I do remember having to re-release packages because there was a typo in the version or one of the places in the repo not being updated, or it not being bumped properly back to "dev" or people not making releases because too many steps were involved :-/

As a result I have a strong bias towards versioneer because it makes the failure mode I've experience several times go away. Until today I didn't realise there was a downside (outdated tags for forks) :)

manics commented 4 years ago

Sorry, I get you now! I've run into both sides of the problem:

A repo that predated versioneer and other tools, and had it's own magic to get a version from git tags. Travis would constantly fail on my branch but passed when a PR was open because I was missing tags on my branch (the version was used to reference another dependent component).
Forgetting to bump the version in setup.py before tagging (I've even managed to forget when bump2version was configured).

willingc commented 4 years ago

I'm not a versioneer lover and far prefer the manual update _version.py to release and back to dev for projects that don't release often. I do agree with @minrk and @betatim that for those with frequent releases versioneer makes sense since automated is better than missing a manual step.

consideRatio commented 1 year ago

We have shifted towards using tbump and a RELEASE.md that outlines that we should update the changelog based on github-acitivyt, and besides that pretty much use tbump to do the rest.

Example of this can be seen in https://github.com/jupyterhub/jupyterhub/blob/3.1.1/RELEASE.md.

I'll close this issue as its quite outdated, and we have a jupyterhub/jupyterhub-python-repo-template where we can establish common practices etc.

jupyterhub / team-compass