consideRatio closed this issue 1 year ago
For repo2docker we have been using the "create a tag and travis does the rest" workflow and it is great!
The config for it is here https://github.com/jupyter/repo2docker/blob/11fc69f8d62a39f494109d2926cb763c022821c6/.travis.yml#L77-L86 and it works well. It is the same recipe I used for scikit-optimize previously. (Side comment: pypi.org has been doing cool stuff with 2FA and tokens and so on, basically better auth mechanisms, so maybe the recipe needs revamping soon/you can tell this is a snippet that I created many years ago and have been copy&pasting since.)
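That recipe boils down to a tag-triggered deploy stanza along these lines (a sketch with a placeholder user name and secret, not the actual repo2docker values):

```yaml
# Hypothetical sketch: publish to PyPI only when a tag is pushed.
# "some-pypi-user" and the secure string are placeholders.
deploy:
  provider: pypi
  user: some-pypi-user
  password:
    secure: "<output of `travis encrypt <password>`>"
  distributions: sdist bdist_wheel
  on:
    tags: true
    repo: jupyter/repo2docker
```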
It is really useful to have something like versioneer that figures out the version from the tag. A lot of people have a love-hate relationship with versioneer but I would recommend to "just use it" to get started. Then at a later point consider switching.
For BinderHub we don't have this, I'd say that is Ok as it is mostly a "helm package". It feels like Chartpress does everything we need there.
Release cadence: for BinderHub we release every time we merge something into master. I really like the continuous release practice and it seems to fit well with the ideas of CI, CD, small releases, continuous improvement from devops.
For repo2docker we try to cut a release every 3 months. It is easy to do, keeps people who like releases happy, and creates "fix points". Doing it regularly without "waiting for X" means it is easy to decide to not wait for X, because it will be in the next release which is only a few months away.
The one thing that I think we miss with this "make a release for every commit" approach is that it makes it harder for us to communicate to the outside world what is being updated. Perhaps there is a way to make "checkpoints" that let us create a changelog but without making any kind of special commitment to stability, backwards-support, etc.
btw, I wrote a little CLI to quickly generate markdown that gives summaries / links of github activity in a repository. Perhaps it would be useful for us to use this in generating changelogs quickly? https://github.com/choldgraf/github-activity
For example, here was the output of running github-activity jupyterhub/team-compass 2019-09-01
(in a details tag so I don't clutter this issue)
For updating version strings I've used bump2version on a couple of projects. You run it with the level of the release ("major", "minor", or "patch"); it figures out the version, updates any required files, and creates the tag. E.g. bumpversion minor resulted in https://github.com/ome/omero-signup/commit/7fe8d6b4a0f01aa9b3e1f9256adf1f0f9f3e4c15. No need to look up the current version string, deal with numbers, or worry about making a typo in the tag.
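For illustration, the config that makes this work is small (a hypothetical minimal example; the version and file path are placeholders — bump2version reads a `.bumpversion.cfg` or `setup.cfg`):

```ini
; .bumpversion.cfg -- hypothetical minimal example
[bumpversion]
current_version = 0.1.0
commit = True
tag = True

; every file listed here gets the version string rewritten in place
[bumpversion:file:myproject/__init__.py]
```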
I think a good way to combine "continuous releases" with "keep the world updated" is to have a blog post every N months that tells people what happened in the last N months. We can call it "the quarterly X checkpoint", as well as writing blog posts when there is a big new feature. I see more and more companies moving towards decoupling technical releases from publicity pushes to reduce the stress and coordination required. (If you pay close attention you will see new features appear, and only a while later will there be an announcement about them. However, most people don't pay close attention, and the "general public" certainly doesn't, so essentially no one notices.)
We already did this when we created the federation. We had it running for a few weeks to debug stuff and then announced it when we knew it was solid.
The markdown generator looks very cool for helping to create release notes!!
@betatim I like the idea of regular blog posts!
I've just discovered https://github.com/toolmantim/release-drafter (via https://github.com/ansible/molecule/pull/2367) for automatically drafting release notes. Seems to operate using GitHub Actions.
@betatim I would prefer to pass on Versioneer. You can tell where I stand on the love-not-love debate :wink: It's a lot of overhead to add to a repo for minimal benefit IMHO.
There are a couple of requests for new releases:
oauthenticator should be straightforward to automate using a Travis PyPI deployment, shall we aim to get that in place for the next release? Two questions:

- Do we store the encrypted credentials in the .travis.yml file (https://github.com/jupyterhub/team-compass/issues/213#issuecomment-537340789) or set a secret environment variable in the Travis repo settings, e.g. https://github.com/ome/scc/blob/v0.12.6/.travis.yml#L30-L36? (I find the latter easier than figuring out the travis command to create the encrypted string.)
- jupyter-server-proxy is a bit more complicated since it requires a PyPI and an npm release, though both are supported by Travis.

I have a slight preference for storing the secrets in .travis.yml, because that way we get "for free" a log of when and by whom they were changed.
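The "secret committed to .travis.yml" variant looks roughly like this (a sketch; the ciphertext is a placeholder produced by `travis encrypt`):

```yaml
# Hypothetical sketch: an encrypted PyPI password stored in
# .travis.yml itself, so changes to it show up in git history.
env:
  global:
    - secure: "Abc123...placeholder-ciphertext...=="
```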
https://pypi.org/project/oauthenticator/ suggests that only @minrk has credentials for updating it. Maybe we can create a "jupyterhubteam" account that is a maintainer like we have a "mybinderteam" for https://pypi.org/project/binderhub/?
For jupyter-server-proxy I have no idea how to even get started on making a release :(
A dedicated service account makes sense. PyPI recently added support for token authentication which can be scoped to a single package: https://pypi.org/help/#apitoken. I've tested it on: https://github.com/manics/jupyter-notebookparams/blob/d0f663c9374bf9a7421912c86371e91e3c5c4572/.travis.yml#L10-L16
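A token-scoped deploy can look like this sketch (placeholder values; the secure string would be the encrypted project-scoped `pypi-...` token, and the username is literally `__token__`):

```yaml
# Hypothetical sketch: deploy with a project-scoped PyPI API token
deploy:
  provider: pypi
  user: __token__
  password:
    secure: "<encrypted project-scoped pypi-... token>"
  on:
    tags: true
```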
I'm happy to look into jupyter-server-proxy when everyone's agreed on this issue
The API token approach looks nice! I hadn't realised this had shipped yet (still dazzled by the 2FA support).
We could also add the mybinderteam account to the jupyter-server-proxy package as it is used a lot in Binder. I think I should have access to that account and can share an API token scoped to that package with you.
Let's make a jupyterhub-team bot account with token access for upload on PyPI. That sounds excellent.
New Travis feature: importable configs: https://blog.travis-ci.com/2019-11-11-build-config-imports
> Encrypted secrets, such as secure environment variables, can be shared across repositories owned by the same organization or user account (see below for restrictions).
If this works it means the token only has to be stored and updated in one place for all jupyterhub repos. Might that be a reason to use a PyPI token with full access instead of one scoped to a single repo?
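If we went that way, the per-repo config could shrink to an import line along these lines (the repo and file names here are hypothetical):

```yaml
# Hypothetical: pull a shared deploy snippet from a central repo
import:
  - source: jupyterhub/travis-configs:pypi-deploy.yml@main
```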
I want to make this actionable!
> A dedicated service account makes sense. PyPI recently added support for token authentication which can be scoped to a single package: https://pypi.org/help/#apitoken
When using these tokens, we don't need a service account identity alongside them, only the token. The username will be set to __token__.
> Let's make a jupyterhub-team bot account with token access for upload on PyPI. That sounds excellent.
Is it correct that the idea is to have a PyPI service account? If so, then I assume we let TravisCI use this account's credentials to do deployments. I think there are some choices to be made:

- whether to upload with __token__ on PyPI or with the account's username and password
- whether to share a common .travis-deploy-pypi.yml config across repositories
What do you all think?
@minrk can you give me access to kubespawner on PyPI? I have 2FA enabled on PyPI as well as GitHub btw.
I think a separate repo for shared Travis and other CI/CD configuration is clearer to others than reusing team-compass.
If there's an organisation wide pypi token stored as a secret in a shared Travis config (which also provides an audit log of changes) then adding CD to a repo should be two steps:
If each project has its own token there are two additional steps:
Both will work, though my preference is for the first as it's less manual work that needs to be done by an admin.
Which repos do we want to add this to?
I like the idea of having a central place to control the token. I am a bit hesitant towards creating a central recipe that has to work for "all the repos" and is based on a feature that is new (and only works for Travis, not also CircleCI).
Besides credentials, copy&pasting build config is a bit tedious when you do it, but over the life of a repo (many years) I have hardly ever needed to adjust the "push to pypi" part of CI setups. This means copy&paste is OK, and it has the advantage of letting each repo slightly change the stuff it needs to change, vs having to find a way to configure the central recipe.
There are also repos (like repo2docker and JupyterHub) that have a working CI/CD setup that I think we should keep as-is (e.g. r2d has tried to move to Azure builds but we haven't even had the resources to complete that move, so let's not change other stuff just "because we can"). This means we need a way to run several setups in parallel for a while. Finding a way that reduces the effort for new repos and works well together with existing ones is what I think we should aim for.
I think having a central configuration for only the deploy section is good enough, it doesn't need to be for the whole CI config. My thinking is to reduce the number of infrequent manual admin steps that have to be done i.e. dealing with credentials.
Since tokens can be scoped to packages, I think it's a good idea to have tokens allocated per package, scoped only to that package, not re-using tokens across repos.
I think the degree to which we have uniformity should probably be at the level of a "suggested template" that we can host here in team-compass. I don't think inheritance is worth the challenges involved in actually making something work everywhere, but documentation for "it's a good idea to start here" is probably the most useful level of sharing.
In my mind then, the action plan is:
Then, we repeat for:
In the Team compass resources section, we add another subsection about repository building blocks where we describe various common patterns, such as for example automating PyPI package uploads on git tag pushes using TravisCI.
Using a lockfile or similar could be useful with regards to releases, to pin down the state. I'm considering either using a lockfile in the repo, or producing a build artifact that we store in association with the release.
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1221
Does anyone have experience with https://pypi.org/project/setuptools-scm/ ?
I've just tried it. Instead of putting the version in setup.py or another file, setuptools_scm calculates it from the git version (automatically adding a .dev... suffix for non-tagged commits) and writes out the version to a file such as version.py. Seems quite neat and avoids explicitly installing another tool, but I'm wondering if there are any disadvantages?
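Roughly, the setup.py side of it looks like this minimal packaging-config sketch (the package name "myproject" is a placeholder):

```python
# setup.py -- minimal setuptools_scm sketch; "myproject" is a placeholder
from setuptools import setup

setup(
    name="myproject",
    # derive the version from git tags and write it to a module
    use_scm_version={"write_to": "myproject/version.py"},
    setup_requires=["setuptools_scm"],
    packages=["myproject"],
)
```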
One disadvantage is you can't install an archive directly from GitHub:

```
$ pip install https://github.com/manics/jupyter-pyfilesystem/archive/setuptoolsscm.zip
...
ERROR: Command errored out with exit status 1:
...
Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.
For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
```
As indicated by the error message, you need to use a git+ URL instead:
```
$ pip install git+https://github.com/manics/jupyter-pyfilesystem.git@setuptoolsscm
...
$ pip list | grep jupyter-pyfilesystem
jupyter-pyfilesystem    0.0.6.dev1+g1406aee
```
but having an automatic version that includes the commit hash is nice.
Example of the changes to use setuptools_scm: https://github.com/manics/jupyter-pyfilesystem/commit/1406aeebe68eb435e92f39e5aee01cc319bd5cb3
@manics I have not used it. I'm not against using it, as it is under PyPA so the code will have some maintainership. One area to investigate before adopting would be its usage with Sphinx and any extensions, to see if additional configuration is needed to make it work there and on RTD.
@takluyver Happy New Year. Have you used setuptools_scm, or do you have any info about it?
Yeah, setuptools-scm and versioneer and the like are nice for automation, but do tend to eliminate support for installing from git archives (which necessarily lack both the scm data and generated version.py) and can cause issues with installing from forks, which often don't have up-to-date tags.
Installing straight from GitHub via pip https://github.com/foo/bar/archive/master.zip works with versioneer. Haven't tried forks.
> Installing straight from GitHub via pip https://github.com/foo/bar/archive/master.zip works with versioneer. Haven't tried forks.
It 'works' to install from a git archive in that the code is installed, but it doesn't install with the right version number. So the package itself works, but versioneer fails to do its thing. This is fine if nothing is going to check the version of the package, but can cause surprising issues if you install another package that depends on a specific version of the first, either with runtime version checks or an install-time dependency on 'my-package >= 1.0', which won't be satisfied because the version reported is just 0:
```
$ pip install https://github.com/jupyterhub/traefik-proxy/archive/master.zip
$ pip list | grep traefik
jupyterhub-traefik-proxy    0+unknown
```
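To make concrete why a reported version of 0+unknown can never satisfy a 'my-package >= 1.0' requirement, here is a toy illustration; it hand-rolls a much-simplified comparison rather than using the real PEP 440 machinery that pip uses:

```python
def release_tuple(version: str) -> tuple:
    """Much-simplified version parse: keep only the numeric release
    segment, dropping any local part ('+...') and non-numeric suffixes."""
    public = version.split("+")[0]
    parts = []
    for piece in public.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts) if parts else (0,)

# versioneer reports "0+unknown" when the archive lacks git metadata,
# so a 'my-package >= 1.0' requirement is never satisfied:
installed = release_tuple("0+unknown")  # (0,)
required = release_tuple("1.0")         # (1, 0)
print(installed >= required)            # prints: False
```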
Installing from forks with git has a similar issue in that if the tags are out of date, installing from a git URL (pip install git+https://github.com/jupyterhub/traefik-proxy) will also 'work' but will report the most recent tag, which is typically the last tag before the creation of the fork, with how most folks seem to use git. So 2.0 can be reported as e.g. 1.0+999, depending on how old the fork is. For the above package, to use a real-world example, installing the same commit from two forks gives two different version numbers:
```
$ pip install git+https://github.com/jupyterhub/traefik-proxy
...
Successfully installed jupyterhub-traefik-proxy-0.1.4+4.ga96d0eb
$ pip install git+https://github.com/minrk/traefik-proxy
...
Successfully installed jupyterhub-traefik-proxy-0.1.2+64.ga96d0eb
```
To summarize, it's no problem if your installed-from-a-branch package has no dependents, but it can be a problem for packages that depend on it.
Responding to questions that I missed along the way:
> Should we have common release guidelines among repositories? If it is easy to make a release, we could encourage a practice of making beta releases for example.
I'm not sure this belongs in a central place, since it will vary by repo. Most of our repos are lightweight and small with frequent mostly bugfix/new feature releases, where betas, etc. don't add anything but time to the release process. A few, such as jupyterhub and zero-to-jupyterhub are large, production projects that can easily contain changes that break stuff in very particular unanticipated user configurations not covered by our test suite, where the release really benefits from testing by the early-adopter user community. These are the cases where betas and release candidates really help. For most of our Authenticators, Spawners, etc., it doesn't benefit either side to add this to the process.
If there is a guideline, I would probably say that it would be to ask this question: do we need to solicit feedback from the wider user community "testing in the wild" before we know if we are ready to make this release? And can we expect to get this testing and feedback during this time? If the answer is "yes," then a prerelease (and announcement to ask folks to test, etc.) is warranted. If the answer is "no," (i.e. that we are confident in the changes and/or we wouldn't get enough user testing and feedback on the changes to be worthwhile), then publishing the release immediately is the thing to do.
Factors that contribute to "yes, we need a beta"
Factors that contribute to "no, let's publish without a beta"
The result is that for most repos, the answer is usually no, but certain releases could use a beta round to solicit testing and feedback, especially for specific new features or changes. It's also the other way around with the big repos - we usually do a beta for jupyterhub and zero-to-jupyterhub, but sometimes there's a small bugfix we want to push out and there's no need for a beta round in those cases. In all cases, it's really up to the judgement of the team for each release of each repo and if there is a process, it's a per-repo per-release decision, not a project-wide one.
@minrk thanks for an excellent elaboration on when to have a beta and not etc, it makes a lot of sense to me!
I had not bumped into the situation of a fork and that resulting in the wrong version getting reported :-/ How often do people run into these problems?
I'd argue that most people install from PyPI, the next largest group installs from a tag/branch of the upstream repo, and then people who install from their fork. Does that seem like a reasonable sorting? I don't know how big each group is though :-/
@betatim It'll occur if you:

- fork a repo, and upstream master then gets updated with new commits and a new tag
- merge master from the parent repo into your fork, but don't pull all the tags

It's annoying, and it may leak into other packaging tools (e.g. conda may need the tool added as a dependency), and that's balanced against the convenience of having versions just work based on pushing a tag, with no need to faff with running a separate command.
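The stale-tag situation is easy to reproduce locally; this sketch uses two throwaway repos standing in for "upstream" and a "fork" (all paths and names are made up):

```shell
set -eu
tmp=$(mktemp -d)
# Stand-in for the upstream repo: one commit, tagged 1.0
git init -q "$tmp/upstream"
cd "$tmp/upstream"
git config user.email dev@example.com && git config user.name dev
git commit -q --allow-empty -m "first release"
git tag 1.0
# Stand-in for the fork, cloned while 1.0 was the newest tag
git clone -q "$tmp/upstream" "$tmp/fork"
# Upstream moves on and tags 2.0
git commit -q --allow-empty -m "second release"
git tag 2.0
# The fork pulls the new commits but (with tag following disabled) not the tag
cd "$tmp/fork"
git config user.email dev@example.com && git config user.name dev
git pull -q --no-tags origin
git describe --tags          # still based on 1.0, e.g. 1.0-1-g<hash>
# The fix: fetch tags explicitly
git fetch -q --tags origin
git describe --tags          # now reports 2.0
```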
Would it help to have a demo session (e.g. zoom call) or some simple example repos showing how each tool works? One of the difficulties in evaluating versioneer in https://github.com/jupyterhub/traefik-proxy/pull/89 is many people were unfamiliar with it, and it's difficult to tell how much of the versioneer code is required and how much was additions/customisation built on top of versioneer but would be required for any other versioning tool.
Tools mentioned so far are:
What I meant with "what happens"/"who does this happen to" was more a question in the direction of: who experiences this problem, while doing what? If we want to weigh this up against the convenience for package maintainers, it matters whether this frequently/rarely affects newcomers/seasoned devs or the odd person doing weird things.
Thinking back over my life history I can't remember ever being caught out by this. So either I never have the use-case that would put me in the position of suffering from this, or the use-case is exceedingly rare so I've forgotten it already, or it happens frequently but I never notice, or something else entirely.
I do remember having to re-release packages because there was a typo in the version or one of the places in the repo not being updated, or it not being bumped properly back to "dev" or people not making releases because too many steps were involved :-/
As a result I have a strong bias towards versioneer, because it makes the failure modes I've experienced several times go away. Until today I didn't realise there was a downside (outdated tags for forks) :)
Sorry, I get you now! I've run into both sides of the problem:
I'm not a versioneer lover and far prefer the manual update of _version.py to release and back to dev for projects that don't release often. I do agree with @minrk and @betatim that versioneer makes sense for those with frequent releases, since automated is better than missing a manual step.
We have shifted towards using tbump and a RELEASE.md that outlines that we should update the changelog based on github-activity, and besides that pretty much use tbump to do the rest.
Example of this can be seen in https://github.com/jupyterhub/jupyterhub/blob/3.1.1/RELEASE.md.
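For reference, the shape of the config tbump reads is roughly this (a hypothetical minimal tbump.toml; the version and file paths are placeholders):

```toml
[version]
current = "3.1.1"
regex = '(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)'

[git]
message_template = "Bump to {new_version}"
tag_template = "{new_version}"

# every file that embeds the version string
[[file]]
src = "jupyterhub/_version.py"
```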
I'll close this issue as it's quite outdated, and we have a jupyterhub/jupyterhub-python-repo-template where we can establish common practices etc.
I believe we have a lot to gain by making it easier and quicker to release with the help of Continuous Deployment (CD). If we make the complexity of making a release become `git tag -a x.y.z && git push --tags`, then I'd be very happy :D

It is extra relevant if we end up with a chain of things that need to be released for something to be fixed. From my own experience, it can become too scary to do if it isn't quite simple.
Technical types of deployments
Publishing a pip package
To CD a pip package, do...
Publishing a Helm chart
We publish Helm charts in zero-to-jupyterhub-k8s and binderhub. This is done like this ...
???
Release guidelines
Should we have common release guidelines among repositories? If it is easy to make a release, we could encourage a practice of making beta releases for example.
I guess what I'm suggesting in this issue mainly is to support making releases using git only.