pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Release automation #21050

Open TomAugspurger opened 6 years ago

TomAugspurger commented 6 years ago

Tracking manual things here

jbrockmendel commented 6 years ago

How much of this goal is pandas-specific vs potentially more broadly useful? Here I'm mostly thinking of statsmodels piggybacking on effort put in here.

TomAugspurger commented 6 years ago

See https://github.com/pandas-dev/pandas-release

Some parts are pandas-specific, others could be adapted without too much effort.

There's also rever, which may be suitable for statsmodels.

jorisvandenbossche commented 3 years ago

@simonjayhawkins for the release date: I suppose that with git you can get the date of the release commit / tag, and then you can add that to the html context in conf.py and then inject that in the notes.

However, I think there is also some value in having it hardcoded in the docs, at least for older releases, so that the date is visible when looking at the source of the whatsnew files (but maybe this is not really important, since you can always see it online).
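A minimal sketch of the git-based idea above, assuming the release date is taken from the committer date of the commit a tag points to. The tag name `v1.2.0` and the `release_date` key are hypothetical; `html_context` is the standard Sphinx option for passing values to HTML templates.

```python
import subprocess
from datetime import datetime


def tag_commit_date(tag: str) -> str:
    """Return the committer date (strict ISO 8601) of the commit `tag` points to."""
    result = subprocess.run(
        ["git", "log", "-1", "--format=%cI", tag],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def format_release_date(iso_date: str) -> str:
    """Turn an ISO 8601 timestamp into a date such as 'October 19, 2020'."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if iso_date.endswith("Z"):
        iso_date = iso_date[:-1] + "+00:00"
    parsed = datetime.fromisoformat(iso_date)
    return f"{parsed:%B} {parsed.day}, {parsed.year}"


# In conf.py (hypothetical tag name and context key):
# html_context = {"release_date": format_release_date(tag_commit_date("v1.2.0"))}
```

The formatting step is kept separate from the git call so it can be checked without a repository.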

simonjayhawkins commented 3 years ago

Thanks @jorisvandenbossche. It could be simpler to search and replace the placeholder (the ??) after tagging in the release process. I have not really looked into this much yet.
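The search-and-replace approach could look roughly like the sketch below. It assumes the whatsnew title carries the date placeholder as a literal `(??)`; the title text and the date used here are purely illustrative.

```python
PLACEHOLDER = "(??)"


def fill_release_date(notes: str, release_date: str) -> str:
    """Replace the first date placeholder in the release notes text."""
    if PLACEHOLDER not in notes:
        raise ValueError("no release date placeholder found")
    # Only the first occurrence is replaced, so later literal "(??)" text survives.
    return notes.replace(PLACEHOLDER, f"({release_date})", 1)


# Illustrative title and date, not taken from a real release.
title = "What's new in X.Y.Z (??)"
updated_title = fill_release_date(title, "January 1, 2021")
```

Raising when the placeholder is missing makes a release script fail loudly instead of silently shipping notes without a date.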

simonjayhawkins commented 3 years ago

after tagging

Actually, it would need to happen before tagging, as an automated commit.

But then that commit wouldn't be on master!

simonjayhawkins commented 3 years ago

I have started to experiment with GitHub Actions for the Release Process.

https://github.com/simonjayhawkins/pandas-release/actions?query=workflow%3A%22Manual+Release%22

In order to get workflows to trigger, appropriate permissions are required.

To experiment, I have forked https://github.com/pandas-dev/pandas-release

I created a branch for the changes and then, in the forked repo, made the new branch the default so that actions can be triggered.

For now, I am using a manual trigger.

jreback commented 3 years ago

https://github.com/dask/dask-gateway/pull/339/files

looks interesting as an example of a GH action here

cc @simonjayhawkins

simonjayhawkins commented 3 years ago

Thanks @jreback.

It may make sense to build the wheels for release in GitHub Actions from pandas as part of a single workflow, but we obviously don't want to duplicate code in the workflow.

If we wanted MacPython to still be the single source of truth, we could maybe create an action and re-use it. (I've only created one action up to now, see https://github.com/simonjayhawkins/pandas-release/commit/f8b497a40ccf38cf2048d3b87e4091e66f39aad8, so I'm not sure if this would be the correct approach.)

TomAugspurger commented 3 years ago

I haven't checked, but we can maybe use the https://github.com/peter-evans/repository-dispatch action to have an action here (pushed a tag) to trigger an action in MacPython/pandas-wheels.
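For reference, a cross-repository trigger of this kind goes through GitHub's REST endpoint for creating a repository dispatch event (`POST /repos/{owner}/{repo}/dispatches`). The sketch below only builds the request. The event type `pandas-tag-pushed`, the payload keys and the token value are hypothetical, and whether the target repository would listen for such an event is an assumption.

```python
import json
import urllib.request


def build_dispatch_request(
    owner: str, repo: str, event_type: str, payload: dict, token: str
) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps({"event_type": event_type, "client_payload": payload})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical event name, payload and token.
request = build_dispatch_request(
    "MacPython", "pandas-wheels", "pandas-tag-pushed", {"tag": "v1.2.0"}, "TOKEN"
)
# Sending it would be: urllib.request.urlopen(request)
```

The token needs write access to the target repository, which is one reason a dedicated action or secret is usually involved.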

simonjayhawkins commented 3 years ago

Thanks @TomAugspurger

Dispatching events between repositories sort of leads to another big question: should we keep pandas-release as a separate repo or include the release scripts in pandas?

TomAugspurger commented 3 years ago

I'm fine with moving pandas-release here.


jorisvandenbossche commented 3 years ago

There might be some value in having it as a separate repo? (just thinking out loud, not sure) For example, you can still make updates / fix something, even after a tag for a release is created, without having to deal with a "dirty" git state.

(eg if we want to have an action triggered on a release here in this repo, that action could also check out the separate repo)

simonjayhawkins commented 3 years ago

Also thinking out loud... if the release is more automated, would there be less of an issue with deleting the tag and restarting the process? We would obviously want a point of no return, but would that be as late in the process as the PyPI upload? I think it's possible to delete GitHub releases? (Although IIUC only the conda release needs the GitHub release.)

So the process could be re-ordered slightly:

1. tag
2. conda and pip tests
3. build documentation
4. upload documentation
5. build wheels
6. push tag (if easy to remove tag on workflow failure could do this as very first step?)
7. create github release
8. upload to PyPI
9. start conda build process
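One way to picture an ordered process with a point of no return is the sketch below. It is not the pandas release tooling: `run_release`, the step names and the rollback callback are all hypothetical, and the rollback stands in for something like removing the tag.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_release(
    steps: List[Step], point_of_no_return: str, rollback: Callable[[], None]
) -> List[str]:
    """Run steps in order; undo the tag if a step fails before the commit point."""
    completed: List[str] = []
    committed = False
    for name, action in steps:
        try:
            action()
        except Exception:
            # Before the point of no return the release can still be abandoned.
            if not committed:
                rollback()
            raise
        completed.append(name)
        if name == point_of_no_return:
            committed = True
    return completed
```

A failure in a step after the named point is re-raised without rollback, since by then the release is already public.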

simonjayhawkins commented 3 years ago

(eg if we want to have an action triggered on a release here in this repo, that action could also check out the separate repo)

I could make changes to #36704 to see how that would work

simonjayhawkins commented 3 years ago

without having to deal with a "dirty" git state.

The current process starts from HEAD, but only the tag script needs to start from HEAD. Once tagged, the git state may not be an issue.

jorisvandenbossche commented 3 years ago

would there be less of an issue with deleting the tag and restarting the process?

It's certainly more minor (also given it's on the release branch), but that is still "rewriting" git history, and I think ideally we should try to avoid that, generally speaking.

simonjayhawkins commented 3 years ago

At present we create a commit for the tag but this may not be necessary. Maybe we could tag the appropriate commit directly.

IIUC deleting a release from GitHub can be done easily from the UI, but I think deleting a tag needs to be done from the CLI.

So if we don't have a release commit, maybe deleting a tag is not bad practice. If we change the order of the release process, deleting a release should not be necessary. The GitHub release could be the point of no return.

The reason we may want to explore the possibility of changing tags is that the tag could be the release trigger and the process could bail if further checks fail. (At the moment, we tag locally, do the checks and then push the tag.)
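Deleting a tag from the command line, as mentioned above, takes two standard git commands: one for the remote and one for the local copy. The sketch below wraps them in Python. The function names and the default remote name `origin` are assumptions, not part of any existing release script.

```python
import subprocess
from typing import List


def tag_deletion_commands(tag: str, remote: str = "origin") -> List[List[str]]:
    """Return the git commands that remove a tag remotely, then locally."""
    return [
        # Pushing an empty source to the tag ref deletes it on the remote.
        ["git", "push", remote, f":refs/tags/{tag}"],
        ["git", "tag", "--delete", tag],
    ]


def delete_tag(tag: str, remote: str = "origin") -> None:
    """Run the deletion commands, stopping at the first failure."""
    for command in tag_deletion_commands(tag, remote):
        subprocess.run(command, check=True)
```

Anyone who already fetched the tag keeps their local copy, which is part of why deleting a published tag is treated with caution.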

simonjayhawkins commented 1 year ago

In https://github.com/pandas-dev/pandas-release/pull/33#issuecomment-1205726818, @lithomas1 wrote:

Bigger picture, for build system/CI/release work, I was thinking that we prioritize:

1. Python 3.11 work
   a. I was busy for a while, so the Python 3.11rc (this Friday) date kind of crept up on me. We'll need to merge the CI testing (https://github.com/pandas-dev/pandas/pull/47442) once it's ready and add wheel builds for that.
2. Migrating to a new build system (can be done in parallel with 3)
   a. This is going to be hard. Someone needs to write up a PDEP, and we need to pick between meson and CMake.
3. cibuildwheel work
   a. This'll be the third migration that I'm going to be doing (after numpy and cython), so it should be relatively easy.
   b. This'll also save a bunch of time in terms of the manual parts of the release workflow. There's going to be no more need to open the PR against pandas-wheels or check the build status after this is done.

If you have anything else in your release workflow that you'd like to automate, I can also do it now while I'm still doing build work.

simonjayhawkins commented 1 year ago

If you have anything else in your release workflow that you'd like to automate, I can also do it now while I'm still doing build work.

Yep, making the mechanics of the release process more visible/accessible to others is obviously what we as a team want.

For me, I've done a few releases now, so the mechanics of doing a release are not really an issue. (In fact, I would personally prefer not to change anything during the busier periods of the release cycle.)

So feel free to push those activities in the relevant issues.

datapythonista commented 1 year ago

If I'm not missing anything, I think what makes the most sense is to move the following things from the pandas-release repo to this one:

For every PR, add to the CI:

Add to the release docs the process to create the tags manually, instead of using the Makefile and scripts in the pandas-release repo.

Add CI jobs for when a new tag is pushed:

I'm not sure if it's easy to automate uploading the wheels to PyPI and merging the conda-forge PR, so we can probably leave these manual for now.

Maybe we can add a manual CI job to make the release announcement to all the different channels (mailing lists, Twitter, Telegram, Slack...).

I think with this the release process should be much simpler, faster and more intuitive, and we can completely remove the pandas-release repo.