Closed: sdudley-bsc closed this issue 2 years ago.
I like this, let's see if others chime in, but I am happy to help out setting this up!
As the original owner of this project I strongly encourage this. Thanks to the help of other community members this project has been able to grow and benefit others beyond a scale that I alone could maintain. The one area where I am still the bottleneck is the release process. Getting myself out of that critical path so the project can continue to grow is a great idea.
Here are a few of my thoughts to augment what @sdudley-bsc wrote above:
That's what I can think of at the moment.
And #8... if anyone wants to volunteer as a release manager, that would be very much appreciated.
@Stryder-Git and @rsheftel, thanks for your comments.
@rsheftel, you're right that we'd need to be diligent about merging a development branch into master.
Regarding change_log.rst, I like the idea of a pull request author being responsible for adding at least draft release notes to the file as part of the pull request. Writing release notes may be easier for a pull request author than @rsheftel or whoever reviews the pull request. Of course, a reviewer could revise release notes or request changes.
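As a purely illustrative sketch of what a draft entry might look like (the version number, section style, and items here are invented, not taken from the actual change_log.rst):

```rst
4.1.2 (hypothetical entry)
~~~~~~~~~~~~~~~~~~~~~~~~~~
- Fix holiday rules for the XYZ calendar (#123)
- Add early-close support to the ABC calendar (#124)
```

A reviewer could then edit the wording without having to reconstruct what changed.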
Documenting how to contribute would be great. pandas addresses that in documentation: https://github.com/pandas-dev/pandas/tree/main/doc/source/development
Another option would be to add a CONTRIBUTING file to the root directory:

- https://github.com/dateutil/dateutil/blob/master/CONTRIBUTING.md
- https://github.com/pallets/flask/blob/main/CONTRIBUTING.rst
- https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md
GitHub documentation: https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors
One note on development workflow: The process is simple if the intent is to treat the merging of one normal pull request into master as a release. But what if we want to include multiple pull requests in one release? I think we could accommodate that by:
@rsheftel, I'm not opposed to keeping Travis CI if it works, but GitHub Actions may be a good alternative.
If you were looking at using GitHub Actions, we've got a nice release workflow set up for exchange_calendars - just tag the release and the workflow handles the rest.
Another workflow drafts the release notes with the aid of a config file. There's also a workflow for running the test suite on pull requests and pushes into the master branch.
market_prices uses pretty much the same workflows and includes docs explaining the release process. (Although at the moment there's an annoying bug (maread99/market_prices/issues/22) in the 'check release' part which doesn't manifest with exchange_calendars releases.)
Anyway, hope the workflows might serve you as a useful reference.
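For anyone who hasn't set one of these up before, a minimal test workflow of the kind described above might look like this (the file name, Python versions, and test command are assumptions, not the actual exchange_calendars configuration):

```yaml
# .github/workflows/test.yml (hypothetical)
name: test
on:
  push:
    branches: [master]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10"]
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e .[test]
      - run: pytest tests/
```

The `matrix` strategy replicates what a Travis CI multi-Python build does: one job per listed version, and a non-zero exit from any step fails that job.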
@maread99, thanks for sharing those links. Your comment raises a question related to pandas_market_calendars release management: Should any manual intervention be necessary after a development branch is merged into master via pull request?
I'd like to avoid any manual steps after merging a pull request if possible.
@rsheftel and @sdudley-bsc, for full disclosure, I am also not a professional Python developer, and I have never set up a workflow like the one @maread99 described or taken the role of a release manager. That said, I am definitely interested in learning about it and helping to set it up.
I wouldn't be opposed to volunteering as the release manager, if @rsheftel and the community trust me with that responsibility and no one with more experience is interested. It also sounds like almost nothing would have to be done manually, and that @maread99's process could serve as inspiration or even be adopted wholesale.
In the next two weeks, my work on #198 should be completed. I could take some extra time to get familiar with GitHub Actions and the process that @maread99 suggested, if, as I said, you trust me with that. Of course, I also agree that we should reduce the number of manual steps as much as possible.
@Stryder-Git I fully support you being the release manager.
@Stryder-Git, you acting as release manager sounds good to me.
I've been working on automation: https://github.com/sdudley-bsc/pandas_market_calendars/tree/master/.github/workflows
The work is still in progress.
The GitHub Action release workflow doesn't yet use release notes from the change_log.rst file, upload files to PyPI, or update documentation on Read the Docs. One thing I'm not sure about is whether a release workflow should upload files to TestPyPI in addition to PyPI.
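For the missing PyPI upload step, one common approach is the PyPA `build` tool plus the official publish action. This is a sketch, not taken from the draft workflow; the secret name `PYPI_API_TOKEN` is an assumption:

```yaml
- name: Build sdist and wheel
  run: |
    python -m pip install build
    python -m build   # writes dist/*.tar.gz and dist/*.whl
- name: Upload to PyPI
  uses: pypa/gh-action-pypi-publish@release/v1
  with:
    password: ${{ secrets.PYPI_API_TOKEN }}
```

The same action can target TestPyPI by adding a `repository-url` pointing at https://test.pypi.org/legacy/, which would make an optional TestPyPI step easy to add or drop later.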
I'm interested in adding a separate GitHub Action workflow that runs when a pull request is created or updated. That workflow could replace the pandas_market_calendars project's use of Travis CI.
@rsheftel and @Stryder-Git, we should consider adding a versioning policy to the project as part of release automation. The scheme described at https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#id58 may work well. It's based on semantic versioning but complies with PEP 440.
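As a sketch of what a major.minor.micro policy implies in practice (the helper name is hypothetical, not part of any proposed tooling):

```python
def bump(version: str, part: str = "micro") -> str:
    """Increment one component of a major.minor.micro version string."""
    major, minor, micro = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"      # breaking change
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new backwards-compatible feature
    return f"{major}.{minor}.{micro + 1}"  # bug fix

print(bump("4.1.2", "minor"))  # → 4.2.0
```

Strings of this shape are valid under both semantic versioning and PEP 440, which is what makes the scheme a comfortable middle ground.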
@sdudley-bsc
> @rsheftel and @Stryder-Git, we should consider adding a versioning policy to the project as part of release automation. The scheme: https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#id58
I like semantic versioning and very much support this.
@rsheftel and @sdudley-bsc, thank you, I will gladly be the release manager then.
I have been looking into GitHub Actions, and it seems like a great option for releasing and testing. The fact that Travis CI doesn't support Python 3.10 already says enough, in my opinion. I haven't looked into coverage and Coveralls yet, though.
I also like the idea of having a dev branch, against which PRs could be opened that trigger the test workflow. After PRs that should be part of the same release are merged into it, that branch could be merged into master and trigger the release workflow.
Semantic versioning also sounds good to me.
If we get PRs against the dev branch, I think it would be sufficient to require a clear list of changes in the PR description. That way I could update the change_log.rst, including changes from other PRs, if there are any, and increment VERSION in setup.py. The release workflow could handle the rest when the dev branch is merged.
Of course, a proper CONTRIBUTING.md file, additionally including the requirement for tests and updates to docstrings would be a good investment.
@sdudley-bsc, I had a look at the draft workflow you created. I am also not sure whether it should upload to TestPyPI in addition to PyPI; maybe @maread99 can enlighten us about any benefits or pitfalls here? Also, I see that your workflow extracts the VERSION value from setup.py; I don't think the first version of this automation needs to do much more. I would be fine updating change_log and VERSION manually after merging PRs into the dev branch, and then we can see how to optimize things further down the line.
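For reference, extracting VERSION from setup.py can be as small as a regex over the file's text. This is a minimal sketch, assuming setup.py contains a top-level assignment like `VERSION = "x.y.z"`; the function name is hypothetical:

```python
import re

def extract_version(setup_py_text: str) -> str:
    """Return the value of a top-level VERSION = "x.y.z" assignment."""
    match = re.search(r'^VERSION\s*=\s*["\']([^"\']+)["\']', setup_py_text, re.M)
    if match is None:
        raise ValueError("no VERSION assignment found")
    return match.group(1)

print(extract_version('VERSION = "4.1.2"\n'))  # → 4.1.2
```

Reading the file as text avoids importing (and thereby executing) setup.py inside the workflow.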
If you want to continue testing/working on the workflow, @sdudley-bsc, I could create a CONTRIBUTING.md file. Does that sound good?
Hi @Stryder-Git,
The release workflow for exchange_calendars uploads the package to the test repo, and the following lines check that the uploaded version can be installed and imported.
```yaml
- name: Install from test and test running
  run: |
    echo "Will wait 10s" && sleep 10s
    pip install --extra-index-url https://test.pypi.org/simple exchange_calendars==${{ github.ref_name }}
    python -c 'import exchange_calendars;print(exchange_calendars.__version__)'
    pip uninstall -y exchange_calendars
```
The advantage is that in the event there's some bug that results in a release failing to import, this should catch it and halt the release before uploading to the main PyPI repo.
https://github.com/gerrymanoim/exchange_calendars/issues/207 describes issues we had in getting pip to install the newly released version rather than a previously cached version. This seems to have been fixed for exchange_calendars (the above code seems to do what's required), although the same code hasn't fixed the same issue for market_prices (https://github.com/maread99/market_prices/issues/22). If you wanted to include the 'install from test repo and import' check, I guess it'd be a matter of trying it and seeing (if along the way you can see why it's not working for market_prices, do let me know! 😃)
@Stryder-Git, do you think a permanent dev branch is necessary? A GitHub Action workflow that runs tests could be triggered when a pull request is created or updated. If a permanent dev branch was used, would you create another pull request to merge dev into master when dev is ready?
Such a branch would facilitate the inclusion of multiple pull requests in one release. I suspect a pandas_market_calendars release typically reflects only one pull request. But a permanent dev branch would standardize the development workflow so that nothing different needs to be done if a release should include multiple pull requests.
Do you want to handle update of documentation manually? I may be able to automate that.
@Stryder-Git and @maread99, I'm skeptical that the production release process should upload files to TestPyPI. If the PyPA build package successfully creates .tar.gz and .whl files, I don't think those files need to be uploaded to TestPyPI before being uploaded to PyPI.
I'm also skeptical that a production release process should verify that a recently uploaded version of a package can be installed, either from TestPyPI or PyPI. @maread99, I wonder if any related issues associated with exchange_calendars or market_prices have occurred because of a delay between the upload of files for a specific version and the availability of that version for installation.
> I wonder if any related issues associated with exchange_calendars or market_prices have occurred because of a delay between the upload of files for a specific version and the availability of that version for installation.
We wondered this but were able to discount it by sleeping the workflow for a short period between uploading and installing (during this time it was possible to install the new version outside of the workflow although the workflow still subsequently installed the prior version). I'm pretty sure the issue was with caching - specifying the actual version for pip to install seems to have resolved the issue (for exchange_calendars at least).
@sdudley-bsc, I don't think a permanent dev branch is necessary, but I was hoping to achieve the standardization benefit you mentioned. Even if a new release is based on a single PR, merging it into a dev branch first would still provide an extra layer of flexibility and require less polish of that PR up front.
If I am not missing anything, a release should include incrementing the version number, adding a summary of changes to the change_log, updating or adding docstrings to functions/classes, and possibly updating the README and examples files.
What exactly needs to be done would depend on the amount and nature of the changes. While I am getting more familiar with this process (and considering that most PRs/releases are small), I am fine doing most of this manually, and then seeing how and what can be automated in the future.
Regarding TestPyPI, I think @maread99 makes a good point, and the process can mostly be copy-pasted from their workflow, so I do support this.
@Stryder-Git, sorry about my delayed response. I've recently been busy with other work.
You mentioned that a release should include "updating/adding docstrings to functions/classes". I'm not entirely sure what you mean, but as a release manager, you shouldn't have to deal with adding docstrings or updating docstrings beyond minor edits unless you also did the actual development work. An author of a pull request should be responsible for docstrings associated with the work.
In my public fork of the repo, I updated the GitHub Action workflow and added a Python script that extracts release notes from the change log file. I won't be available to work on this issue further, but the public fork will remain available for a while in case it's useful to you or anyone else. Thanks for your work on release management.
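I have not seen the script in the fork, but as a rough sketch of the idea (assuming change_log.rst sections are headed by a bare version number with an underline; the function name and sample text are hypothetical):

```python
import re

def release_notes(changelog: str, version: str) -> str:
    """Return the body of the changelog section headed by `version`."""
    pattern = (
        rf"^{re.escape(version)}\n[~=\-]+\n"  # section header plus underline
        r"(.*?)"                              # section body (lazy)
        r"(?=^\S+\n[~=\-]+\n|\Z)"             # up to the next header or EOF
    )
    match = re.search(pattern, changelog, re.M | re.S)
    if match is None:
        raise ValueError(f"version {version} not found in changelog")
    return match.group(1).strip()

sample = (
    "4.1.2\n~~~~~\n- fix A\n- fix B\n\n"
    "4.1.1\n~~~~~\n- fix C\n"
)
print(release_notes(sample, "4.1.2"))
```

A release workflow could feed the extracted text straight into the GitHub release body, so the change log stays the single source of truth for release notes.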
A few additional thoughts:
make_examples.bat
script in the docs directory. This isn't a great solution as it requires an extra step and needs jupyter installed. Better solution would be welcomeFor anyone following this thread, have a look at #217, for a chance to inspect and criticize.
@maread99, after recreating the testing of the new release on TestPyPI, as you suggested, I understand your pain in dealing with pip's caching. I tried some of the things that you tried but kept getting unreliable results. I ended up implementing it a little differently, which seems to have fixed the issue quite reliably.
The following bash code extracts the latest version number from an index of available versions, compares it to the version we just uploaded, and, if they do not match, simply tries again five seconds later. This gives me by far the most reliable results.
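A stand-in sketch of that approach (this is not the actual workflow code; the package name, index URL, and retry count are assumptions):

```shell
# latest_version: read release filenames (e.g. from the TestPyPI simple
# index page) on stdin and print the highest x.y.z version found.
latest_version() {
  grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+' | sort -V | tail -n 1
}

# wait_for_version: poll the index until the version we just uploaded
# appears as the latest available version, retrying every 5 seconds.
wait_for_version() {
  uploaded="$1"
  for _ in 1 2 3 4 5; do
    latest=$(curl -s "https://test.pypi.org/simple/pandas-market-calendars/" | latest_version)
    [ "$latest" = "$uploaded" ] && return 0
    sleep 5
  done
  return 1
}
```

Comparing the extracted number against the uploaded one sidesteps pip's caching entirely: the workflow only proceeds once the index itself advertises the new version.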
I might not have spent enough time studying your workflows, so I am not sure whether this would help you, but two more things contributed to my annoyance while working on this and might be of interest to you:
I see that you use `--extra-index-url` in the call to pip install; this means pip will look in both PyPI and TestPyPI, which might not be what you want. Using `-i` or `--index-url` instead would restrict it to only the URL you are passing.
After successfully downloading from PyPI or TestPyPI, you make Python print the version number and then run the tests again. The problem I encountered: after downloading, there is a pandas_market_calendars directory in the working directory (simply the code of the repo) as well as a pandas_market_calendars package in Python's site-packages. When importing pandas_market_calendars, Python picks up the working-directory version, not the downloaded one, and the tests use it too. In the step after the one I linked to above, I rename the pandas_market_calendars directory in the working directory so that subsequent imports find the package only in site-packages and actually use the downloaded version.
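The shadowing effect is easy to reproduce. This is a self-contained demo, not the workflow code: a same-named package directory in the working directory wins because the current directory sits at the front of `sys.path` when Python is invoked directly.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the repo checkout sitting in the working directory.
    pkg = os.path.join(tmp, "pandas_market_calendars")
    os.mkdir(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as fh:
        fh.write("ORIGIN = 'working directory'\n")

    # `python -c` puts the current directory first on sys.path, so the
    # local package shadows any installed pandas_market_calendars.
    out = subprocess.run(
        [sys.executable, "-c",
         "import pandas_market_calendars as m; print(m.ORIGIN)"],
        cwd=tmp, capture_output=True, text=True, check=True,
    )

print(out.stdout.strip())  # → working directory
```

Renaming (or deleting) the checkout directory before the import check, as described above, removes the shadowing entry so the site-packages copy is the only candidate.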
These insights allowed me to find some reliability, I hope it helps you somehow, too!
@Stryder-Git, thanks for sharing those findings!
I'll certainly be making changes in light of the points you've raised - I can see how either or both could be responsible for the failure to import the new release.
This issue is being submitted to propose a change to release management for pandas_market_calendars: Treat each update to the master branch as a release.
A GitHub Actions workflow (or maybe Travis CI) could be used to automate the following when master is updated:
It may also be possible to automate the update of documentation on Read the Docs.
Automating all release activity would provide the following benefits:
How contributing to the project would change: A pull request would need to include a new version number and associated release notes.
One important requirement: The automated steps above should not occur unless all tests in the pandas_market_calendars test suite have passed.
Currently, a Travis CI pull request build gets triggered when a pull request is created or updated. Such a build includes running the pandas_market_calendars test suite with multiple versions of Python.
To meet the requirement above, the .travis.yml file could be updated so that a build fails if any tests fail, and the reviewer of a pull request could verify that the associated Travis CI build succeeded. In addition, a branch protection rule for the master branch could be added or updated to require that a Travis CI status check passes before a pull request can be merged. One potential issue: the master branch could be updated after a pull request is created.
Maybe that issue could be addressed by a pull request reviewer triggering another Travis CI pull request build.
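A sketch of the kind of .travis.yml meant here (the Python versions and test command are assumptions, not the project's actual configuration). Travis fails a build when any `script` command exits non-zero, so the key is ensuring the test runner's exit status is not swallowed:

```yaml
# .travis.yml (hypothetical fragment)
language: python
python:
  - "3.7"
  - "3.8"
  - "3.9"
install:
  - pip install -e .[test]
script:
  - pytest tests/   # a non-zero exit status fails the build
```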
Another potential requirement: The automated steps above should not occur if documentation cannot be built.
Feedback is welcome, especially on any potential issues I've missed.