Closed: sdudley-bsc closed this issue 2 years ago.
I like this, let's see if others chime in, but I am happy to help out setting this up!
As the original owner of this project I strongly encourage this. Thanks to the help of other community members this project has been able to grow and benefit others beyond a scale that I alone could maintain. The one area where I am still the bottleneck is the release process. Getting myself out of that critical path so the project can continue to grow is a great idea.
Here are a few of my thoughts to augment what @sdudley-bsc wrote above:
That's what I can think of at the moment.
And #8... if anyone wants to volunteer as a release manager, that would be very much appreciated.
@Stryder-Git and @rsheftel, thanks for your comments.
@rsheftel, you're right that we'd need to be diligent about merging a development branch into master.
Regarding change_log.rst, I like the idea of a pull request author being responsible for adding at least draft release notes to the file as part of the pull request. Writing release notes may be easier for a pull request author than @rsheftel or whoever reviews the pull request. Of course, a reviewer could revise release notes or request changes.
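As a purely illustrative sketch of what a draft entry might look like (the version number, section style, and items here are invented, not taken from the actual change_log.rst):

```rst
4.1.2 (hypothetical entry)
~~~~~~~~~~~~~~~~~~~~~~~~~~
- Fix holiday rules for the XYZ calendar (#123)
- Add early-close support to the ABC calendar (#124)
```

A reviewer could then edit the wording without having to reconstruct what changed.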
Documenting how to contribute would be great. pandas addresses that in documentation: https://github.com/pandas-dev/pandas/tree/main/doc/source/development
Another option would be to add a CONTRIBUTING file to the root directory:

- https://github.com/dateutil/dateutil/blob/master/CONTRIBUTING.md
- https://github.com/pallets/flask/blob/main/CONTRIBUTING.rst
- https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md
GitHub documentation: https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors
One note on development workflow: The process is simple if the intent is to treat the merging of one normal pull request into master as a release. But what if we want to include multiple pull requests in one release? I think we could accommodate that by:
@rsheftel, I'm not opposed to keeping Travis CI if it works, but GitHub Actions may be a good alternative.
If you were looking at using GitHub Actions, we've got a nice release workflow set up for exchange_calendars - just tag the release and the workflow handles the rest.
Another workflow drafts the release notes with the aid of a config file. There's also a workflow for running the test suite on pull requests and pushes into the master branch.
market_prices uses pretty much the same workflows and includes docs explaining the release process. (Although at the moment there's an annoying bug (maread99/market_prices/issues/22) in the 'check release' part which doesn't manifest with exchange_calendars releases.)
Anyway, hope the workflows might serve you as a useful reference.
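For anyone who hasn't set one of these up before, a minimal test workflow of the kind described above might look like this (the file name, Python versions, and test command are assumptions, not the actual exchange_calendars configuration):

```yaml
# .github/workflows/test.yml (hypothetical)
name: test
on:
  push:
    branches: [master]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10"]
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e .[test]
      - run: pytest tests/
```

The `matrix` strategy replicates what a Travis CI multi-Python build does: one job per listed version, and a non-zero exit from any step fails that job.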
@maread99, thanks for sharing those links. Your comment raises a question related to pandas_market_calendars release management: Should any manual intervention be necessary after a development branch is merged into master via pull request?
I'd like to avoid any manual steps after merging a pull request if possible.
@rsheftel and @sdudley-bsc, for full disclosure, I am also not a professional Python developer, and I have never set up a workflow like the one @maread99 described or taken the role of a release manager. That said, I am definitely interested in learning about it and helping to set it up.
I wouldn't be opposed to volunteering as the release manager, if @rsheftel and the community trust me with that responsibility and no one with more experience is interested. It also sounds like almost nothing would have to be done manually, and that @maread99's process could serve as inspiration or even be adopted wholesale.
In the next two weeks, my work on #198 should be completed. I could take some extra time to get familiar with GitHub Actions and the process that @maread99 suggested, if, as I said, you trust me with that. Of course, I also agree that we should reduce the number of manual steps as much as possible.
@Stryder-Git I fully support you being the release manager.
@Stryder-Git, you acting as release manager sounds good to me.
I've been working on automation: https://github.com/sdudley-bsc/pandas_market_calendars/tree/master/.github/workflows
The work is still in progress.
The GitHub Action release workflow doesn't yet use release notes from the change_log.rst file, upload files to PyPI, or update documentation on Read the Docs. One thing I'm not sure about is whether a release workflow should upload files to TestPyPI in addition to PyPI.
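For the missing PyPI upload step, one common approach is the PyPA `build` tool plus the official publish action. This is a sketch, not taken from the draft workflow; the secret name `PYPI_API_TOKEN` is an assumption:

```yaml
- name: Build sdist and wheel
  run: |
    python -m pip install build
    python -m build   # writes dist/*.tar.gz and dist/*.whl
- name: Upload to PyPI
  uses: pypa/gh-action-pypi-publish@release/v1
  with:
    password: ${{ secrets.PYPI_API_TOKEN }}
```

The same action can target TestPyPI by adding a `repository-url` pointing at https://test.pypi.org/legacy/, which would make an optional TestPyPI step easy to add or drop later.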
I'm interested in adding a separate GitHub Action workflow that runs when a pull request is created or updated. That workflow could replace the pandas_market_calendars project's use of Travis CI.
@rsheftel and @Stryder-Git, we should consider adding a versioning policy to the project as part of release automation. The scheme described at https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#id58 may work well. It's based on semantic versioning but complies with PEP 440.
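As a sketch of what a major.minor.micro policy implies in practice (the helper name is hypothetical, not part of any proposed tooling):

```python
def bump(version: str, part: str = "micro") -> str:
    """Increment one component of a major.minor.micro version string."""
    major, minor, micro = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"      # breaking change
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new backwards-compatible feature
    return f"{major}.{minor}.{micro + 1}"  # bug fix

print(bump("4.1.2", "minor"))  # → 4.2.0
```

Strings of this shape are valid under both semantic versioning and PEP 440, which is what makes the scheme a comfortable middle ground.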
@sdudley-bsc
> @rsheftel and @Stryder-Git, we should consider adding a versioning policy to the project as part of release automation. The scheme: https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#id58
I like semantic versioning and very much support this.
@rsheftel and @sdudley-bsc, thank you, I will gladly be the release manager then.
I have been looking into GitHub Actions, and it seems like a great option for releasing and testing. The fact that Travis CI doesn't support Python 3.10 already says enough, in my opinion. I haven't looked into coverage and Coveralls yet, though.
I also like the idea of having a dev branch, against which PRs could be opened that trigger the test workflow. After PRs that should be part of the same release are merged into it, that branch could be merged into master and trigger the release workflow.
Semantic versioning also sounds good to me.
If we get PRs against the dev branch, I think it would be sufficient to require a clear list of changes in the PR description. That way I could update the change_log.rst, including changes from other PRs, if there are any, and increment VERSION in setup.py. The release workflow could handle the rest when the dev branch is merged.
Of course, a proper CONTRIBUTING.md file, additionally including the requirement for tests and updates to docstrings would be a good investment.
@sdudley-bsc, I had a look at the draft workflow you created. I am also not sure whether it should upload to TestPyPI in addition to PyPI; maybe @maread99 can enlighten us about any benefits or pitfalls here? Also, I see that your workflow extracts the VERSION value from setup.py; I don't think the first version of this automation needs to do much more. I would be fine updating change_log and VERSION manually after merging PRs into the dev branch, and then we can see how to optimize things further down the line.
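For reference, extracting VERSION from setup.py can be as small as a regex over the file's text. This is a minimal sketch, assuming setup.py contains a top-level assignment like `VERSION = "x.y.z"`; the function name is hypothetical:

```python
import re

def extract_version(setup_py_text: str) -> str:
    """Return the value of a top-level VERSION = "x.y.z" assignment."""
    match = re.search(r'^VERSION\s*=\s*["\']([^"\']+)["\']', setup_py_text, re.M)
    if match is None:
        raise ValueError("no VERSION assignment found")
    return match.group(1)

print(extract_version('VERSION = "4.1.2"\n'))  # → 4.1.2
```

Reading the file as text avoids importing (and thereby executing) setup.py inside the workflow.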
If you want to continue testing/working on the workflow, @sdudley-bsc, I could create a CONTRIBUTING.md file. Does that sound good?
Hi @Stryder-Git,
The release workflow for exchange_calendars uploads the package to the test repo, and the following lines check that the uploaded version can be installed and imported.
```yaml
- name: Install from test and test running
  run: |
    echo "Will wait 10s" && sleep 10s
    pip install --extra-index-url https://test.pypi.org/simple exchange_calendars==${{ github.ref_name }}
    python -c 'import exchange_calendars;print(exchange_calendars.__version__)'
    pip uninstall -y exchange_calendars
```
The advantage is that in the event there's some bug that results in a release failing to import, this should catch it and halt the release before uploading to the main PyPI repo.
https://github.com/gerrymanoim/exchange_calendars/issues/207 describes issues we had in getting pip to install the newly released version rather than a previously cached version. This seems to have been fixed for exchange_calendars (the above code seems to do what's required), although the same code hasn't fixed the same issue for market_prices (https://github.com/maread99/market_prices/issues/22). If you wanted to include the 'install from test repo and import' check, I guess it'd be a matter of trying it and seeing (if along the way you can see why it's not working for market_prices, do let me know! 😃)
@Stryder-Git, do you think a permanent dev branch is necessary? A GitHub Action workflow that runs tests could be triggered when a pull request is created or updated. If a permanent dev branch was used, would you create another pull request to merge dev into master when dev is ready?
Such a branch would facilitate the inclusion of multiple pull requests in one release. I suspect a pandas_market_calendars release typically reflects only one pull request. But a permanent dev branch would standardize the development workflow so that nothing different needs to be done if a release should include multiple pull requests.
Do you want to handle update of documentation manually? I may be able to automate that.
@Stryder-Git and @maread99, I'm skeptical that the production release process should upload files to TestPyPI. If the PyPA build package successfully creates .tar.gz and .whl files, I don't think those files need to be uploaded to TestPyPI before being uploaded to PyPI.
I'm also skeptical that a production release process should verify that a recently uploaded version of a package can be installed, either from TestPyPI or PyPI. @maread99, I wonder if any related issues associated with exchange_calendars or market_prices have occurred because of a delay between the upload of files for a specific version and the availability of that version for installation.
> I wonder if any related issues associated with exchange_calendars or market_prices have occurred because of a delay between the upload of files for a specific version and the availability of that version for installation.
We wondered this but were able to discount it by sleeping the workflow for a short period between uploading and installing (during this time it was possible to install the new version outside of the workflow although the workflow still subsequently installed the prior version). I'm pretty sure the issue was with caching - specifying the actual version for pip to install seems to have resolved the issue (for exchange_calendars at least).
@sdudley-bsc, I don't think a permanent dev branch is necessary, but I was hoping to achieve the standardization benefit you mentioned. Even if a new release is based on a single PR, merging it into a dev branch first would still provide an extra layer of flexibility and require less polish of that PR up front.
If I am not missing anything, a release should include incrementing the version number, adding a summary of changes to the change_log, updating or adding docstrings to functions/classes, and possibly updating the README and examples files.
What exactly needs to be done would depend on the amount and nature of the changes. While I am getting more familiar with this process (and considering that most PRs/releases are small), I am fine doing most of this manually, and then seeing how and what can be automated in the future.
Regarding TestPyPI, I think @maread99 makes a good point, and the process can mostly be copy-pasted from their workflow, so I do support this.
@Stryder-Git, sorry about my delayed response. I've recently been busy with other work.
You mentioned that a release should include "updating/adding docstrings to functions/classes". I'm not entirely sure what you mean, but as a release manager, you shouldn't have to deal with adding docstrings or updating docstrings beyond minor edits unless you also did the actual development work. An author of a pull request should be responsible for docstrings associated with the work.
In my public fork of the repo, I updated the GitHub Action workflow and added a Python script that extracts release notes from the change log file. I won't be available to work on this issue further, but the public fork will remain available for a while in case it's useful to you or anyone else. Thanks for your work on release management.
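I have not seen the script in the fork, but as a rough sketch of the idea (assuming change_log.rst sections are headed by a bare version number with an underline; the function name and sample text are hypothetical):

```python
import re

def release_notes(changelog: str, version: str) -> str:
    """Return the body of the changelog section headed by `version`."""
    pattern = (
        rf"^{re.escape(version)}\n[~=\-]+\n"  # section header plus underline
        r"(.*?)"                              # section body (lazy)
        r"(?=^\S+\n[~=\-]+\n|\Z)"             # up to the next header or EOF
    )
    match = re.search(pattern, changelog, re.M | re.S)
    if match is None:
        raise ValueError(f"version {version} not found in changelog")
    return match.group(1).strip()

sample = (
    "4.1.2\n~~~~~\n- fix A\n- fix B\n\n"
    "4.1.1\n~~~~~\n- fix C\n"
)
print(release_notes(sample, "4.1.2"))
```

A release workflow could feed the extracted text straight into the GitHub release body, so the change log stays the single source of truth for release notes.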
A few additional thoughts:
make_examples.bat
script in the docs directory. This isn't a great solution as it requires an extra step and needs jupyter installed. Better solution would be welcomeFor anyone following this thread, have a look at #217, for a chance to inspect and criticize.
@maread99, after recreating the testing of the new release on TestPyPI, as you suggested, I understand your pain in dealing with pip's caching. I tried some of the things that you tried but kept getting unreliable results. I ended up implementing it a little differently, which seems to have fixed the issue quite reliably.
The following bash code extracts the latest version number from an index of available versions, compares it to the version we just uploaded, and, if they do not match, simply tries again five seconds later. This gives me by far the most reliable results.
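A stand-in sketch of that approach (this is not the actual workflow code; the package name, index URL, and retry count are assumptions):

```shell
# latest_version: read release filenames (e.g. from the TestPyPI simple
# index page) on stdin and print the highest x.y.z version found.
latest_version() {
  grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+' | sort -V | tail -n 1
}

# wait_for_version: poll the index until the version we just uploaded
# appears as the latest available version, retrying every 5 seconds.
wait_for_version() {
  uploaded="$1"
  for _ in 1 2 3 4 5; do
    latest=$(curl -s "https://test.pypi.org/simple/pandas-market-calendars/" | latest_version)
    [ "$latest" = "$uploaded" ] && return 0
    sleep 5
  done
  return 1
}
```

Comparing the extracted number against the uploaded one sidesteps pip's caching entirely: the workflow only proceeds once the index itself advertises the new version.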
I might not have spent enough time studying your workflows, so I am not sure whether this would help you, but two more things contributed to my annoyance while working on this and might be of interest to you:
I see that you use `--extra-index-url` in the call to pip install; this means pip will look in both PyPI and TestPyPI, which might not be what you want. Using `-i` or `--index-url` instead would restrict it to only the URL you are passing.
After successfully downloading from PyPI or TestPyPI, you make Python print the version number and then run the tests again. The problem I encountered: after downloading, there is a pandas_market_calendars directory in the working directory (simply the code of the repo) as well as a pandas_market_calendars package in Python's site-packages. When importing pandas_market_calendars, Python picks up the working-directory version, not the downloaded one, and the tests use it too. In the step after the one I linked to above, I rename the pandas_market_calendars directory in the working directory so that subsequent imports find the package only in site-packages and actually use the downloaded version.
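The shadowing effect is easy to reproduce. This is a self-contained demo, not the workflow code: a same-named package directory in the working directory wins because the current directory sits at the front of `sys.path` when Python is invoked directly.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the repo checkout sitting in the working directory.
    pkg = os.path.join(tmp, "pandas_market_calendars")
    os.mkdir(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as fh:
        fh.write("ORIGIN = 'working directory'\n")

    # `python -c` puts the current directory first on sys.path, so the
    # local package shadows any installed pandas_market_calendars.
    out = subprocess.run(
        [sys.executable, "-c",
         "import pandas_market_calendars as m; print(m.ORIGIN)"],
        cwd=tmp, capture_output=True, text=True, check=True,
    )

print(out.stdout.strip())  # → working directory
```

Renaming (or deleting) the checkout directory before the import check, as described above, removes the shadowing entry so the site-packages copy is the only candidate.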
These insights allowed me to find some reliability, I hope it helps you somehow, too!
@Stryder-Git, thanks for sharing those findings!
I'll certainly be making changes in light of the points you've raised - I can see how either or both could be responsible for the failure to import the new release.
This issue is being submitted to propose a change to release management for pandas_market_calendars: Treat each update to the master branch as a release.
A GitHub Actions workflow (or maybe Travis CI) could be used to automate the following when master is updated:
It may also be possible to automate the update of documentation on Read the Docs.
Automating all release activity would provide the following benefits:
How contributing to the project would change: A pull request would need to include a new version number and associated release notes.
One important requirement: The automated steps above should not occur unless all tests in the pandas_market_calendars test suite have passed.
Currently, a Travis CI pull request build gets triggered when a pull request is created or updated. Such a build includes running the pandas_market_calendars test suite with multiple versions of Python.
To meet the requirement above, the .travis.yml file could be updated so that a build fails if any tests fail, and the reviewer of a pull request could verify that the associated Travis CI build succeeded. In addition, a branch protection rule for the master branch could be added or updated to require that a Travis CI status check passes before a pull request can be merged. One potential issue: the master branch could be updated after a pull request is created.
Maybe that issue could be addressed by a pull request reviewer triggering another Travis CI pull request build.
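A sketch of the kind of .travis.yml meant here (the Python versions and test command are assumptions, not the project's actual configuration). Travis fails a build when any `script` command exits non-zero, so the key is ensuring the test runner's exit status is not swallowed:

```yaml
# .travis.yml (hypothetical fragment)
language: python
python:
  - "3.7"
  - "3.8"
  - "3.9"
install:
  - pip install -e .[test]
script:
  - pytest tests/   # a non-zero exit status fails the build
```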
Another potential requirement: The automated steps above should not occur if documentation cannot be built.
Feedback is welcome, especially on any potential issues I've missed.