pyproj4 / pyproj

Python interface to PROJ (cartographic projections and coordinate transformations library)
https://pyproj4.github.io/pyproj
MIT License

Request: release a patch version that pins cython<3.0 #1342

Closed gwerbin closed 1 year ago

gwerbin commented 1 year ago

Currently, it's impossible to build Pyproj 3.6.0 with PEP 517 build isolation because Pip will install Cython 3.x, which Pyproj 3.6.0 is incompatible with.

This can be easily fixed by releasing a Pyproj 3.6.1 or 3.6.0.post1 version that simply adds `<3.0` to the Cython build dependency: https://github.com/pyproj4/pyproj/compare/3.6.0...gwerbin:pin-cython-below-3.0?expand=1
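For reference, the maintainer-side fix is a one-line change to the build requirements. A sketch of the relevant `pyproject.toml` section (the exact requirement names and version bounds here are illustrative, not copied from pyproj 3.6.0):

```toml
# pyproject.toml -- cap Cython below 3.0 for PEP 517 isolated builds
[build-system]
requires = [
    "setuptools>=61.0.0",
    "wheel",
    "cython>=0.28.4,<3.0",  # the added "<3.0" cap; lower bound illustrative
]
build-backend = "setuptools.build_meta"
```

Until such a release exists, end users can often work around the problem on their side with pip's `PIP_CONSTRAINT` environment variable, which recent pip versions also apply inside the isolated build environment: put `cython<3` in a constraints file and set `PIP_CONSTRAINT` to its path before installing.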

I understand that the current master branch updates Pyproj to use only Cython 3.x, but I'm not aware of a release date for it. This patch would be very useful for end users who need to build their own wheels (e.g. for PyPy), while the next release is still in development.

snowman2 commented 1 year ago

For most projects, a release is fairly simple, as you mention. However, due to the demands on maintainers' free time, wheel building, and testing with downstream Linux release managers, it is not quite as simple for pyproj.

The next release of pyproj depends on Python 3.12 compatibility updates from an upcoming numpy release to ensure stability.

We welcome assistance preparing pyproj for the next release with Python 3.12 wheels.

gwerbin commented 1 year ago

@snowman2 thanks for the reply. Is it a matter of someone running through the release instructions here? Or is there more to it? I'm happy to dedicate some time to the patch release, unless the next real release of Pyproj is right around the corner.

snowman2 commented 1 year ago

The blocker for the next release is #1330. Once that is ready, the release will follow. That likely depends on the next release of numpy, but there may be a way to get it to work now.

The instructions you linked to are correct, although slightly out of date. The wheels are uploaded automatically, except for the ones from Cirrus CI, which currently need to be uploaded to PyPI manually.

djhoese commented 1 year ago

@snowman2 I had some luck with the vispy project (which builds wheels with a single Cython extension) by forcing the pre-release of numpy to be installed. I see you're already doing that, but for some reason it looks like it is trying to build numpy from source. Here are vispy's cibuildwheel env vars:

https://github.com/vispy/vispy/blob/a1a639f33c59a16c8af2bf605a23b55210569f5e/.github/workflows/wheels.yml#L38-L39
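The idea boils down to telling pip inside the cibuildwheel build environment to accept pre-releases. A rough sketch of how that can look in a workflow (the values here are illustrative, not copied from vispy's file):

```yaml
# Fragment of a GitHub Actions step running cibuildwheel.
# PIP_PRE=1 lets pip resolve pre-release versions (e.g. numpy release
# candidates) inside the isolated build environment, instead of falling
# back to building numpy from source.
env:
  CIBW_ENVIRONMENT: "PIP_PRE=1"
  # Alternative: point pip at the scientific-python nightly wheel index:
  # CIBW_ENVIRONMENT: "PIP_EXTRA_INDEX_URL=https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"
```

`CIBW_ENVIRONMENT` is the documented cibuildwheel option for passing environment variables into the build; whether `PIP_PRE` or an extra index is the better fit depends on which numpy builds are available at release time.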

If needed I can try to take a look tomorrow, but no guarantees on time. I also don't build 32-bit wheels for vispy (or any of my packages) and it looks like you do...how do you handle that with numpy not providing 32-bit wheels?

snowman2 commented 1 year ago

I think that it is the pypy wheels and win32 wheels that are building from source. Depending on the win32 failures, that one would be okay to disable.

snowman2 commented 1 year ago

There are suggestions for workarounds linked in #1330 to numpy issues.

djhoese commented 1 year ago

My kid had to stay home sick today so I'm not really getting any work done today and probably not tomorrow.

snowman2 commented 1 year ago

No worries @djhoese. I hope that your kid feels better soon.

snowman2 commented 1 year ago

Coming soon ... https://github.com/pyproj4/pyproj/discussions/1344

gwerbin commented 1 year ago

Thanks @snowman2! I would still be happy to help with patch or "post" releases for older versions for the sake of any users who can't upgrade for whatever reason. In my case, updating to the newest version is fine.

snowman2 commented 1 year ago

https://pypi.org/project/pyproj/3.6.1/

snowman2 commented 1 year ago

I would still be happy to help with patch or "post" releases for older versions for the sake of any users who can't upgrade for whatever reason.

Contributions to help with releases are welcome :+1:

djhoese commented 1 year ago

@snowman2 could you point me to the release instructions? In your opinion what is the hardest part? Would you consider pyproj's release process much harder than other python packages you maintain (ex. rioxarray) given how tightly it is tied to the PROJ library? Or are there other reasons?

snowman2 commented 1 year ago

could you point me to the release instructions?

https://github.com/pyproj4/pyproj/blob/main/HOW_TO_RELEASE.md

In your opinion what is the hardest part?

The wheels.

The matrix of Python versions (3.9-3.12), operating systems (Windows, macOS, Linux), architectures (x86, i686, x86_64, amd64), and Python implementations (CPython, PyPy) makes for long build times and a high potential for failures. Because of the long build times, the CI/CD process takes a while to debug. It is a never-ending game of whack-a-mole to keep it stable.

Most of the wheels are built using GitHub Actions. However, due to the need to support macOS arm64 and Linux aarch64, some wheels are built on Cirrus CI and Travis CI. With Travis CI, we have a limited number of credits, so reducing the frequency of releases helps stretch the credits further.

Would you consider pyproj's release process much harder than other python packages you maintain (ex. rioxarray) given how tightly it is tied to the PROJ library?

Essentially, yes. PROJ makes providing wheels a requirement. With rioxarray, I can make a new release in a minute and it is all automatically uploaded to PyPI. I make a lot of rioxarray releases as soon as features are added (release early, release often 😄). With pyproj, a release sucks hours of my time, both preparing for it and making sure the builds complete properly (which rarely happens the first time around). So, I try to limit the number of pyproj releases to reduce the amount of time I have to dedicate to making them.

Additionally, there are downstream linux distribution package managers that are kind enough to run tests on pyproj before each release. I usually try to limit the number of pyproj releases to be respectful of their time.

In general, a release every 4-6 months is the current cadence of pyproj releases.

djhoese commented 1 year ago

A couple thoughts:

  1. Is Cirrus CI used for arm64 for "parallel" wheel generation? Unless I'm forgetting something about my own projects, it is possible to build macOS arm64 wheels in GitHub Actions.
  2. Why are the wheels on cirrus CI manually uploaded?
  3. Have you had users request PyPy wheels? If they build successfully then great, but if they're a burden to maintain then maybe we drop it. In the linked to numpy issue about PyPy they're talking about dropping PyPy 3.9 wheels now anyway and then will do PyPy 3.10 wheels later.
  4. I've never had a slow enough build process (even in my Cython-based projects) to justify this, but what about splitting the cibuildwheel builds up into multiple github actions environments/jobs? So one for linux 64-bit python 3.9 and 3.10, one for 64-bit linux 3.11 and 3.12, one for 32-bit linux...and so on. I'm not sure how github actions feels about that many environments but theoretically if any of them run in parallel it should be faster than what is happening now.
  5. Looking at the last release's wheel building, I see that for CPython wheels it takes about 70s to build the wheel and about 120s to test it. Additionally, the PROJ build before all the wheel processes takes 10 minutes. For PyPy wheels it's 235 seconds (~4 minutes) to test the wheel. This point is more about sharing the timing info than suggesting anything.
  6. Is/can the PROJ build be cached? Looking at the proj-compile-wheels.sh script, it doesn't seem to depend heavily on pyproj's code state. The hardest part to me seems to be that it needs to (should?) run on the same docker image as the one the wheels are built on. This PROJ build could maybe even be its own set of docker images, based on the upstream PyPA images, that are pulled in at wheel building time and only get updated when the proj-compile-wheels.sh script changes. I see some amount of caching being done on Windows, but it looks like macOS can do that too: https://cibuildwheel.readthedocs.io/en/stable/setup/#macos-windows-builds. Otherwise we could build docker images like I said and specify them with https://cibuildwheel.readthedocs.io/en/stable/options/#linux-image
  7. The tests aren't failing when they should. The last release had failures in multiple spots but didn't die. Shapely didn't have a wheel for the platform/python version so it tried to build from source, couldn't find the geos library, and then failed to install. Wheel testing continued though and failed in some spots including failing to import shapely.
  8. What are your thoughts on identifying a specific set of tests and marking them with a pytest mark and only running those tests (possibly with a reduced set of dependencies?) for wheel tests?
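Points 4 and 6 could be combined in one workflow change. A rough sketch of a split build matrix with a cached PROJ build (the job layout, cache key, script path, and output directory are hypothetical, not taken from pyproj's actual workflow):

```yaml
# Hypothetical GitHub Actions fragment: split CPython versions across
# parallel jobs and cache the PROJ build between runs.
jobs:
  build_wheels:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        # Each entry becomes its own job, halving per-job build time.
        cibw-build: ["cp39-* cp310-*", "cp311-* cp312-*"]
    steps:
      - uses: actions/checkout@v4
      - name: Cache the PROJ build
        uses: actions/cache@v4
        with:
          path: proj-build/  # hypothetical output dir of the PROJ build script
          # Rebuild PROJ only when the build script itself changes.
          key: proj-${{ matrix.os }}-${{ hashFiles('**/proj-compile-wheels.sh') }}
      - uses: pypa/cibuildwheel@v2.16
        env:
          CIBW_BUILD: ${{ matrix.cibw-build }}
```

The explicit `CIBW_BUILD` lists are the trade-off mentioned later in the thread: new Python versions no longer appear automatically when cibuildwheel is updated.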

djhoese commented 1 year ago

I should have maybe started with: this is just me brainstorming and not suggesting that you alone should tackle these things. So as a project, what should pyproj do to speed this up and what has been tried/avoided in the past?

Oh:

  1. How possible would it be to automate the downstream linux distribution testing? For example, something in the github action triggers their builds and there is a known URL to look for whether it passed or not and see the log. That way no one on their side of things needs to do much if anything and you/we don't have to passively wait for something to happen? Or...depending on their update cycle, maybe we ignore their build success until it is a problem and then come out with bug fix releases?

snowman2 commented 1 year ago

1. Is cirrus CI used for arm64 for "parallel" wheel generation? Unless I'm forgetting something about my own projects, it is possible to make macos aarch in github actions.

It is used so the wheels generated can be tested. You can generate the wheels on Actions, but cannot test them. In my experience, it is a bad idea to release something you haven't tested.

2. Why are the wheels on cirrus CI manually uploaded?

Automating it is on the TODO list.

3. Have you had users request PyPy wheels? If they build successfully then great, but if they're a burden to maintain then maybe we drop it. In the linked to numpy issue about PyPy they're talking about dropping PyPy 3.9 wheels now anyway and then will do PyPy 3.10 wheels later.

IIRC someone requested them a while back...

4. I've never had a slow enough build process (even in my Cython-based projects) to justify this, but what about splitting the cibuildwheel builds up into multiple github actions environments/jobs? So one for linux 64-bit python 3.9 and 3.10, one for 64-bit linux 3.11 and 3.12, one for 32-bit linux...and so on. I'm not sure how github actions feels about that many environments but theoretically if any of them run in parallel it should be faster than what is happening now.

I am open to this idea.

5. Looking at the last release's wheel building, I see that for CPython wheels it takes about 70s to build the wheel and about 120s to test it. Additionally, the PROJ build before all the wheel processes takes 10 minutes. For PyPy wheels it's 235 seconds (~4 minutes) to test the wheel. This point is more about sharing the timing info than suggesting anything.

6. Is/can the PROJ build be cached? Looking at the `proj-compile-wheels.sh` script, it doesn't seem to depend heavily on pyproj's code state. The hardest part to me seems to be that it needs to (should?) run on the same docker image as the one the wheels are built on. This PROJ build could maybe even be its own set of docker images, based on the upstream PyPA images, that are pulled in at wheel building time and only get updated when the `proj-compile-wheels.sh` script changes. I see some amount of caching being done on Windows, but it looks like macOS can do that too: https://cibuildwheel.readthedocs.io/en/stable/setup/#macos-windows-builds. Otherwise we could build docker images like I said and specify them with https://cibuildwheel.readthedocs.io/en/stable/options/#linux-image

That is definitely something to look into.

7. The tests aren't failing when they should. The last release had failures in multiple spots but didn't die. Shapely didn't have a wheel for the platform/python version so it tried to build from source, couldn't find the geos library, and then failed to install. Wheel testing continued though and failed in some spots including failing to import shapely.

Those are optional test dependencies. If it works, great. If not, not a big deal.

8. What are your thoughts on identifying a specific set of tests and marking them with a pytest mark and only running those tests (possibly with a reduced set of dependencies?) for wheel tests?

numpy is the only bottleneck at the moment. Not sure I would want to release without testing using numpy.
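If a wheel-test subset were ever wanted, the mechanics are cheap. A sketch using a hypothetical `wheel` marker (the marker name and test path are illustrative, not part of pyproj's test suite):

```toml
# pyproject.toml -- register the marker so pytest doesn't warn about it
[tool.pytest.ini_options]
markers = [
    "wheel: minimal smoke tests to run against freshly built wheels",
]
```

Tests decorated with `@pytest.mark.wheel` could then be selected during wheel builds via cibuildwheel's documented test option, e.g. `CIBW_TEST_COMMAND: pytest -m wheel {project}/test`.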

How possible would it be to automate the downstream linux distribution testing? For example, something in the github action triggers their builds and there is a known URL to look for whether it passed or not and see the log. That way no one on their side of things needs to do much if anything and you/we don't have to passively wait for something to happen? Or...depending on their update cycle, maybe we ignore their build success until it is a problem and then come out with bug fix releases?

Sounds worth looking into.

djhoese commented 1 year ago

It is used so the wheels generated can be tested. You can generate the wheels on Actions, but cannot test them. In my experience, it is a bad idea to release something you haven't tested.

I can understand that. Usually the temptation is too great and I just end up not testing on the hard-to-test platforms. In my usual cases, though, these are just simple Cython extensions with simple for loops over some numpy arrays, so compatibility mostly leans on Cython's and numpy's own testing, and I'm a lot less scared about my own libraries doing something incompatible.

My only worry with two separate build systems is if one uploads to PyPI but the other fails and you get this weird partial release. I suppose that is one advantage to the non-automatic cirrus wheels.

Those are optional test dependencies. If it works, great. If not, not a big deal.

Hm, what I saw was a doctest failure. I guess if that's integrated with your pytest and is an xfail or whatever then :+1:

numpy is the only bottleneck at the moment. Not sure I would want to release without testing using numpy.

I guess I was thinking more of breaking up the categories of tests. Basic Python-heavy functionality probably isn't going to break between platforms, but if those tests are fast then whatever, include them. The heavy PROJ compatibility tests are probably always needed. I guess it depends on which tests take the longest. Is test execution time evenly distributed, or are there some tests that take ~20 seconds each and could be skipped?

Otherwise...

it was brought to my attention today that pykdtree and pyresample wheel building are not up to modern standards and I need to overhaul them. I'm working on that now, and if they have any decent amount of build time, maybe I'll try that per-platform splitting and see how GitHub Actions feels about it. I could then port that to pyproj.

djhoese commented 1 year ago

Oh about not testing on GA, you're talking about not being able to test on the emulated platforms?

gwerbin commented 1 year ago

Re: PyPy wheels, I actually first ran into this problem because I was trying to install Pyproj on PyPy, and there wasn't a wheel available for my particular platform, so it tried to build from source and failed.

Given that problems like the current one are very rare, I don't think it's such a bad idea to drop wheel builds that are an undue maintenance burden.

snowman2 commented 1 year ago

Oh about not testing on GA, you're talking about not being able to test on the emulated platforms?

That sounds about right.

snowman2 commented 1 year ago

I guess I was thinking more breaking up the categories of tests.

The tests are pretty quick, so I wouldn't spend too much time optimizing those. The main bottleneck is dependencies.

djhoese commented 1 year ago

Of the 28m45s of Ubuntu wheel building for the last release, testing makes up 11+ minutes (~37%). If I can get the PROJ build cached, testing becomes ~58% of the build time. I also suggest we drop PyPy 3.9, given that numpy will likely drop it:

https://github.com/numpy/numpy/issues/24728

And if that's dropped, it takes ~5 minutes off the total build time. PyPy tests take most of that time because they try to build shapely from source. So yeah, the dependencies being installed don't help, but I'm not sure there is much that can be done there without caching them... which cibuildwheel might already do internally. I'll look at that too.

djhoese commented 1 year ago

And I've decided against splitting the environments based on Python version. If I get caching working for PROJ then I'll reconsider it. The main downside, though, is that updating cibuildwheel no longer automatically gets you wheels for new versions of Python, because you're explicitly setting which versions of Python to build.

Edit: ...and you're already splitting on platform/arch because of the GA versus cirrus versus appveyor split.