lemire / fastrand

Fast random number generation in an interval in Python: Up to 10x faster than random.randint.
Apache License 2.0

Add package to pip #9

Closed crisbodnar closed 5 years ago

crisbodnar commented 5 years ago

It would be great if you could upload the package so that it can be installed via pip install fastrand. The current process is a bit tedious.

lemire commented 5 years ago

Seems reasonable.

I don't yet know how.

There are many guides online, but I don't want to spend a lot of time on it. Any "make your code available via pip install for dummies"?

cc @Ezibenroc @TkTech

lemire commented 5 years ago

I am guessing that this needs to be platform-specific... So I need to use Visual Studio on the one hand for Windows, and then something like GCC for Linux... I probably need a precise Visual Studio version, and have it configured to compile Python extensions?

TkTech commented 5 years ago

Making the source packages available is pretty trivial, setting up the build matrix for making binary wheels is the real kicker. I have 6 different versions of msvc installed...

Doesn't look like your C depends on any really recent features, which means it can probably build on CentOS 5 with gcc 4, so you can use manylinux for binary wheels.

Take a look at the .appveyor in pysimdjson, https://github.com/TkTech/pysimdjson/blob/master/.appveyor.yml. You'll want to expand the build-matrix to run for older versions of Python as well. This .appveyor takes care of building & testing the package. If the build was triggered by a tag push to git, it'll also push the binary wheels to pypi.

lemire commented 5 years ago

@TkTech The .appveyor.yml might help. How do I get a TWINE_USERNAME and TWINE_PASSWORD? You can assume that I have an account on pypi.org.

The manylinux thing looks useful... but it is not clear to me how I get to a PyPI release from that...? And evidently people will want Python 3 support... that won't get me that, will it?

TkTech commented 5 years ago

TWINE_USERNAME and TWINE_PASSWORD are a pypi username & password. To be paranoid, you should create a 2nd user that you then add to your project as a collaborator, and use this account for uploading wheels. This way you can share it with other maintainers as well. pysimdjson_builder for example is used for pysimdjson.

On appveyor (the website) go to the environment variable page under settings, put in your new user's password, and press the little lock to encrypt it. Then go to "Export Yaml" and you'll get the encrypted format that's safe to put into git.

When twine goes to upload to pypi, it'll use these 2 environment variables to login.
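
As a rough illustration of that login step, here is a minimal Python sketch of a check a build script could run before calling twine. The helper name `missing_twine_credentials` is hypothetical; only the two variable names come from the discussion above.

```python
import os

# Environment variables that twine reads for its login, as named above.
REQUIRED_VARS = ("TWINE_USERNAME", "TWINE_PASSWORD")


def missing_twine_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the real process environment; passing a plain dict
    makes the check easy to try without touching real credentials.
    """
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_twine_credentials()
    if missing:
        print("Not uploading; missing:", ", ".join(missing))
    else:
        print("Credentials present; twine upload can proceed.")
```

Failing early like this gives a clearer message than an authentication error at the very end of a long build.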

It's a bit of a headache to set up, but it's reliable and once you're done it'll be 100% automatic and you'll never have to touch it again.

manylinux is just a docker image. Use it in your circleci config just like you used gcc:latest or ubuntu:18.04. At the end of your build call twine upload dist/* just like in the appveyor config. Because manylinux uses an incredibly old version of ... everything really ... the binary wheels it produces will work on practically all Linux machines, no compiler needed.

Both appveyor and manylinux include many versions of Python all precompiled and configured in different directories, so you can use them to upload 10+ versions from 2.6 to 3.8 without issue.

lemire commented 5 years ago

That sounds useful. Let me try.

lemire commented 5 years ago

@TkTech I copied and pasted your .appveyor.yml file and I get this error...

error: invalid command 'bdist_wheel'

When I try locally, I get the same error.

Locally I tried everything on this SO page... https://stackoverflow.com/questions/34819221/why-is-python-setup-py-saying-invalid-command-bdist-wheel-on-travis-ci

But nothing works... I get this...

~/CVS/github/fastrand$ pip install wheel
Requirement already satisfied: wheel in /anaconda3/lib/python3.6/site-packages (0.33.1)
~/CVS/github/fastrand$ python setup.py bdist_wheel
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: invalid command 'bdist_wheel'

TkTech commented 5 years ago

https://github.com/lemire/fastrand/blob/master/setup.py#L1

You're using an ancient distutils, pretty much all modern Python packaging needs setuptools. Just change that line to:

from setuptools import setup, Extension

Twine and setuptools replace the old "setup.py (register|upload)" and distutils workflows. You shouldn't need pip install wheel either.

lemire commented 5 years ago

I tried setting up a circleci thing... https://github.com/lemire/fastrand/blob/master/.circleci/config.yml

I get...

Starting container manyunix:latest
  image cache not found on this host, downloading manyunix:latest

Error response from daemon: repository manyunix not found: does not exist or no pull access

I really don't know much about these configuration files...

TkTech commented 5 years ago

Gotta use the full URL for quay.io, gcc:latest is really a shortcut for hub.docker.com/.... Use quay.io/pypa/manylinux1_x86_64 instead.

Everyone goes through the whack-it-with-a-stick-until-it-works runaround setting this stuff up the first time. Eventually you know it by heart :)

For plain python packages with no C extensions it's much simpler, 2 lines total. All this is required because you need binary wheels to avoid depending on a compiler for the end user.

lemire commented 5 years ago

Ok. So manylinux does not have pip nor setuptools.

Do I have to apt-get the thing?

https://github.com/lemire/fastrand/blob/master/.circleci/config.yml

Is that going to end up supporting many versions of Python... how so?

Note: I am not sure I want to become a Python release guru.

lemire commented 5 years ago

Ok. So pretty much nothing works. Stopping for now. I'll take any help people offer, but I don't think I'll keep hacking this much longer. Not fun.

TkTech commented 5 years ago

You've skipped over a few sections of the manylinux README. There are multiple versions of Python installed in /opt/python/<python tag>-<abi tag>. You need to specify which Python you want to use. Ex: /opt/python/cp37-cp37m/bin/python setup.py install. The .appveyor file is set up the same way.

You don't need to install either just to build the wheel.
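
If you are unsure which <python tag>-<abi tag> directories exist, one option is to discover them rather than hard-code them. The following is only a sketch, assuming the /opt/python/<python tag>-<abi tag>/bin/python layout described above; the function name `bdist_wheel_commands` is made up for illustration.

```python
from pathlib import Path


def bdist_wheel_commands(root="/opt/python"):
    """Build one `python setup.py bdist_wheel` command per interpreter.

    Assumes the layout <root>/<python tag>-<abi tag>/bin/python.
    Returns a list of argument lists, sorted by interpreter path.
    """
    commands = []
    for python in sorted(Path(root).glob("*/bin/python")):
        commands.append([str(python), "setup.py", "bdist_wheel"])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the list can be
    # inspected (or pasted into a CI config) first.
    for cmd in bdist_wheel_commands():
        print(" ".join(cmd))
```

Running it inside the container prints one build command per installed interpreter, which also answers the question of which tags are valid.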

lemire commented 5 years ago

Ok. So this was useful. Here is where I got...

https://github.com/lemire/fastrand/blob/master/.circleci/config.yml

The Python paths are wrong, but I don't see where in the README the list of valid <python tag>-<abi tag> are provided.

The one path that works is /opt/python/cp27-cp27mu but I got

400 Client Error: Binary wheel 'fastrand-1.0-cp27-cp27mu-linux_x86_64.whl' has an unsupported platform tag 'linux_x86_64'.

The appveyor thing appears to work.

TkTech commented 5 years ago

Almost there! Only 2.7 has mu & m builds IIRC. The path for everything past 3.3 should just end in m. One really nice feature of circleci is being able to click on a build that failed and re-run it with SSH, so you can just ssh into your failed build and look around if you're not sure what's available or want to try tweaking a step.

After you've built all your wheels, there's one final step which is running auditwheel. This takes all the wheels you've built so far, fixes binaries (and any external libraries if you had some) and creates the final "manylinux" wheel.
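
The platform tag named in the upload error is the last dash-separated field of the wheel filename. Here is a small sketch, assuming the standard wheel naming layout (name-version-python tag-abi tag-platform tag); `platform_tag` and `needs_repair` are hypothetical helper names, and the manylinux1 filename in the usage lines is an assumed example of a repaired wheel, not output from this project.

```python
def platform_tag(wheel_filename):
    """Return the platform tag, the last dash-separated field of the name."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % wheel_filename)
    stem = wheel_filename[: -len(".whl")]
    return stem.split("-")[-1]


def needs_repair(wheel_filename):
    """True for a bare `linux_*` tag, the kind rejected in the error above."""
    return platform_tag(wheel_filename).startswith("linux_")


if __name__ == "__main__":
    for name in (
        "fastrand-1.0-cp27-cp27mu-linux_x86_64.whl",
        "fastrand-1.0-cp27-cp27mu-manylinux1_x86_64.whl",  # assumed example
    ):
        print(name, "->", platform_tag(name), "needs repair:", needs_repair(name))
```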

You don't need to keep installing twine! Your final script should look something like this (from memory, probably a typo or two):

version: 2
jobs:
  "build-and-deploy":
    docker:
      - image: quay.io/pypa/manylinux1_x86_64
    steps:
      - checkout

      - run: /opt/python/cp27-cp27mu/bin/python setup.py bdist_wheel
      - run: /opt/python/cp27-cp27m/bin/python setup.py bdist_wheel
      - run: /opt/python/cp34-cp34m/bin/python setup.py bdist_wheel
      - run: /opt/python/cp35-cp35m/bin/python setup.py bdist_wheel
      - run: /opt/python/cp36-cp36m/bin/python setup.py bdist_wheel
      - run: /opt/python/cp37-cp37m/bin/python setup.py bdist_wheel

      - run:
          name: Running auditwheel...
          command: |
              for whl in dist/*.whl; do
                  auditwheel repair "$whl"
              done

      - run:
           name: Uploading to pypi
           command: |
               /opt/python/cp37-cp37m/bin/pip install twine
               /opt/python/cp37-cp37m/bin/twine upload dist/*

workflows:
  version: 2
  tagged-build:
    jobs:
      - "build-and-deploy":
          filters:
            tags:
              only: /^v.*/

Note the workflow step - this means the job called "build-and-deploy" will only run when you push a tag to github that starts with v, like "v1.0.0". This stops you from uploading to pypi on every single commit to the repo, instead it'll only run the job when you actually push a release.
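
To see which tag names that filter pattern accepts, you can try it locally. This sketch only exercises the regular expression from the config, assuming the pattern must match the whole tag name; it says nothing about whether the CI service also runs the job for untagged branch commits. The helper name `tag_matches` is hypothetical.

```python
import re

# The same pattern as the `only:` filter in the config above.
TAG_PATTERN = re.compile(r"^v.*")


def tag_matches(tag):
    """Return True when a git tag name would pass the filter pattern."""
    return TAG_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ("v1.0.0", "v2", "1.0.0", "release-v1"):
        print(tag, tag_matches(tag))
```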

lemire commented 5 years ago

This makes sense...

It still fails with the same error, however...

HTTPError: 400 Client Error: Binary wheel 'fastrand-1.0-cp27-cp27m-linux_x86_64.whl' has an unsupported platform tag 'linux_x86_64'. for url: https://upload.pypi.org/legacy/
Exited with code 1

> Note the workflow step - this means the job called "build-and-deploy" will only run when you push a tag to github that starts with v, like "v1.0.0".

Will it?

It ran just now with my little commit that copied-and-pasted your proposed config file...

https://circleci.com/gh/lemire/fastrand/17

I don't think I tagged this commit with a version...

lemire commented 5 years ago

On this note, the appveyor seems to also run with every commit...

https://ci.appveyor.com/project/lemire/fastrand/builds/22647261/job/q6nuihyg0p4gdupb

So my guess is that if ($env:APPVEYOR_REPO_TAG -eq $TRUE) is always true.

lemire commented 5 years ago

I figured out a fix.

However, the script still appears to run each time a commit is made. That's low priority for me at this time.

lemire commented 5 years ago

Closing, largely thanks to @TkTech, this is (apparently) resolved.

lemire commented 5 years ago

The trick btw is that the repaired files go into wheelhouse.
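
In other words, the upload has to point at wheelhouse rather than dist. A minimal sketch of the two steps, assuming wheels are built into dist/ and that `auditwheel repair` writes its output to the directory given with `-w`; the function names are made up for illustration, and the commands are only assembled, not executed.

```python
from pathlib import Path


def repair_commands(dist_dir="dist", wheelhouse="wheelhouse"):
    """One `auditwheel repair` command per wheel found in dist_dir."""
    return [
        ["auditwheel", "repair", str(whl), "-w", wheelhouse]
        for whl in sorted(Path(dist_dir).glob("*.whl"))
    ]


def upload_command(wheelhouse="wheelhouse"):
    """Upload the repaired wheels, not the originals left in dist/."""
    wheels = sorted(str(p) for p in Path(wheelhouse).glob("*.whl"))
    return ["twine", "upload"] + wheels


if __name__ == "__main__":
    # Print the commands so they can be checked before being run.
    for cmd in repair_commands():
        print(" ".join(cmd))
    print(" ".join(upload_command()))
```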

Why things need to be "repaired" is a concerning issue. You would think that this extra step should not be necessary.

TkTech commented 5 years ago

Appveyor always runs, but it isn't uploading to pypi unless there's a tag. That's so tests can run on every commit, but deploys only happen on new versions.
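
The same test-always, deploy-on-tag split can be sketched in Python. This assumes the CI service exposes APPVEYOR_REPO_TAG as the string "true" or "false"; the function name `should_deploy` is hypothetical.

```python
import os


def should_deploy(env=None):
    """Return True only for builds started by a pushed tag.

    Assumes APPVEYOR_REPO_TAG holds the text "true" or "false", so the
    value is compared as a string rather than treated as a boolean.
    """
    if env is None:
        env = os.environ
    return env.get("APPVEYOR_REPO_TAG", "false").lower() == "true"


if __name__ == "__main__":
    print("run tests")  # happens on every commit
    if should_deploy():
        print("upload wheels")  # happens only for tagged builds
```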

TkTech commented 5 years ago

Repairing might not be the best name. When you make the wheels you've made a binary Linux package that will only work reliably on a clone of that machine. The repair stage makes sure any required libs are copied into the wheel, fixes the paths so the right libs are used, and some other changes to make sure the wheel can be used anywhere.

lemire commented 5 years ago

> When you make the wheels you've made a binary Linux package that will only work reliably on a clone of that machine. The repair stage makes sure any required libs are copied into the wheel, fixes the paths so the right libs are used, and some other changes to make sure the wheel can be used anywhere.

Why does this require an entirely different tool and messing around with files like this? Why can't you create the wheel in a portable mode from the start?

TkTech commented 5 years ago

To be clear, this process is the same even for non-python apps. All portable Linux binaries go through the same process (although there are alternatives), or rather binary packages are usually built by huge build farms for every combination of architecture and platform version instead of even trying to make it portable.

The first step, creating the original bdist_wheel, creates a normal binary package that is only guaranteed to work on the exact same host, or very similar hosts. If all you're doing is building a package you want to distribute to your cluster of identical machines, it's perfect.

But maybe you want to make it portable, so we run another step (auditwheel) that checks to make sure it can be made portable, brings in external dependencies so they're included with your wheel, and fixes up some lookups so your binary looks for things in your wheel instead of on the platform.

lemire commented 5 years ago

Yet if you have a Rust, Go or Swift dependency, there is no such messing around... which indicates that it is not necessary even if, as you argue, it is common.

Nobody expects the distribution of a C or Java library to be easy. But modern languages should make it easy to support extension.

I have personally written Rust, Go and Swift wrappers around C/C++ code, and it was waaaaayyyyy easier.

Heck... I would have an easier time delivering this extension as a WebAssembly extension to JavaScript...

This is not my first (bad) experience making native Python extensions "easy" to install... Python seems quite fragile and backward in this respect.

It is really sad considering how nice the language is (I have used Python every day for almost 20 years now), but the ecosystem is clearly not where it should be.

lemire commented 5 years ago

I get how fancy stuff could be hard... something like simdjson is another story... but this fastrand thing is essentially a C function. That it is hard to deliver a C function to Python users is sad.

TkTech commented 5 years ago

I agree 100% it needs improvement, but remember the cpython interpreter is 29 years old, and has all the cruft that comes from trying to maintain (mostly) backwards compatibility over such a long period on 24 different architectures (sometimes more, sometimes less). And improvements are being made, if slowly. setuptools and pip get smarter every week, and binary wheels themselves are a brand new creation. The wheel standard is being updated, and is actively developed.

crisbodnar commented 5 years ago

Thank you, @lemire and @TkTech! Much appreciated!