pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Why does warehouse allow linux_armv6l and linux_armv7l wheels? #3668

Open njsmith opened 6 years ago

njsmith commented 6 years ago

Historically we never allowed linux wheels on PyPI, and this was an intentional choice, because linux doesn't tell us enough to make any guesses about ABI compatibility. That's still true. But warehouse now allows specifically linux_armv6l and linux_armv7l as legal wheel platforms (but no other options).

This appears to have come in through #2003, #2010, #2012, but none of those have any discussion of why we suddenly changed this policy...

What's going on here? Why are we allowing linux wheels at all, and why is it specific to these two CPU architectures?

njsmith commented 6 years ago

This appears to be connected somehow to the piwheels project to provide wheels for Raspberry Pis running Raspbian, which is super cool. But as their FAQ says:

Will wheels provided on piwheels work on other ARM platforms?

Some wheels may work on other ARM platforms, but we can't guarantee this. Pure Python wheels will certainly work, but there's a much smaller speed increase installing from wheels over source distribution. ARM platform wheels on piwheels are tagged armv6l and armv7l but actually both contain ARMv6-compatible code, which may not work on true ARMv7 platforms.

bennuttall commented 6 years ago

I started the piwheels project as a solution for Raspberry Pi users, and around the same time, some Google devs working on a Pi project wanted to be able to distribute wheels for Pi users. piwheels was just getting started, so they requested that warehouse support uploading armv6 and armv7 wheels. This was accepted quite quickly and then warehouse's back-end became the new default for uploads (a while before the front-end).

Because Raspberry Pi's OS, Raspbian, is compatible with all Pi models (Pi 1/Zero are Armv6, Pi 2 is Armv7 and Pi 3 is Armv8), it's all Armv6 userland, despite Pi 2/3 presenting as Armv7. So wheels built on a Pi 3 will have the platform tag linux_armv7l, but are identical to a wheel built on a Pi 1 (just much faster build time), so we provide armv7 wheels built on a Pi 3, and symlink them to provide armv6 wheels for Pi 1/Zero users.

I wouldn't advise anyone to upload this kind of wheel to PyPI (note you can't upload two identical files even with different filenames, so you could only upload an armv7 wheel, not both), because they're not truly Armv7. But I guess there's a use case for other Arm boards. piwheels doesn't pretend to be an Arm package repository, just a Raspberry Pi one. But if package maintainers want their packages to work for Pi users and other Arm board users, they need to be sure what they upload is compatible with everyone (and hope that doesn't cause piwheels to symlink armv7 to armv6 and break it for some people).

See https://github.com/bennuttall/piwheels/issues/66 for issues around people uploading true Armv7 wheels to PyPI.

ncoghlan commented 6 years ago

I'm assuming that we can't reinstate the prohibition on Linux wheel uploads for ARM at this point without causing UX problems for Raspberry Pi users. So let's not do that :)

Instead, my suggestion would be that we retrospectively declare linux_armv7l and linux_armv6l to be Raspbian specific compatibility tags, and update pip (and any compatibility tag helper libraries) to check /etc/os-release before considering them as candidates for downloading from PyPI.
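
As a rough illustration of the check suggested here, the sketch below parses os-release style text and only accepts the bare linux_arm tags when the distro ID is Raspbian. The function names and the sample file contents are assumptions made for the example; this is not code from pip or any tag helper library.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def accepts_bare_linux_arm_tags(os_release_text):
    """Hypothetical policy: only Raspbian considers linux_armv6l/linux_armv7l."""
    return parse_os_release(os_release_text).get("ID") == "raspbian"


# Sample contents (assumed for illustration, not copied from a real system).
RASPBIAN_SAMPLE = 'PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"\nID=raspbian\nID_LIKE=debian\n'
DEBIAN_SAMPLE = 'PRETTY_NAME="Debian GNU/Linux 10 (buster)"\nID=debian\n'

print(accepts_bare_linux_arm_tags(RASPBIAN_SAMPLE))
print(accepts_bare_linux_arm_tags(DEBIAN_SAMPLE))
```

An installer would read the real file's text and pass it in; keeping the parser separate from the file access makes the policy easy to test.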

If folks from the Raspberry Pi project were then interested in helping define distro-specific ABI compatibility tags (which would make it possible to distribute distro-specific wheels via PyPI without causing compatibility problems for users of other distros), that would be most welcome: https://mail.python.org/pipermail/distutils-sig/2018-April/032117.html is the latest write-up of a design that we believe would work for that purpose, and what might be involved in getting there.

ncoghlan commented 6 years ago

https://github.com/pypa/warehouse/pull/3806 proposes a new comment on the list of permitted ABI tags noting that there's a lot more involved in adding a new ABI tag than just approving it there.

njsmith commented 6 years ago

Would removing support for uploading linux_armv?l wheels cause a problem for Raspberry Pi users? Or are they all using piwheels instead? How many linux_armv?l wheels actually ended up getting uploaded to PyPI?

(Agreed that it would be great to have a proper wheel tag for this.)

ncoghlan commented 6 years ago

Ah, I'd missed that detail (https://www.piwheels.org/ is the URL). In that case, yeah, it would be highly desirable to go back to preventing use of these tags on pypi.org itself, and have the Warehouse level feature be a config setting for "extra wheel compatibility tags".

Then @bennuttall could add these tags to the config setting for piwheels, and be in a similar situation to Galaxy Project where the ABI is implied by the repository you're using, rather than the compatibility tag.
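
A minimal sketch of what such an "extra wheel compatibility tags" setting could look like, assuming a hypothetical `upload_allowed` check and an illustrative baseline of accepted platform patterns. Neither the names nor the baseline list are taken from Warehouse's actual code.

```python
import re

# Illustrative baseline only; NOT the list Warehouse really uses.
BASELINE_PLATFORM_PATTERNS = (
    r"any",
    r"win32",
    r"win_amd64",
    r"macosx_\d+_\d+_\w+",
    r"manylinux\w+",
)


def platform_tags(wheel_filename):
    """Return the platform tag(s) from a wheel filename (last dash-separated field)."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % wheel_filename)
    stem = wheel_filename[: -len(".whl")]
    # Compressed tag sets separate multiple platform tags with dots.
    return stem.split("-")[-1].split(".")


def upload_allowed(wheel_filename, extra_tags=frozenset()):
    """Accept a wheel if every platform tag is in the baseline or in the per-index extras."""
    for tag in platform_tags(wheel_filename):
        if tag in extra_tags:
            continue
        if not any(re.fullmatch(p, tag) for p in BASELINE_PLATFORM_PATTERNS):
            return False
    return True


# Hypothetical per-repository configuration for an index like piwheels.
PIWHEELS_EXTRA_TAGS = frozenset({"linux_armv6l", "linux_armv7l"})

print(upload_allowed("demo-1.0-cp37-cp37m-linux_armv7l.whl"))
print(upload_allowed("demo-1.0-cp37-cp37m-linux_armv7l.whl", PIWHEELS_EXTRA_TAGS))
```

With this shape, pypi.org would run with an empty extras set, while a repository whose ABI is implied by the repository itself could opt in to the bare tags.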

njsmith commented 6 years ago

Here's all the linux_arm downloads for PyPI in March 2018: https://docs.google.com/spreadsheets/d/1rbdY2KT8t4o7BVEmwvwc9Fh7D7wpUjj6tiClEhrQ0lY/edit#gid=26881496

Looks like ~20k total, over a handful of packages, mostly from google (maybe the dependency stack for a single package?)

Query:

```
SELECT
  COUNT(*) AS downloads,
  file.filename
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    TIMESTAMP("20180301"),
    TIMESTAMP("20180401")
  )
WHERE
  file.filename CONTAINS 'linux_arm'
GROUP BY
  file.filename
ORDER BY
  downloads DESC
LIMIT 1000
```

dstufft commented 6 years ago

I don't particularly remember merging that PR, but generally we don't prevent people from uploading incompatible wheels (e.g. you can upload a manylinux wheel that is actually ubuntu specific) and there are benefits to allowing the generic tags. You may have additional files that only make sense on a Linux distribution (e.g. man files, etc) while still being a pure Python wheel. I believe that Armin Ronacher has been looking for a generic linux tag for quite some time as well for similar reasons.

Quite generally, I don't think that the linux_armv?l tags are Pi specific, and if folks are using them that way, that's a bug in their packages.

njsmith commented 6 years ago

@dstufft the problem is that if you don't treat the linux_armv?l tags as being Pi specific, then you can't use them at all, because there is no generic "linux arm" ABI, there's only "this works on a Pi running this version of Raspbian, but not Android", or "this works on Android, but not on a Pi", etc. etc.

OK yes I guess if Armin is building fully statically linked rust binaries that don't depend on the system libc, then that is an actual generic linux wheel, but if we want to allow that then we should talk about it on distutils-sig and there's no reason it should be allowed only on ARM. And I'm 99% sure that none of the linux ARM wheels currently on PyPI are like this.

bennuttall commented 6 years ago

Quite generally, I don't think that the linux_armv?l tags are Pi specific, and if folks are using them that way, that's a bug in their packages.

linux_armv?l tags aren't Pi specific, but if someone built a wheel on a Pi, it may be incompatible with other Arm boards.

Ah, I'd missed that detail (https://www.piwheels.org/ is the URL). In that case, yeah, it would be highly desirable to go back to preventing use of these tags on pypi.org itself, and have the Warehouse level feature be a config setting for "extra wheel compatibility tags".

:+1:

Then @bennuttall could add these tags to the config setting for piwheels, and be in a similar situation to Galaxy Project where the ABI is implied by the repository you're using, rather than the compatibility tag.

@ncoghlan Are you suggesting we force another tag on the wheels we build? Wouldn't that require a change in pip for requests from Raspbian to match the tag?

Does anyone consider the fact pip identifies the platform as armv7 on a Pi 2/3 a bug, when it's actually armv6 userland? It's a detail we can mostly work around but it's the only compatibility issue.

njsmith commented 6 years ago

Does anyone consider the fact pip identifies the platform as armv7 on a Pi 2/3 a bug, when it's actually armv6 userland? It's a detail we can mostly work around but it's the only compatibility issue.

Yeah, you should file a separate bug on pip for that. It already has similar code to work around a similar problem on x86 (64-bit processor but 32-bit userland): https://github.com/pypa/pip/blob/7b1f2a06d24bd90a28405e52e9184848d33576c7/src/pip/_internal/pep425tags.py#L135-L138
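
For context, the x86 special case referred to here amounts to downgrading the reported platform when a 32-bit interpreter runs on a 64-bit kernel. The sketch below restates that idea and adds an analogous ARM branch. The ARM branch and the `armv6_userland` parameter are hypothetical, and how to detect an ARMv6 userland is deliberately left as an input rather than guessed at.

```python
import sys


def is_running_32bit():
    """True when the interpreter itself is a 32-bit build."""
    return sys.maxsize == 2**31 - 1


def normalize_platform(platform, running_32bit, armv6_userland=False):
    """Map the kernel-reported platform to the one the userland can actually run."""
    if platform == "linux_x86_64" and running_32bit:
        # 32-bit Python on a 64-bit kernel: only 32-bit extensions will load.
        return "linux_i686"
    if platform == "linux_armv7l" and armv6_userland:
        # Hypothetical analogue: ARMv7 CPU, but an ARMv6 userland such as Raspbian.
        return "linux_armv6l"
    return platform


print(normalize_platform("linux_x86_64", running_32bit=True))
print(normalize_platform("linux_armv7l", running_32bit=True, armv6_userland=True))
print(normalize_platform("linux_armv7l", running_32bit=True))
```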

dstufft commented 6 years ago

if we want to allow that then we should talk about it on distutils-sig and there's no reason it should be allowed only on ARM.

I could have sworn we already relaxed the restriction on x86 and amd64 wheels as well to allow them, but looking at the code that's not true. Maybe I intended to do it but never actually got around to doing it.

I'm 99% sure that none of the linux ARM wheels currently on PyPI are like this.

Wheel has bad defaults, we should fix them. There's no reason why a generic linux wheel is an unreasonable thing, the only real problem is that people will get it by default without opting into it. Wheel should produce a hyper specific wheel by default, and require opting into the "generic" linux tags just like it does for manylinux. Beyond that, if people opt into a generic wheel but produce a specific wheel, that's a packaging bug on that package, the same as if they did it with a manylinux wheel.

ncoghlan commented 6 years ago

There aren't any hyper-specific Linux distro compatibility tags currently defined for wheel to opt in to: there's only "operating-system + CPU architecture".

We can mostly get away with that on Windows because the python.org binary provides an anchor as to which versions of the Windows ABI should be supported, but there's a reason the auditwheel project is a key piece of making manylinux workable in practice.

Defining a more selective tag format for Linux is indeed what Nate Coraor from the Galaxy Project initially proposed, and it's a good idea, it's just a question of whether anyone is sufficiently motivated to write up an interoperability spec for how they would work in practice.

vielmetti commented 5 years ago

Referred here by @notafile from https://github.com/meshy/pythonwheels/issues/109 and https://github.com/WorksOnArm/cluster/issues/116

I'm interested in the process and practice of getting a full set of arm64 wheels, targeting Linux, and able to run at minimum in the Debian/Ubuntu, Fedora/CentOS/RHEL/SUSE, and Alpine universes. This is motivated by @bennuttall's piwheels project, but that project only targets 32-bit systems, and by @notafile's experience having a complex Python application take way too long to build because it requires a lot of dependencies to be built from scratch.

mattip commented 5 years ago

@vielmetti so kind of a manylinux2010 for arm64, based on a description of minimum OS requirements?

vielmetti commented 5 years ago

@mattip - Yes, essentially that. Having read PEP 571, I might even suggest basing it on CentOS 7, which was announced for aarch64/arm64 in 2015. CentOS 6 is EOL in 2020, and if we do a new PEP based on CentOS 7 we could potentially pave the way for all of their supported architectures including not only aarch64/arm64 but also ARMv7hl. CentOS 7 is EOL in 2024.

Having read this I am pretty sure that Alpine support will be difficult as they use musl instead of glibc as their default system library, but I see that as orthogonal to the arm64 question.

bennuttall commented 5 years ago

piwheels is an open-source project which builds wheels for any platforms. Anyone can run their own instance. Feel free: https://github.com/bennuttall/piwheels/ https://piwheels.readthedocs.io/en/latest/

piwheels.org is the Raspberry Pi repository, an instance of piwheels.

NotAFile commented 5 years ago

It seems that, with the CentOS route, there's no other way than to declare a new manylinux2015 platform tag, since centos7 is the earliest version to support ARM(64). I don't know how much interest there would be in this, considering manylinux2010 is only just landing right now. Maybe it'd be even easier to just tack it onto the current manylinux2010 work?

ncoghlan commented 5 years ago

Part of what took so long with manylinux2010 is that a lot of projects had to figure out what it even meant to have more than one manylinux variant to choose from. Now that that has been done once, we're hoping that future additions will be easier.

The switch to CalVer in the naming scheme was also primarily about making it easier to define newer baselines for non-x86[_64] architectures.

njsmith commented 5 years ago

The manylinux_glibc_${glibc_version}_${arch} proposal discussed here might be the simplest way forward: https://mail.python.org/archives/list/distutils-sig@python.org/thread/6AFS4HKX6PVAS76EQNI7JNTGZZRHQ6SQ/

ncoghlan commented 5 years ago

Folks may find the summary post at https://mail.python.org/archives/list/distutils-sig@python.org/message/BFHBWB7ZO3L55V5JXGKN3FBNQQUN3END/ useful (it's my reply to the thread @njsmith linked).

And yeah, I agree - having the next manylinux PEP switch us over to that approach and get us to a point where installers can infer the right heuristic to use from the wheel name would be very helpful in making future manylinux updates easier.

njsmith commented 5 years ago

It will make future manylinux updates easier, but the reason I linked it here is because it will also make manylinux well-defined on every architecture. (Of course for each architecture someone will still have to do the work to figure out how to build portable wheels, but once you do then warehouse and pip will work immediately.)
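
To show why a tag of this shape is well-defined on any architecture, here is a small sketch of the compatibility test an installer could apply. It assumes the tag is spelled `manylinux_glibc_<major>_<minor>_<arch>`; that exact spelling, and the function names, are assumptions for illustration rather than anything settled in the linked thread.

```python
import re

# Assumed tag spelling: manylinux_glibc_<major>_<minor>_<arch>
_TAG_RE = re.compile(r"manylinux_glibc_(\d+)_(\d+)_(\w+)")
_GLIBC_RE = re.compile(r"(\d+)\.(\d+)")


def parse_glibc_version(version_string):
    """Turn a glibc version string such as '2.28' into (2, 28)."""
    match = _GLIBC_RE.match(version_string)
    if match is None:
        raise ValueError("unrecognised glibc version: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def tag_is_compatible(tag, system_glibc, system_arch):
    """A wheel is usable if the arch matches and the system glibc is new enough."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        return False
    required = (int(match.group(1)), int(match.group(2)))
    return match.group(3) == system_arch and parse_glibc_version(system_glibc) >= required


print(tag_is_compatible("manylinux_glibc_2_17_armv7l", "2.28", "armv7l"))
print(tag_is_compatible("manylinux_glibc_2_31_armv7l", "2.28", "armv7l"))
```

Nothing in the check is specific to x86: a new architecture only needs wheels built against a known glibc baseline, not a new tag definition.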

ncoghlan commented 5 years ago

Well, mostly - each new libc would still need an algorithm defined for installers to extract the relevant version information. (I don't recall the exact details of what I found when I dug into possible musl support, but it was only similar to what we do for glibc, rather than being completely identical).

di commented 5 years ago

Given that PEP 599/manylinux2014 is accepted and will allow manylinux_armv7l wheels, I think we should find a way to deprecate these wheel platforms and eventually block uploads once https://github.com/pypa/manylinux/issues/338 is complete.

This is a bit complicated by the fact that twine has no way to print a warning message on upload success, only failure, so this may require some creativity. For example, PyPI could allow the upload to succeed, but return an error response with a message indicating that while the upload actually succeeded, it will eventually fail. This would also cause twine to error with an exit code, which would make this issue apparent to anyone building/distributing wheels in CI or similar automated systems.

brainwane commented 4 years ago

Given that PEP 599/manylinux2014 is accepted and will allow manylinux_armv7l wheels, I think we should find a way to deprecate these wheel platforms and eventually block uploads once pypa/manylinux#338 is complete.

@di is pypa/manylinux#338 complete enough that we should start following up on this?

di commented 4 years ago

Yes, I think so. We could probably do this along with #6792.

qlyoung commented 4 years ago

I have a (maybe dumb) question on this issue. I apologize if it ends up being off-topic due to a misunderstanding on my part; working with the Python packaging ecosystem for platform-specific wheels has been a difficult experience for me and I'm quite sure I still don't fully understand it.

As I read it, the intent of this issue is to ultimately block the uploading of linux_armv7l and linux_armv6l builds to PyPI.

I upload manylinux2014 wheels for a package. Today I got around to creating manylinux2014_armv7l builds. I built the package on an armv7l Debian 10 box (not a Pi), and used auditwheel to create a manylinux2014_armv7l wheel from linux_armv7l. I then uploaded this manylinux2014_armv7l wheel to PyPI - which was accepted - and then attempted to install it on the same build box using pip.

pip told me there were no compatible wheels for the platform. Modifying pip to print its supported platform tags, I got:

```
[('cp37', 'cp37m', 'linux_armv7l'),
 ('cp37', 'abi3', 'linux_armv7l'),
 ('cp37', 'none', 'linux_armv7l'),
 ('cp36', 'abi3', 'linux_armv7l'),
 ('cp35', 'abi3', 'linux_armv7l'),
 ('cp34', 'abi3', 'linux_armv7l'),
 ('cp33', 'abi3', 'linux_armv7l'),
 ('cp32', 'abi3', 'linux_armv7l'),
 ('py3', 'none', 'linux_armv7l'),
 ('cp37', 'none', 'any'),
 ('cp3', 'none', 'any'),
 ('py37', 'none', 'any'),
 ('py3', 'none', 'any'),
 ('py36', 'none', 'any'),
 ('py35', 'none', 'any'),
 ('py34', 'none', 'any'),
 ('py33', 'none', 'any'),
 ('py32', 'none', 'any'),
 ('py31', 'none', 'any'),
 ('py30', 'none', 'any')]
```

Notably absent are any manylinux tags, which explains why pip determined there were no available tags. At this point in time, pip was the version shipped in Debian 10's python3-pip package:

pip 18.1 from /usr/lib/python3/dist-packages/pip (python 3.7)

After updating pip to the latest version with python3 -m pip install --upgrade pip, the installed pip version is pip 20.0.2 from /usr/local/lib/python3.7/dist-packages/pip (python 3.7). Doing the same, I got:

```
[...]
```

Again, no manylinux* tags.

Finally, taking a look at the compatible tags from pip on my x86_64 Ubuntu 18.04 machine:

```
[...]
```

All the manylinux tags are available. pip on this box is: pip 20.0.2 from <my local user install>.

So my conclusion here is that, today, pip actually doesn't support manylinux* tags on at least armv7l.

My question is this: is there a plan to rectify this prior to blocking uploads for linux_armv7l wheels to PyPI? Presently, that tag is the only usable one, at least on Debian 10.

di commented 4 years ago

@qlyoung Your armv7l box might just not actually be manylinux compatible. If you do:

pip install -U packaging

and then

python -c "import packaging.tags; print(packaging.tags._glibc_version_string())"

what do you get?

qlyoung commented 4 years ago

...
Successfully installed packaging-20.3 pyparsing-2.4.7
root@host# python3 -c "import packaging.tags; print(packaging.tags._glibc_version_string())"
2.28

If there are armv7l boxes that aren't manylinux compatible, doesn't that also indicate the necessity of keeping linux_arm*l tags around? And shouldn't auditwheel have prevented me from creating a manylinux wheel in the first place? I definitely recall seeing warnings about libc recency on other platforms when doing builds outside the manylinux* containers.

di commented 4 years ago

That's interesting, I would expect that version of glibc to give you manylinux2014 support. What's the output of python -c "import packaging.tags; print(list(packaging.tags.sys_tags()))"?

qlyoung commented 4 years ago

I posted the equivalent output in my OP, nice to know what the canonical way is :slightly_smiling_face: As before:

root@host# python3 -c "import packaging.tags; import pprint; pprint.pprint(list(packaging.tags.sys_tags()))"
[<cp37-cp37m-linux_armv7l @ 3059070016>,
 <cp37-abi3-linux_armv7l @ 3059070136>,
 <cp37-none-linux_armv7l @ 3059070216>,
 <cp36-abi3-linux_armv7l @ 3059070296>,
 <cp35-abi3-linux_armv7l @ 3059070376>,
 <cp34-abi3-linux_armv7l @ 3059070456>,
 <cp33-abi3-linux_armv7l @ 3059070536>,
 <cp32-abi3-linux_armv7l @ 3059070616>,
 <py37-none-linux_armv7l @ 3059070776>,
 <py3-none-linux_armv7l @ 3059070696>,
 <py36-none-linux_armv7l @ 3059070896>,
 <py35-none-linux_armv7l @ 3059070976>,
 <py34-none-linux_armv7l @ 3059071056>,
 <py33-none-linux_armv7l @ 3059071136>,
 <py32-none-linux_armv7l @ 3059071216>,
 <py31-none-linux_armv7l @ 3059071296>,
 <py30-none-linux_armv7l @ 3059071376>,
 <py37-none-any @ 3059071456>,
 <py3-none-any @ 3059071496>,
 <py36-none-any @ 3059071536>,
 <py35-none-any @ 3059071576>,
 <py34-none-any @ 3059071616>,
 <py33-none-any @ 3059071656>,
 <py32-none-any @ 3059071696>,
 <py31-none-any @ 3059071736>,
 <py30-none-any @ 3059071776>]

di commented 4 years ago

Got it, so I'm guessing python -c "import packaging.tags; print(packaging.tags._is_linux_armhf())" is false then, right? The issue is that armv7l overlaps multiple ABIs, so we chose armhf as the representative ABI for armv7l. We don't have the ability to support multiple ABIs for a given architecture.

njsmith commented 4 years ago

@qlyoung

If there are armv7l boxes that aren't manylinux compatible, doesn't that also indicate the necessity of keeping linux_arm*l tags around?

The problem with the linux_arm*l tags is that they don't tell you anything at all about compatibility. All it says is "this might work on some box somewhere, but no-one knows whether it will work on your box". That's not really better than manylinux :-)

@di

The issue is that armv7l overlaps multiple ABIs, so we chose armhf as the representative ABI for armv7l.

If there are multiple ARM ABIs in active use, then I guess it wouldn't be complicated for pip/pypi to support multiple of them, as long as we can easily sniff out which ABI we're running under?
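
One way such sniffing could work is to inspect the ELF header of the running interpreter: on 32-bit ARM, the `e_flags` field records whether the binary uses the hard-float calling convention. The sketch below is an independent illustration of that idea, not the code from `packaging`; the function names are made up for the example, and calling the soft-float case "armel" follows Debian's naming as an assumption.

```python
import struct

# Constants from the ELF and ARM EABI specifications.
ELFCLASS32 = 1
ELFDATA2LSB = 1
EM_ARM = 0x28
EF_ARM_ABIMASK = 0xFF000000
EF_ARM_ABI_VER5 = 0x05000000
EF_ARM_ABI_FLOAT_HARD = 0x00000400


def arm_float_abi(header):
    """Return 'armhf', 'armel', or None if the header is not 32-bit ARM EABI5."""
    if len(header) < 40 or header[:4] != b"\x7fELF":
        return None
    if header[4] != ELFCLASS32 or header[5] != ELFDATA2LSB:
        return None
    # In an ELF32 header, e_machine sits at offset 18 and e_flags at offset 36.
    (e_machine,) = struct.unpack_from("<H", header, 18)
    (e_flags,) = struct.unpack_from("<I", header, 36)
    if e_machine != EM_ARM or (e_flags & EF_ARM_ABIMASK) != EF_ARM_ABI_VER5:
        return None
    return "armhf" if e_flags & EF_ARM_ABI_FLOAT_HARD else "armel"


def fake_elf32_header(e_machine, e_flags):
    """Build a synthetic 40-byte ELF32 little-endian header for demonstration."""
    e_ident = b"\x7fELF" + bytes([ELFCLASS32, ELFDATA2LSB, 1]) + bytes(9)
    return e_ident + struct.pack("<HHIIIII", 2, e_machine, 1, 0, 0, 0, e_flags)


print(arm_float_abi(fake_elf32_header(EM_ARM, 0x05000400)))
print(arm_float_abi(fake_elf32_header(EM_ARM, 0x05000200)))
```

On a real system the header bytes would come from reading the first 40 bytes of the Python executable. The result could then select between distinct hard-float and soft-float platform tags, if such tags were defined.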

qlyoung commented 4 years ago

This is indeed an armel machine.

# python3 -c "import packaging.tags; print(packaging.tags._is_linux_armhf())"
False

we chose armhf as the representative ABI for armv7l. We don't have the ability to support multiple ABIs for a given architecture.

I see. In this case, as I understand it, there is no real support today for armel wheels since armv7l wheels are assumed to be armhf wheels; and furthermore, this information isn't encoded in the wheel tag, but rather in the packaging libraries as a soft check. I'm not even sure if armel wheels are forward compatible with armhf (I'm not intricately familiar with the ABIs) but if they aren't, it sounds like I just uploaded a wheel that pip will happily install but that might break on an ABI that, ostensibly, its tag says (by omission) it works on?

Also, for my own edification - are there docs on details like this w.r.t. Python packaging, and if so, where? I've read through PEP 425 and the manylinux PEPs, and I'm wondering if there is something else I should look over.

The problem with the linux_arm*l tags is that they don't tell you anything at all about compatibility.

Quite right, the arch + abi scene in ARM land always confuses me.

If there are multiple ARM ABIs in active use, then I guess it wouldn't be complicated for pip/pypi to support multiple of them, as long as we can easily sniff out which ABI we're running under?

That would be awesome. To provide a bit of background, the box in question is a datacenter-grade network switch; quite a few widely deployed models from various vendors use armel.

di commented 4 years ago

There are currently 105 projects that publish linux_armv6l or linux_armv7l wheels:

warehouse=> SELECT 
  count(*) 
FROM 
  (
    SELECT 
      roles.user_id as user_id, 
      roles.project_id as project_id 
    FROM 
      (
        SELECT 
          project_id 
        FROM 
          (
            SELECT 
              release_id, 
              packagetype 
            FROM 
              release_files 
            WHERE 
              (
                packagetype = 'bdist_wheel' 
                AND filename like '%linux_armv%l.whl'
              ) 
            GROUP BY 
              release_id, 
              packagetype
          ) f 
          JOIN releases ON releases.id = f.release_id 
        GROUP BY 
          project_id
      ) release 
      JOIN roles ON release.project_id = roles.project_id 
    GROUP BY 
      user_id, 
      roles.project_id
  ) p1 
  JOIN projects ON p1.project_id = projects.id;
 count
-------
   105
(1 row)

I'm planning to email all maintainers/owners of these projects to announce a 6-month deprecation period for these distribution types, and provide guidance for migrating to manylinux2014.

uranusjr commented 3 years ago

Hi, is there a follow-up on this?

di commented 3 years ago

I'm planning to email all maintainers/owners of these projects to announce a 6-month deprecation period for these distribution types

Well, that didn't happen

and provide guidance for migrating to manylinux2014.

Mostly because the specs here were in flux and such a guide doesn't exactly exist.

The first step towards unblocking this would be to draft an email to these users and determine how we should guide them towards the new standards.