pypa / packaging-problems

An issue tracker for the problems in packaging

Campaign to get people Publishing Wheels #25

Open dstufft opened 10 years ago

dstufft commented 10 years ago

How can we get more people to publish wheels, especially for Windows? Christoph Gohlke publishes Windows installers, but that model won't work for wheels because he won't have the rights to upload them.

Perhaps the build farm I've been wanting to set up could be used here?

alex commented 10 years ago

http://pythonwheels.com/ is an attempt at this; now that pip 1.5 installs them by default, this should be easier.

I think one part of this would be to make the setup.py ... package-uploading process more streamlined, so that it does the right thing.

kura commented 10 years ago

I was thinking it would surely make sense to have a simple "build" command that made wheels, eggs, and an sdist by default, rather than having to specify each one separately?

Am I wrong in thinking that you still need to install another package just to create wheels?

alex commented 10 years ago

Yes, you need to pip install wheel before setup.py bdist_wheel works. Also, you really shouldn't be making eggs ;)
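
For reference, the full publish dance for a setuptools-based project currently looks roughly like this (using twine for the upload step):

$ pip install wheel twine
$ python setup.py sdist bdist_wheel    # artifacts land in dist/
$ twine upload dist/*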

hickford commented 9 years ago

As of 2015, Christoph Gohlke publishes wheels rather than MSI installers: http://www.lfd.uci.edu/~gohlke/pythonlibs/

brainwane commented 5 years ago

@scopatz is this something you could comment on?

scopatz commented 5 years ago

Thanks for roping me into this issue @brainwane.

I am speaking on behalf of conda-forge here. But basically, we'd love it if conda-forge could be used to build & publish wheels. To that end, it might be more useful to think of conda-forge as just "The Forge."

We have the infrastructure for building binary packages across Linux, OS X, Windows, ARM, and Power8 already. We have a tool called conda-smithy that we develop and maintain that helps us keep all of the packages / recipes / CIs configured and up-to-date.

I see two major hurdles to building and deploying wheels from conda-forge. These could be worked on in parallel.

Building: conda-smithy would need to be updated so that packages that are configured to do so would generate the appropriate CI scripts (from Jinja templates) to build wheels. This would be CI-provider- and architecture-specific. Probably the easiest place to start is building from manylinux on Azure. We would probably need at least one configuration variable to live in conda-forge.yml that actively enables wheel building (enable_wheels: true? enable_wheels: {linux-64: true}?). Conda-smithy reads this file when it rerenders a feedstock (a git repo with a specific structure for building packages). There are probably some subtleties and difficulties here with working through which compiler toolchains should be used on different platforms (there is really only the manylinux standard for Linux). But this is the basic idea.
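
To make that concrete, here is a sketch of what such a conda-forge.yml entry could look like; the enable_wheels key is purely hypothetical at this point and does not exist in conda-smithy:

# hypothetical conda-forge.yml snippet (proposal only)
enable_wheels:
  linux-64: true
  osx-64: false
  win-64: false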

The challenge with building is that most of the conda-forge people are not used to building wheels. I am happy to help work on the conda-forge infrastructure side, but I think we need someone who is an expert on the wheels side who is also willing to jump in and help scale this out with me.

Deploying: Once we can build wheels, we need a place to put them. Nominally, this would be PyPI. But we need to be able to do this from a CI service. We are happy to have an authentication token that we use. There isn't much that I see that conda-forge can really do about this (which has prevented us from working on this issue previously). However, I think that PyPI is working on this.
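
For example, once PyPI tokens are available, the upload step in a CI job could be roughly as simple as the following (PYPI_API_TOKEN and wheelhouse/ are just placeholders):

$ pip install twine
$ export TWINE_USERNAME=__token__
$ export TWINE_PASSWORD="$PYPI_API_TOKEN"    # secret injected by the CI provider
$ twine upload wheelhouse/*.whl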

I am super excited about this; the fundamental premise of conda-forge is to be open source, cross-platform, community build infrastructure. If there are other folks out there who are enthusiastic about getting this working, please reach out to me or put me in touch!

brainwane commented 5 years ago

Thanks @scopatz! @waveform80 and @bennuttall would you like to speak from the piwheels perspective? And @jwodder, from what you have learned via Wheelodex? (Found out about you via this thread.)

astrojuanlu commented 5 years ago

Perhaps the work that @Matthew-Brett did at MacPython to build wheels of key packages of the Scientific Python stack will be helpful as well. Also, I discovered cibuildwheel by @joerick recently. (Edit: wrong Matthew Brett)

bennuttall commented 5 years ago

For the piwheels project we build arm wheels for the Raspberry Pi, natively on Raspberry Pi hardware, hosted at piwheels.org. We don't try to bundle dependencies a la manylinux2010; instead we target what's stable in the distro (Raspbian) and make no promises elsewhere. The project source itself is open, so others could run their own repos targeting other platforms.

I don't recommend that maintainers upload arm wheels themselves; instead, let us build them, knowing they work on the Pi.

We also attempt to show library dependencies on our project pages, e.g. https://www.piwheels.org/project/numpy/, rather than leaving people to work them out themselves, e.g. https://blog.piwheels.org/how-to-work-out-the-missing-dependencies-for-a-python-package/
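
For anyone who wants to pull from piwheels explicitly (recent Raspbian images ship this preconfigured in /etc/pip.conf), it's just an extra index:

$ pip install numpy --extra-index-url https://www.piwheels.org/simple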

mingwandroid commented 5 years ago

Hi @scopatz, what do you propose to do about shared libraries that have no natural place in a wheel? (To me, most shared libraries have no natural place in a wheel.)

We cannot stick our heads in the sand on that. Our heavy use of shared libraries in conda is one of our most compelling advantages, and because we use the same ones across languages, putting those shared libraries in a wheel would be a bad thing to do.

I'm not coming to you with a solution here. I wish I were; I really do.

matthew-brett commented 5 years ago

It would probably be neater not to ship the external libraries in the wheels, but it has in practice been working, at least on Linux and macOS.

I can see the problem is more urgent for Conda, because y'all are building a multi-language software distribution.

A few years ago, @njsmith wrote a spec for pip-installable libraries: https://github.com/pypa/wheel-builders/pull/2

It isn't merged, and it looks like 'the current setup works for me' has meant that no-one so far has had the time or energy to work further on that. I suspect something along those lines is the proper solution, if we could muster the time.

matthew-brett commented 5 years ago

By the way - @scopatz - I'm happy to help integrate the wheel builds into conda-forge - but I'm crazy busy these days, so I won't have much time for heavy lifting.

mingwandroid commented 5 years ago

It would probably be neater not to ship the external libraries in the wheels, but it has in practice been working, at least on Linux and macOS.

Well, the software needs to work of course and I'm not being facetious!

We end up discussing where the line is between the thing itself and the system libraries that support it, and that's not clear cut. Take xgboost as an example. It has a C/C++ library and bindings for Python and R. xgboost itself builds static libs for each, so they sidestepped that issue, while we're much more efficient (in many dimensions). libxgboost is clearly a part of the xgboost stack, but what about ncurses? Is it system or not? In conda-forge, we provide it, and in all honesty that line is organic: something we move as and when we find we need to.

pradyunsg commented 5 years ago

@brainwane @scopatz if there's a better title for this issue today, could you change it, or comment so that someone else who can make the change does so?

snakescott commented 5 years ago

I can offer mild packaging familiarity, reasonable Python / CI / cloud experience, and say 10-20 hours a week for the next month if it would be helpful. I think I would be a good fit if there's rough consensus on direction and PyPA/conda experts available for consulting, but the work is bottlenecked on elbow grease.

cc @brettcannon @dstufft @asottile

mikofski commented 5 years ago

@matthew-brett I thought Carl Kleffner did something similar to a pip-installed toolchain with OpenBLAS for NumPy, though my memory might be foggy.

matthew-brett commented 5 years ago

@mikofski - right - Carl was working on Mingwpy, which was (still is) a pip-installable gcc compiler chain to build Python extensions that link against the Microsoft Visual C++ runtime library used by the Python.org builds.

Work has stalled on that, for a variety of reasons, although I still think it would be enormously useful. I can go into more details - or - @carlkl - do you want to give an update here?

@mattip - because we were discussing this a couple of weeks ago.

teoliphant commented 5 years ago

It would probably be neater not to ship the external libraries in the wheels, but it has in practice been working, at least on Linux and macOS.

I can see the problem is more urgent for Conda, because y'all are building a multi-language software distribution.

A few years ago, @njsmith wrote a spec for pip-installable libraries: pypa/wheel-builders#2

It isn't merged, and it looks like 'the current setup works for me' has meant that no-one so far has had the time or energy to work further on that. I suspect something along those lines is the proper solution, if we could muster the time.

I don't know that we have a clear answer that pip should be used as a general-purpose packaging solution. My view, which seems to be shared by several others in the recent Discourse discussion about it, is that it should not try to "reinvent the wheel" or replace general-purpose packaging solutions (like conda, yum, apt-get, nix, brew, spack, etc.); pip has a clear use as a packaging tool for developers and "self-integrators".

For that use case, statically linking dependencies into a wheel (vendoring native dependencies) can be a stop-gap measure, but it becomes very difficult for distributors, as evidenced by the pytorch, rapids, arrow, and other communities. It is definitely not ideal and is in fact a growing problem for promoting the use of wheels for all Python users.

Using pip to package native libraries is conceivably possible, but a bigger challenge than it seems at first. It is hard to understand the motivation for this considerable work when this problem is already solved by several other open-source and more general-purpose packaging systems.

A better approach in my view is to enable native-library requirements to be satisfied by external packaging systems. In this way, pip can allow other package managers to install native requirements and only install wheels with native requirements if they are already present.

Non-developer end-users who use Python integrated with many other libraries (such as the PyData and SciPy users) should also be encouraged to use their distribution package manager to get their software. These distributions (such as conda-forge) already robustly satisfy the need for one-command installation. This is a better user experience than encouraging these particular users to "pip install".

In sum: conda-forge infrastructure producing wheels is a good idea, conda-build recipes producing wheels that allow for conda-packages to satisfy native-library dependencies is an even better idea.

pfmoore commented 5 years ago

@teoliphant While theoretically a reasonable idea, this ignores the fact that a significant number of users are asking for pip-installable versions of these packages. Ignoring those users, or suggesting that they should "just" switch to another packaging solution, is dismissing a genuine use case without sufficient investigation.

I know from personal experience that there are people who do need such packages but who can't or won't switch to Conda (for example). And on Windows there is no OS-level distribution package manager. How do we serve such users?

msarahan commented 5 years ago

From talks at SciPy, it seemed like a good answer for those users would be to provide "fat" wheels that ship all needed shared libraries with the wheel. These could be created using conda packages to minimize build time and consolidate build procedures. There was some experimentation with that using numpy and scikit-image as tests. The packages were significantly larger - probably too large. Static linking is much more efficient, but bifurcates the build process. I'm hopeful that we can explore ways to trim down the shared library size such that this approach may be viable. Having any sort of scheme to actually share native libraries via wheels (pynativelib) would help, but I think a strong dependency solver is a hard requirement for implementation of that.
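
The bundling step itself is fairly mechanical: tools like auditwheel on Linux and delocate on macOS copy the needed shared libraries into the wheel and rewrite the linkage to point at the bundled copies, roughly:

$ auditwheel repair dist/scikit_image-*.whl -w wheelhouse/    # Linux: vendor shared libs into a manylinux wheel
$ delocate-wheel -w wheelhouse/ -v dist/scikit_image-*.whl    # macOS equivalent

(The paths are illustrative; the hard part is not the copying but deciding what to bundle and how large the result gets.)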

mingwandroid commented 5 years ago

What about SONAME though? Or are you proposing to rewrite them and rename the DSOs? If so, are we worried about passing objects between different versions of the same library? The glibc folks warned manylinux about that.

matthew-brett commented 5 years ago

pip has a clear use as a packaging tool for developers and "self-integrators".

I guess it is used by those people, but it's used by a lot of other people too.

For that use case, statically linking dependencies into a wheel (vendoring native dependencies) can be a stop-gap measure, but it becomes very difficult for distributors, as evidenced by the pytorch, rapids, arrow, and other communities. It is definitely not ideal and is in fact a growing problem for promoting the use of wheels for all Python users.

I guess the problem is growing, but only in the sense that there are an increasing number of packages that ship wheels now. There are some difficult packages - I know that the GUI packages can have trouble. What difficulties are pytorch, rapids, arrow having? I'm happy to advise.

Using pip to package native libraries is conceivably possible, but a bigger challenge than it seems at first. It is hard to understand the motivation for this considerable work when this problem is already solved by several other open-source and more general-purpose packaging systems.

A better approach in my view is to enable native-library requirements to be satisfied by external packaging systems. In this way, pip can allow other package managers to install native requirements and only install wheels with native requirements if they are already present.

I think that's exactly the problem - it's not practical for a Python package to try and work with the huge numbers of package variants that it could encounter.

Non-developer end-users who use Python integrated with many other libraries (such as the PyData and SciPy users) should also be encouraged to use their distribution package manager to get their software. These distributions (such as conda-forge) already robustly satisfy the need for one-command installation. This is a better user experience than encouraging these particular users to "pip install".

I don't think SciPy or PyData users will have any trouble - were you thinking of any package in particular? NumPy / SciPy / Matplotlib / Pandas are all well packaged, and have been for a long time.

In sum: conda-forge infrastructure producing wheels is a good idea, conda-build recipes producing wheels that allow for conda-packages to satisfy native-library dependencies is an even better idea.

I don't think there's much appetite for making pip installs depend on prior conda installs - wouldn't that just increase the confusion?

astrojuanlu commented 5 years ago

What difficulties are pytorch, rapids, arrow having? I'm happy to advise.

For arrow, I think it's best summarized here:

https://twitter.com/wesmckinn/status/1149319821273784323

  • many C++ dependencies
  • several bundled shared libraries
  • some libraries statically linked
  • privately namespaced, bundled version of Boost

matthew-brett commented 5 years ago

@wesm - I'm happy to help with this - let me know if I can. Did you already contact the scikit-build folks? I have the impression they are best for C++ chains. (Sorry, I can't reply on Twitter, have no account).

wesm commented 5 years ago

I believe we have one of the most complex package builds in the whole Python ecosystem. I think TensorFlow or PyTorch might have us beat, but it's close (it's obviously not a competition =D).

I haven't contacted the scikit-build folks yet; if that could help us simplify our Python build, I'm quite interested. I'm personally all out of budget for this after I lost a third or more of my June to build- and package-related issues, so maybe someone else can look into it.

cc @pitrou @xhochy @kszucs @nealrichardson

matthew-brett commented 5 years ago

Thanks - that sounds very tiring. I bet we can use this as a stimulus to improve the tooling. Would you mind making an issue in some sensible place in the Arrow repositories for us to continue the discussion?

pitrou commented 5 years ago

I'll echo what @wesm said here. I spent a lot of time as well trying to cope with wheel packaging issues on PyArrow. I'd be much happier if people agreed to settle on conda for distribution and installation of compiled Python packages.

(disclaimer: I used to work for Anaconda but don't anymore. Also I own a very small amount of company shares)

matthew-brett commented 5 years ago

@pitrou - I hear the hope, but I really doubt that's going to happen in the short term. So I still think the best way, for now, is for those of us with some interest and time to try and improve the wheel-building machinery to the point where it is a minimal drain on your development resources.

wesm commented 5 years ago

Just to drop some statistics to indicate the seriousness of this problem, our download numbers are growing to the same magnitude as NumPy's and pandas':

$ pypistats overall pyarrow
|    category     | percent | downloads  |
|-----------------|--------:|-----------:|
| with_mirrors    |  50.18% |  9,700,974 |
| without_mirrors |  49.82% |  9,630,781 |
| Total           |         | 19,331,755 |

$ pypistats overall numpy
|    category     | percent |  downloads  |
|-----------------|--------:|------------:|
| with_mirrors    |  50.15% | 114,356,740 |
| without_mirrors |  49.85% | 113,661,813 |
| Total           |         | 228,018,553 |

$ pypistats overall pandas
|    category     | percent |  downloads  |
|-----------------|--------:|------------:|
| with_mirrors    |  50.12% |  67,694,077 |
| without_mirrors |  49.88% |  67,358,042 |
| Total           |         | 135,052,119 |

One of the reasons for our complex build environment is that we're solving problems that are very difficult or impossible to solve without a deep dependency stack. So there is no end in sight to our suffering with the current state of wheels.

mingwandroid commented 5 years ago

Did you already contact the scikit-build folks? I have the impression they are best for C++ chains

I believe conda is the best for C++ chains but I would say that.

mattip commented 5 years ago

OpenCV seems to have a similar problem to pyarrow. They have many dependencies and are C++-based. The opencv-python repo builds wheels by using upstream OpenCV as a submodule. It uses scikit-build to build various variants of the package as a single C extension module. Most of the dependencies are statically linked into the shared object. Maybe worth a look if you want to target wheels. The separation between the OpenCV repo and the packaging repo is particularly attractive, since the CI runs and testing can be separate.

matthew-brett commented 5 years ago

I believe conda is the best for C++ chains but I would say that.

Yes, sorry, I don't have an informed opinion on Conda and C++ - I only meant 'best of the Wheel meta build tools, if you have C++ chains'.

mingwandroid commented 5 years ago

Why is everyone so keen to fill their computer memory up with multiple copies of the same functions?

mingwandroid commented 5 years ago

We do tend to build out static libs for all of our packages, but the problem is that things need to be rebuilt twice: once using shared libs for conda and once using static libs for wheels.

The problem is that you're then at a point where the difference in effort between building a wheel and a conda package may shrink (in the wrong direction).

astrojuanlu commented 5 years ago

Why is everyone so keen to fill their computer memory up with multiple copies of the same functions?

I hope it's a sarcastic comment. What many people apparently want, though, is to be able to pip install everything, which might be a legitimate request or not (no comment on that). The Jupyter folks already gave up on this (I'm not familiar with the reasons, though), and now the extensions must be built using npm, which of course you can't pip install (yet?).

mingwandroid commented 5 years ago

I hope it's a sarcastic comment

It is a rhetorical question, not a sarcastic comment. And it is entirely legitimate.

The thing that is being skirted around here is the fact that wheels are hard to build because they link to static libraries, and static libraries are extremely fiddly for build systems (and humans) to work with.

The work done to build a shared library is reused every time that shared library is loaded. The work done to build something with a static library is repeated with every package.

That work is also horribly complicated: if the same static lib gets linked twice into the same Python extension module, you get symbol name clashes. Often you have to deal with static libs built with one build system being consumed by another, and for that to work they need to understand, to some extent, each other's C-level packaging metadata (so CMake needs to know some things about libtool and pkg-config, for example).

I believe this is the crux of why building packages for conda is easy and building complex packages for PyPI is not, and there's not a great deal of tooling that can be done around that problem.

matthew-brett commented 5 years ago

Why is everyone so keen to fill their computer memory up with multiple copies of the same functions?

I don't know whether anyone has any numbers on that - I don't - but I'm betting that the extra memory is maybe on the order of 30M in typical PyData usage, which might worry me on my first-generation Raspberry Pi, but not on my Intel laptop.

mingwandroid commented 5 years ago

which might worry me on my first-generation Raspberry Pi, but not on my Intel laptop.

Sorry @matthew-brett, I disagree with this. Software should scale well and be efficient everywhere; otherwise you're killing the planet (without getting too high up on my moral horse!).

matthew-brett commented 5 years ago

Yes, sorry, I'm only saying the 30M doesn't worry me, personally, on the Intel machines that I use; I get that it worries you, and I can see that would have an effect on your choice of packaging tool.

pfmoore commented 5 years ago

I'd be much happier if people agreed to settle on conda for distribution and installation of compiled Python packages.

I think the question of "conda vs wheels" keeps coming up, but it seems to me that it's a bad case of comparing apples and oranges. My personal problem with switching to conda is that it isn't just a choice of packaging tools - there's a lot of other baggage as well. (Disclaimer: I haven't tried using conda for a while now, although I made a few attempts in the past, so my data may be out of date). For example:

  • Conda manages my Python interpreter, so it's not immediately obvious to me how I'd use a new python.org release (or a beta, or a personal build).
  • Conda has its own virtual environment solution, meaning I don't know how it interacts with things like pipenv, or pew, or pipx (or, for that matter, tox). And if I decide to try it anyway, I'm not at all clear who I'd look to for support if something didn't work as expected.
  • Conda builds are handled by people other than the upstream projects, so if the conda build people haven't packaged a project, or they haven't packaged the version I want to use, I'm back to needing "another solution" (at which point we're back to does pip integrate with conda, and if I have to use pip for some of my dependencies, why can't I just use pip for everything).

If conda offered just a package management solution, it would be much easier to argue that people wanting wheels for hard-to-package projects should just use conda. But when a switch to conda involves such a significant change to the user's whole working environment, it's a much harder sell (and it's much more important to take the comments of people saying "I can't use conda for my situation" seriously).

Further disclaimer: I am a pip maintainer, so the above is not unbiased. But I have heard similar comments from a colleague who has no particular reason to prefer either option.

msarahan commented 5 years ago

I'm sorry, I think I've taken the discussion off on the conda vs. pip tangent. I didn't mean to pick that scab.

I was not proposing that end users who currently want to use wheels switch to using conda. What I was proposing was that on the build side, conda's foundation can simplify the build process, ultimately leading to wheels that people install without knowledge of them being produced by conda recipes.

Matthew, I'm sure you can work out a better build for Wes, and for anyone else, but I don't think that's a scalable solution. Updates that require cascading rebuilds ("migrations" in conda-forge lingo) will still be a nightmare for any system that is not based around a strong dependency solver. Maintainers' time is better spent coding interesting, useful software, not just connecting the dots on build systems.

Again, NOT conda vs. pip. Conda as a tool to help provide wheels for pip.

wesm commented 5 years ago

I agree with @msarahan -- the problem is that we are completely on our own to manage a standalone library toolchain that has to be shipped in a wheel using a mix of static linking, bundling shared libraries, and private namespacing.

The configuration of the wheel build is highly bespoke because of the requirement that the wheel be self-contained and as isolated as possible from the user's system. As an example, because we are now shipping gRPC symbols in our wheels (for our Arrow Flight messaging system), we now need to bundle or statically link OpenSSL to provide TLS support to users. I have no idea what will happen when users are using Google's Python gRPC wheels in the same process.

We are also statically linking:

When I tell people about the dependency stack, the knee-jerk reaction is "But do you really need $LIBRARY...?" We aren't adding these dependencies frivolously -- they are used to solve real world problems.

Just building our wheels and getting them to work everywhere is hard enough, but we've also had to contend with the non-compliant wheels from PyTorch and TensorFlow, which result in numerous inbound bug reports.

Were it not for our generous corporate sponsorships, keeping this project going from an operational standpoint probably would not be possible. But we are spending a disproportionate amount of time maintaining packages (wheels being by far and away the worst offender), causing said sponsorships to be at risk in the medium term if we aren't able to spend more time focused on building new features.

matthew-brett commented 5 years ago

@wesm - I absolutely see the problem - and I hope very much we can remove a large proportion of that pain from your build process, by some collaboration on improving tooling. I'm not saying that will definitely work, only that we should definitely try.

That said, you do have a problem of nearly unique complexity (competing in this respect with tensorflow). It is true that, if all packages were in your situation, the current vendor-your-libraries approach would make wheels in general unworkable.

On the other hand, wheels do work, with a small and acceptable maintenance burden, in many cases.

Those of us who want to continue to use pip have a strong motivation for trying to help you out of the hole that the current system has put you in.

pitrou commented 5 years ago

I think the question of "conda vs wheels" keeps coming up, but it seems to me that it's a bad case of comparing apples and oranges.

I don't want to be insulting, but it's rather a case of comparing good apples and rotten oranges. The people complaining are people who have had a long, painful, forced experience building wheels. The people saying everything is fine are people who seem to only be building simple, trivial packages.

Personally, I am extremely annoyed by the self-complacency of the so-called Python Packaging "Authority". You want to call yourself an Authority but you don't seem to have the Competence that goes with it. That's a fundamental issue here. @ncoghlan

matthew-brett commented 5 years ago

The people saying everything is fine are people who seem to only be building simple, trivial packages.

Building simple, trivial packages is pretty simple and trivial.

Building moderately complicated packages, like SciPy, Matplotlib, and Pillow, is really not too bad.

Building complicated packages like VTK and ITK was hard work. I didn't help with those, and I don't know how much of an ongoing burden that is.

I think we do all agree that Arrow and Tensorflow are at the very extreme end of difficulty. So we have a fairly typical situation where a tool works pretty well for the large majority of packages, but is very hard to use for the most difficult packages.

pitrou commented 5 years ago

But conda works equally well or better in all cases... so? Why do we have to cope with an inferior standard? Just so that the wheel designers don't lose face?

wesm commented 5 years ago

We certainly have the option to give up on wheels (and at this point, I would say good riddance). The problem is that the users will accuse us of being lazy instead of asking whether wheels are the right place to deploy the software.

pitrou commented 5 years ago

Exactly, that's the problem. And that's why I think "Campaign to get people Publishing Wheels" is a nuisance to packagers :-/

matthew-brett commented 5 years ago

@pitrou - you're saying "please give up using pip and use conda instead", but I honestly think there isn't any appetite for that discussion here. Happy to be corrected, but if so, let's move that off to a different place. Do you want to do that? I'm happy to join that, wherever it may be. Assuming / hoping that we aren't having that discussion, then the question is, what can we do to make Arrow's build process practical. I can well see that using conda-forge's static libraries could be a good solution.

pitrou commented 5 years ago

Happy to be corrected, but if so, let's move that off to a different place. Do you want to do that?

There is no need for a discussion. All the arguments have been given. The remaining TODO item is for the Python Packaging "Authority" to stop recommending building wheels (see issue title) and recommend conda instead.