Automatic OS dependency error catching with interactive prompt

ghost commented 10 years ago

Originally reported by: chrstphrhrt (Bitbucket: chrstphrhrt, GitHub: chrstphrhrt)

Abstract: I think it would be useful to further assist users who are trying to install Python packages which depend on libraries that are otherwise superfluous to the experience of using the project being installed.

This is normal. Everyone who installs Python packages on fresh machines quickly learns the common OS-level dependencies anyway. However, when the package is unusually large or new, even a savvy user faces a significant risk of surprise and monotonous troubleshooting.

Often the traceback from setuptools includes a message like 'foo.h is missing', where foo (or its pathname) has nothing obviously in common with the OS libfoo or foo-dev package name that is required.

Any helpful solution would not be beautiful in the sense that it's just a mapping of arbitrary names for each OS. However, once the most common mappings are catalogued, it will still be helpful to a lot of people.
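
As a rough illustration of such a mapping, here is a minimal sketch that pulls missing header names out of a build log and looks them up in a small hand-written catalogue. The catalogue contents, the `debian`/`fedora` keys, and the function names are assumptions made for this example; package names vary by distribution and release and should be verified. The regular expression targets the usual gcc and clang "missing header" messages only.

```python
import re

# Illustrative, hand-maintained catalogue: header -> package name per OS family.
# Entries are examples only; verify the names for your distribution.
HEADER_TO_PACKAGE = {
    "Python.h": {"debian": "python3-dev", "fedora": "python3-devel"},
    "ffi.h": {"debian": "libffi-dev", "fedora": "libffi-devel"},
    "openssl/ssl.h": {"debian": "libssl-dev", "fedora": "openssl-devel"},
}

# gcc prints "fatal error: foo.h: No such file or directory";
# clang prints "fatal error: 'foo.h' file not found".
MISSING_HEADER = re.compile(
    r"fatal error: '?(?P<header>[^\s':]+)'?"
    r"(?:: No such file or directory| file not found)"
)


def missing_headers(build_log):
    """Return the header names a compiler reported as missing, in order."""
    seen = []
    for match in MISSING_HEADER.finditer(build_log):
        header = match.group("header")
        if header not in seen:
            seen.append(header)
    return seen


def suggest_packages(build_log, os_family):
    """Map each missing header to a package name, or None if uncatalogued."""
    return {
        header: HEADER_TO_PACKAGE.get(header, {}).get(os_family)
        for header in missing_headers(build_log)
    }
```

A header that is not in the catalogue maps to `None`, which is where user-submitted entries would come in.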

Does setuptools have any hooks that wrapper packages could use to catch the dependency errors and call a retry after the missing dependencies are satisfied?

Here's the relevant thread from #python for fun:

#!irc

chrstphrhrt: so I just went through installing a complicated python application on linux again, and every time the setuptools script dies because of some missing header files I need to find and install the relevant libraries.. that’s all fine and well, but it itches, and I wonder, is there anything that can wrap a setuptools program and catch those failures to allow for interactive responses from the user?

therealfakemoot: chrstphrhrt: and what response would you have to it, other than "okay hold on until I install the right package"?

therealfakemoot: chrstphrhrt: You could integrate it into the package manager (in theory) but that'd tie it to a specific os

chrstphrhrt: therealfakemoot: something which would talk to apt

chrstphrhrt: yeah

KirkMcDonald: chrstphrhrt: Does your package manager have this application in it?

yeukhon: also note usually people do sudo apt-get install whereas u can do pip install pack --user

therealfakemoot: chrstphrhrt: then there's the issue of tying the possibly generic os-agnostic error messages you get from building, and correlating them to the *correct* apt package

chrstphrhrt: yeah no problems getting things installed… just an idea to make it easier for people who don’t know how to search and install

yeukhon: i think this will be a good python-ideas. i think if you are library writer, it's better to provide a good README chrstphrhrt

chrstphrhrt: yeukhon: cool thanks i’ll mull it over

therealfakemoot: yeah. it's better to provide clear documentation of your dependencies

therealfakemoot: if they cannot be included in your package

therealfakemoot: than to hope that you can guess which apt package is appropriate for a random pypi package

chrstphrhrt: therealfakemoot: not even thinking as a python library author, just as a sysadmin who would like to automate that with arbitrary python packages

yeukhon: i think the issue is where to find these executables or dependency

yeukhon: before we actually install the package. like a sane dep check

therealfakemoot: chrstphrhrt: I'm just making the point that there is an information gap that you'd have to fill with -your code-

yeukhon: chrstphrhrt:  but i personally think when pip fails, the log is verbose and would be really nice if there is a way to extract the missing dep. but it's not really obvious. for example many lib requires python-dev to be installed

chrstphrhrt: therealfakemoot: but i guess there could be some entry point for library authors to expose? i’d prefer to just provide a utility that wraps setuptools or pip rather than create a reverse dependency

yeukhon: and it will fail with Python.h missing

therealfakemoot: chrstphrhrt: well the issue is that as a pypi package developer you can't do that without explicitly researching the package repositories of every OS you'd want to provide info for

chrstphrhrt: yeukhon: yeah log parsing could be a good way.. in any case it will be a giant pile of exceptions

yeukhon: im not actually familiar with the linking thing tbh. chrstphrhrt but do u know if there is a way to tell Python.h is missing because python-dev is not installed?

chrstphrhrt: yeukhon: well i think i could spoof some pretty epic failures by assuming everything i’ve ever seen as deps for a fake package then using that as a start

chrstphrhrt: it could end up being statistical

chrstphrhrt: or user-submittable

yeukhon: chrstphrhrt:  that is a good start

yeukhon: just print a friendly message to say these are the os dep

chrstphrhrt: right

yeukhon: yeah i guess pypa guys might be interested in it. not sure if this has ever been discussed before: https://bitbucket.org/pypa/setuptools

yeukhon: even a pip-helper utility might be a good thing…

Opinions?

Bitbucket: https://bitbucket.org/pypa/setuptools/issue/241

ghost commented 10 years ago

Original comment by yeukhon (Bitbucket: yeukhon, GitHub: yeukhon):

I think the first question is whether we should indicate the non-Python dependency in setup.py at all, or whether we should let authors do this in README.

Secondly, how do we handle cross-dependency hell if we think it is a good idea to list dependencies in setup.py?

By cross-dependency, I meant this:

matplotlib requires numpy, and numpy requires a bunch of C libraries to be installed; you can imagine adding scipy and other relevant tools into this dependency hell.

Do we need to go verbose and include everything numpy needs? How do we reference that?

Now, how do we define "handle"? How smart should this "handler" be? A simple reminder, a traceback analyzer, or a dependency graph algorithm?

We can certainly do the simple list reminder: in addition to writing requires for Python packages, we write another requires for non-Python packages. When we hit an error, we can simply remind the user that they need these packages. I believe we install one package at a time, so it's actually not that difficult to tell the user at which point the problem occurred (in fact I believe, from memory, that we do tell the user which package failed to install).
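
To make the reminder idea concrete, here is a minimal sketch. setuptools has no such field; the `OS_REQUIRES` name, its per-OS layout, and the package name `examplepkg` are all assumptions invented for this example.

```python
# Hypothetical metadata: setuptools has no such field. The name OS_REQUIRES
# and its layout are assumptions made for this sketch.
OS_REQUIRES = {
    "debian": ["python3-dev", "libffi-dev"],
    "fedora": ["python3-devel", "libffi-devel"],
}


def reminder(package, os_family, os_requires):
    """Build the message to show when installing `package` fails."""
    needed = os_requires.get(os_family)
    if not needed:
        return (
            f"{package} failed to install; "
            f"no OS dependencies are listed for {os_family}."
        )
    return (
        f"{package} failed to install. It declares these OS packages "
        f"for {os_family}: {', '.join(needed)}"
    )


if __name__ == "__main__":
    print(reminder("examplepkg", "debian", OS_REQUIRES))
```

This does no analysis of the failure at all; it only repeats what the author declared, which is what keeps it simple.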

This can be a helpful first step. We can then explore a possible dependency graph algorithm (I think most will use a DAG).
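
One way the graph step might look, as a sketch only: walk the Python-level dependency graph from the requested package, order it with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), and collect each package's declared OS dependencies. The `PYTHON_DEPS` and `OS_DEPS` tables are made-up illustrative data, not the real dependency lists of these projects.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Made-up data for illustration; not the projects' real dependency lists.
PYTHON_DEPS = {
    "matplotlib": {"numpy"},
    "scipy": {"numpy"},
    "numpy": set(),
}
OS_DEPS = {
    "numpy": ["python3-dev"],
    "matplotlib": ["libfreetype6-dev"],
}


def os_packages_for(target, python_deps, os_deps):
    """List OS packages needed by `target` and its transitive Python deps.

    Packages for dependencies come before packages for their dependents.
    """
    # Collect the target and everything reachable from it.
    needed = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(python_deps.get(name, ()))

    # TopologicalSorter takes node -> predecessors, i.e. package -> its deps,
    # and raises graphlib.CycleError if the graph is not a DAG.
    graph = {name: set(python_deps.get(name, ())) for name in needed}
    result = []
    for name in TopologicalSorter(graph).static_order():
        for os_package in os_deps.get(name, []):
            if os_package not in result:
                result.append(os_package)
    return result
```

With the sample data, asking for `matplotlib` pulls in numpy's OS packages as well, without matplotlib having to repeat them.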

A spin-off idea: what if, when we run python setup.py develop/install, we could tell setuptools to look up numpy's dependencies?

I just looked at the matplotlib source code: https://github.com/matplotlib/matplotlib/blob/master/setup.py

which they've defined in https://github.com/matplotlib/matplotlib/blob/master/setupext.py (IPython has its own too)

I will see what I can hack on next week. We should probably also move this discussion to PyPA. Is the pypa-dev Google Group the right place, rather than the issue tracker?

ghost commented 10 years ago

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):

I've thought about this before as well. I believe it's an important part of the process. It extends to other platforms such as Windows as well. The ability for a package to advertise its build dependencies for situations where it will be built from source would be useful. That declaration alone would allow build tools to put together better checks and reporting of unsatisfied dependencies.

At this point, the problem extends beyond setuptools. While setuptools is still the premier build system for Python packages, the loose plan is to have its functionality broken out into different tools, such that the builds themselves might be performed by different libraries or tools. As a result, the topic of this ticket will need to be addressed at a broader scope, probably in a PEP to be reviewed by the PyPA and approved by @ncoghlan.

I would be willing to accept pull requests for proof-of-concept functionality in setuptools, but it should take into account the broad scope of concerns around this challenge.

My instinct tells me it would be impractical to try to capture the error messages in the traceback and apply a heuristic to them to characterize the nature of the failure. I would prefer instead to capture the expectations up front (compilers, libs, etc) and test their presence. Anything else would be very brittle and almost certainly unmaintainable. Any solution will need to be robust.
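
A minimal sketch of what testing expectations up front could look like, using only the standard library. The function name and the shape of the declared requirements are assumptions for illustration. Note the limitation: `ctypes.util.find_library` looks for shared libraries, not headers, so it cannot tell whether the corresponding development package is installed.

```python
import shutil
from ctypes.util import find_library


def check_build_requirements(tools, libraries):
    """Return (missing_tools, missing_libraries) without building anything.

    `tools` are executable names looked up on PATH (e.g. a compiler);
    `libraries` are names as passed to the linker, without the "lib" prefix.
    """
    missing_tools = [tool for tool in tools if shutil.which(tool) is None]
    missing_libraries = [
        lib for lib in libraries if find_library(lib) is None
    ]
    return missing_tools, missing_libraries


if __name__ == "__main__":
    # Example declaration; what a real package would list is up to its author.
    tools, libs = check_build_requirements(["cc"], ["ffi", "ssl"])
    if tools:
        print("Missing tools:", ", ".join(tools))
    if libs:
        print("Missing libraries:", ", ".join(libs))
```

Because the check runs before any compilation, the report does not depend on the wording of compiler errors.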

Does setuptools have any hooks that wrapper packages could use to catch the dependency errors and call a retry after the missing dependencies are satisfied?

Off the top of my head, there is no suitable hook mechanism for this process. Do feel free to explore it further.
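
Without a hook, the retry behaviour asked about could be approximated from the outside by a wrapper that runs pip in a subprocess. The following is a sketch under that assumption; the function names and the prompt text are invented for illustration, and nothing here is part of setuptools or pip.

```python
import subprocess
import sys


def run_pip_install(package):
    """Run pip in a subprocess and return (exit_code, combined_output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", package],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout


def install_with_retry(package, run=run_pip_install, ask=input, max_attempts=3):
    """Retry the install until it succeeds, the user quits, or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        code, output = run(package)
        if code == 0:
            return True
        # Show the tail of the log, where build errors usually appear.
        print(output[-2000:])
        if attempt == max_attempts:
            break
        answer = ask(
            "Install the missing OS packages, then press Enter to retry "
            "(q to quit): "
        )
        if answer.strip().lower() == "q":
            break
    return False
```

The `run` and `ask` parameters exist so the loop can be exercised without touching the network or a terminal.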
