scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

can't setup.py install without numpy #4164

Closed bukzor closed 9 years ago

bukzor commented 9 years ago

At minimum, this could use a better error message. Is it not possible to have a project that depends on sklearn and installs all its dependencies to a virtualenv in a single pass? Must I do one pass to install numpy and everything else, and a second pass just to install sklearn?

$ python setup.py install
Partial import of sklearn during the build process.
Traceback (most recent call last):
  File "setup.py", line 154, in <module>
    setup_package()
  File "setup.py", line 146, in setup_package
    from numpy.distutils.core import setup
ImportError: No module named numpy.distutils.core
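One way to get the "better error message" asked for above is to check for numpy before setup.py tries to import numpy.distutils. The sketch below is a hypothetical Python 3 helper. The name missing_build_dependency_message is made up, and this is not scikit-learn's actual setup.py. It relies on importlib.util.find_spec, which returns None for a missing top-level module.

```python
import importlib.util
import sys


def missing_build_dependency_message(module_name="numpy"):
    """Return a readable error message if module_name is not importable, else None.

    Pass a top-level name: find_spec raises (instead of returning None) for a
    dotted name whose parent package is missing.
    """
    if importlib.util.find_spec(module_name) is not None:
        return None
    return (
        f"{module_name} is required to build scikit-learn from source, "
        f"but it is not installed.\n"
        f"Install it first, for example with: "
        f"{sys.executable} -m pip install {module_name}"
    )
```

In setup.py this would run before the numpy.distutils import, e.g. `msg = missing_build_dependency_message()` followed by `if msg: sys.exit(msg)`. It only improves the message. It does not install numpy.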
amueller commented 9 years ago

What is your current install setup? Scikit-learn requires numpy and if you install everything together, it should work fine.

GaelVaroquaux commented 9 years ago

> At minimum, this could use a better error message.

Agreed.

> Is it not possible to have a project that depends on sklearn and installs all its dependencies to a virtualenv in a single pass?

How are you installing things? If you are compiling from source, you do need numpy installed.

> Must I do one pass to install numpy and everything else, and a second pass just to install sklearn?

I would be surprised if scipy behaved differently.

amueller commented 9 years ago

Scipy does behave differently: https://github.com/scipy/scipy/blob/master/setup.py#L190 (I think)

GaelVaroquaux commented 9 years ago

> Scipy does behave differently: https://github.com/scipy/scipy/blob/master/setup.py#L190

Hum, interesting.

There are pros and cons to doing it this way for numpy. The pro is that it probably solves the OP's problem. The con is that more people are going to be hand-compiling numpy instead of installing good packages, and thus be left with a horrible linear algebra package.

My hunch would be to leave it the way it is, but I see the point of the clever trick used in scipy's setup.py.

bukzor commented 9 years ago

Demo:

rm -rf fresh
virtualenv fresh
. fresh/bin/activate

# pip 6 reverses order of "top-level" requirements -.-
#   https://github.com/pypa/pip/issues/2260
pip install --upgrade pip

echo >requirements.txt 'numpy
scikit-learn'
pip install -r requirements.txt

output:

$ sh demo.sh
New python executable in fresh/bin/python
Installing setuptools, pip...done.
Downloading/unpacking pip from https://pypi.python.org/packages/py2.py3/p/pip/pip-6.0.6-py2.py3-none-any.whl#md5=0472d9dc76a0df6cc6ab545e40aef832
  Downloading pip-6.0.6-py2.py3-none-any.whl (1.3MB): 1.3MB downloaded
Installing collected packages: pip
  Found existing installation: pip 1.5.6
    Uninstalling pip:
      Successfully uninstalled pip
Successfully installed pip
Cleaning up...
Collecting numpy (from -r requirements.txt (line 1))
  Using cached numpy-1.9.1.tar.gz
    Running from numpy source directory.
Collecting scikit-learn (from -r requirements.txt (line 2))
  Using cached scikit-learn-0.15.2.tar.gz
    Partial import of sklearn during the build process.
Installing collected packages: scikit-learn, numpy
  Running setup.py install for scikit-learn
    Partial import of sklearn during the build process.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py", line 154, in <module>
        setup_package()
      File "/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py", line 146, in setup_package
        from numpy.distutils.core import setup
    ImportError: No module named numpy.distutils.core
    Complete output from command /nail/home/buck/tmp/fresh/bin/python -c "import setuptools, tokenize;__file__='/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /nail/tmp/pip-x4l77V-record/install-record.txt --single-version-externally-managed --compile --install-headers /nail/home/buck/tmp/fresh/include/site/python2.6:
    Partial import of sklearn during the build process.

    Traceback (most recent call last):

      File "<string>", line 1, in <module>

      File "/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py", line 154, in <module>

        setup_package()

      File "/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py", line 146, in setup_package

        from numpy.distutils.core import setup

    ImportError: No module named numpy.distutils.core

    ----------------------------------------
    Command "/nail/home/buck/tmp/fresh/bin/python -c "import setuptools, tokenize;__file__='/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /nail/tmp/pip-x4l77V-record/install-record.txt --single-version-externally-managed --compile --install-headers /nail/home/buck/tmp/fresh/include/site/python2.6" failed with error code 1 in /nail/tmp/pip-build-oylnbV/scikit-learn
GaelVaroquaux commented 9 years ago

Maybe we could make this demo work, but chances are that you would end up with a really lousy install of numpy and scikit-learn. The problem is that numpy wouldn't be linked to good linear algebra packages unless you know what you are doing and have installed their headers beforehand. But that last step is much harder than getting the pip/python part right.
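Whether a given numpy install is linked against an optimized BLAS/LAPACK can be checked from Python with numpy.show_config(), which prints the libraries numpy was built with. A small sketch follows. The wrapper name show_numpy_build_config is hypothetical, and the wrapper only adds a fallback for when numpy is not installed at all.

```python
import importlib.util


def show_numpy_build_config():
    """Print numpy's build configuration (including BLAS/LAPACK info) if available.

    Returns True if numpy was found and its config printed, False otherwise.
    """
    if importlib.util.find_spec("numpy") is None:
        print("numpy is not installed in this environment")
        return False
    import numpy

    # Prints which BLAS/LAPACK libraries this numpy build is linked against.
    numpy.show_config()
    return True
```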

Really, you should be building from source only if you know well what you are doing. In that case, none of this should be a problem.

If you are not an expert, please use prebuilt packages. At least, that is my opinion.

amueller commented 9 years ago

You are right, installing this way really is a bad idea in most cases...

douardda commented 9 years ago

This is a serious issue, since virtualenv and pip are really becoming the standard to set up a test or development environment.

How can I run tox-based continuous integration tests for projects that depend on sklearn, then?

I agree that using prebuilt packages provided by a decent Linux distribution is the way for production/real life applications, but CI and automatic tests are required tools too.

scikit-learn should not be installed using "pip install" or setuptools in general, but scikit-learn must be installable that way.

David

amueller commented 9 years ago

Hum, continuous integration is a good point. However, the value of the integration is not that high if your production setup is quite different from your CI setup....

amueller commented 9 years ago

@douardda can you check whether the scipy hack works for you? That is:

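build_requires = []
try:
    import numpy
except ImportError:
    # numpy is not importable yet: declare it so setuptools can fetch it
    build_requires = ['numpy>=1.6.2']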

But I guess it will be more complicated than that...
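The scipy-style check can be split into two small steps. The first decides whether numpy has to be requested at all, and the second turns that list into setup() keyword arguments. This is a Python 3 sketch with hypothetical function names. Passing the same list as both setup_requires and install_requires is an assumption about how scipy's setup.py uses build_requires.

```python
import importlib.util


def compute_build_requires(module_name="numpy", spec="numpy>=1.6.2"):
    """Request `spec` only if `module_name` is not importable yet (scipy-style check)."""
    if importlib.util.find_spec(module_name) is None:
        return [spec]
    return []


def dependency_kwargs(build_requires):
    """Keyword arguments to merge into setup(...)."""
    return {
        # resolved by setuptools when setup() runs, before the build commands
        "setup_requires": list(build_requires),
        # installed alongside the package itself
        "install_requires": list(build_requires),
    }
```

Usage would look like `setup(name=..., **dependency_kwargs(compute_build_requires()))`. setup_requires is only acted on once setup() is called, so this can help only if numpy (including numpy.distutils) is not imported at the top of setup.py before that point.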

amueller commented 9 years ago

Hum, our setup.py seems nearly identical to the scipy one; I'm not sure what makes theirs work.

douardda commented 9 years ago

Humm, on my jessie laptop, in a fresh virtualenv, "pip install scipy" also fails with an "ImportError: No module named numpy.distutils.core".

saketkc commented 9 years ago

@douardda Would you want to test out the patch in #4332?

amueller commented 9 years ago

Interesting, for me scipy works but scikit-learn fails.

gravyboat commented 9 years ago

@amueller Any ideas on this? I read through #4332 and don't see a solution being proposed that actually installs scikit-learn correctly like scipy does when both are in a requirements file (please correct me if I am mistaken). We're using pre-built wheel packages to avoid compile/configuration times taking forever, and I'm running into this same issue. I've got another project using scipy that works flawlessly when numpy is in the requirements file, but I get this error specifically for scikit-learn. I'll try adding scipy to the requirements.txt to see if that helps, but from the errors I'm seeing it's bombing out before even trying to install.

Edit: Actually, it looks like @saketkc's work over in #4371 might address this. What's the timeline looking like on getting that merged in? I can work around it by installing numpy, THEN going back through the requirements.txt in my config management tool, but it's a pain and it increases build times.
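The "install numpy, THEN go back through requirements.txt" workaround can be scripted so it does not have to live in the config management tool. This is a minimal sketch. It assumes numpy is the only package that other packages' setup.py files import at build time, and that requirement lines are simple name/version specifiers. BUILD_FIRST, plan_install_passes and install_in_two_passes are hypothetical names.

```python
import subprocess
import sys

# Assumption: packages that other setup.py files import while building.
BUILD_FIRST = ("numpy",)


def requirement_name(line):
    """Extract the bare project name from a simple line like 'numpy>=1.9'."""
    name = line
    for sep in "<>=!~;[ ":
        name = name.split(sep, 1)[0]
    return name.strip().lower()


def plan_install_passes(lines, build_first=BUILD_FIRST):
    """Split requirement lines into (first pass, second pass), keeping file order."""
    reqs = [line.strip() for line in lines]
    reqs = [r for r in reqs if r and not r.startswith("#")]
    first = [r for r in reqs if requirement_name(r) in build_first]
    rest = [r for r in reqs if requirement_name(r) not in build_first]
    return first, rest


def install_in_two_passes(path="requirements.txt"):
    """Run pip once for the build-time packages, then once for everything else."""
    with open(path) as f:
        first, rest = plan_install_passes(f)
    for group in (first, rest):
        if group:
            subprocess.check_call([sys.executable, "-m", "pip", "install", *group])
```

Only the two-pass logic is illustrated. Lines such as `-e`, `-r` or pip options inside requirements.txt are not handled.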

ogrisel commented 9 years ago

> There are pros and cons to doing it this way for numpy. The pro is that it probably solves the OP's problem. The con is that more people are going to be hand-compiling numpy instead of installing good packages, and thus be left with a horrible linear algebra package.

It is not possible to build scipy from source without installing BLAS, LAPACK and gfortran. At that point, users will have to read the scipy docs to build from source and install an optimized BLAS / LAPACK.

ogrisel commented 9 years ago

+1 for using setup_requires=['numpy'] and install_requires=['numpy', 'scipy'] in scikit-learn so that "pip install scikit-learn" works by default, as long as the non-Python system build dependencies (gcc, gfortran, BLAS and LAPACK headers) are installed.
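For that proposal to help, pip has to be able to read scikit-learn's metadata before numpy exists. pip of this era runs "setup.py egg_info" on a source distribution to learn its dependencies, so that command must not import numpy. The sketch below shows such a guard together with the dependency lists proposed above. The command list is an assumption loosely modeled on the check found in scipy's setup.py, and numpy_required is a hypothetical name.

```python
# Assumption: commands that only read or print metadata and never compile anything.
METADATA_ONLY_COMMANDS = ("--help-commands", "egg_info", "--version", "clean")

# Dependency lists as proposed in the comment above.
SETUP_KWARGS = {
    "setup_requires": ["numpy"],
    "install_requires": ["numpy", "scipy"],
}


def numpy_required(argv):
    """Return True if this setup.py invocation will actually build extensions."""
    args = argv[1:]
    if not args or "--help" in args:
        return False
    return args[0] not in METADATA_ONLY_COMMANDS
```

In setup.py, `numpy_required(sys.argv)` would decide whether to import numpy.distutils.core or fall back to a plain setuptools setup() that only reports metadata.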

ogrisel commented 9 years ago

Actually I changed my mind as explained in my last comment in #4371. Let's close this for now.

bukzor commented 9 years ago

If I understand correctly, it's still not possible to have a project that installs numpy and sklearn in the same step, and you don't plan to fix it.

Am I right?

amueller commented 9 years ago

From what @ogrisel said in https://github.com/scikit-learn/scikit-learn/pull/4371#issuecomment-97015517, it is non-trivial to make this work with Ubuntu stable's pip. Is this for CI purposes for you? Do you really want to build scipy in your CI? Or are you installing wheels?

amueller commented 9 years ago

didn't @GaelVaroquaux mention a flag to force installing dependencies?

bukzor commented 9 years ago

Various processes at our company involve 'pip install -r requirements.txt'. This works fine for any group of packages not containing sklearn. I feel certain this isn't unique to me or my company.

On Fri, May 8, 2015, 4:29 PM Andreas Mueller wrote:

> didn't @GaelVaroquaux mention a flag to force installing dependencies?


GaelVaroquaux commented 9 years ago

> Various processes at our company involve 'pip install -r requirements.txt'.

For production or testing? If it's for production, I would be worried about inefficiencies in the binaries produced.

> This works fine for any group of packages not containing sklearn.

Numerics / data analytics are a different beast from, say, web services. It is very hard to have a one-size-fits-all solution. The ecosystem is slowly evolving toward one, but it takes time.

amueller commented 9 years ago

I agree that if that is how you produce production binaries, then you are most certainly doing it wrong. Though I don't agree with @GaelVaroquaux that scikit-learn should tell you which scipy to use. I feel it's scipy's responsibility to make sure scipy is installed in a sane way.

taion commented 8 years ago

What do you think of using extras_require to make available something like a scikit-learn[pip] that actually includes numpy and scipy under install_requires?

This would preserve the current behavior for most users, but would allow people using a standard virtualenv/pip toolchain to have a way to set up environments including scikit-learn without having to jump through a large number of hoops.
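In setup.py terms, the proposal amounts to an extras_require entry. This sketch uses hypothetical helper names. The extra name "pip" comes from the comment, leaving numpy and scipy unpinned is an assumption, and requirements_for is a simplified illustration of what a resolver would request, not pip's actual logic.

```python
def build_setup_kwargs():
    """Hypothetical setup() keyword arguments with an opt-in "pip" extra."""
    return {
        "name": "scikit-learn",
        # Default install: numpy/scipy are not pulled in, preserving current behavior.
        "install_requires": [],
        # 'pip install "scikit-learn[pip]"' would additionally request numpy and scipy.
        "extras_require": {"pip": ["numpy", "scipy"]},
    }


def requirements_for(kwargs, extras=()):
    """List the requirements for the base package plus the requested extras."""
    reqs = list(kwargs.get("install_requires", []))
    for extra in extras:
        reqs.extend(kwargs.get("extras_require", {}).get(extra, []))
    return reqs
```

On the command line the extra would be requested as `pip install "scikit-learn[pip]"`. The quotes keep shells from treating the brackets as a glob pattern.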

cc @cancan101

amueller commented 8 years ago

I think that would be fine.

Raji25 commented 8 years ago

from numpy.distutils.core import setup
ImportError: No module named 'numpy'

What has to be done to fix this error?

timabbott commented 8 years ago

Any progress on this issue? It'd be really great if there was a way to use scikit-learn inside a virtualenv by including it (and whatever versions of numpy/scipy/etc. one wants) in a requirements.txt file. Virtualenvs are a very standard way to deploy applications these days, so this is a pretty common use case.

Since there was a solution under discussion as of a few months ago, maybe it makes sense to reopen the issue at least?

taion commented 8 years ago

I never got a chance to work on this. I still think it should be possible/straightforward to set up a scikit-learn[pip] extra that has the right dependencies, especially now that pip-tools properly supports extras.

matt-carter commented 8 years ago

Seconding @timabbott.

taion commented 8 years ago

PR up at https://github.com/scikit-learn/scikit-learn/pull/6990. It's a very straightforward change.

eligiblekeng commented 8 years ago

Temporary fix by overriding the install command in setup.py:

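import subprocess
import sys

from setuptools import setup
from setuptools.command.install import install


def read_requirements(path='./requirements.txt'):
    """Return the requirement lines of a requirements file, in file order."""
    with open(path) as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith('#')]


class OverrideInstall(install):

    """
    Emulate a sequential "pip install -r requirements.txt" to work around
    scipy and scikit-learn needing numpy at build time (seen on py2)
    """

    def run(self):
        # One pip process per requirement, in file order, so numpy is fully
        # installed before anything whose setup.py imports it.
        # pip is called as a subprocess because pip.main and pip.req are
        # internal APIs that no longer exist in pip 10 and later.
        for req in read_requirements():
            subprocess.check_call([sys.executable, '-m', 'pip', 'install', req])
        # Then install this project itself.
        install.run(self)


# the setup
setup(
    # ... other setup() arguments ...
    cmdclass={'install': OverrideInstall},
)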

Then run python setup.py install as usual