bukzor closed this issue 9 years ago.
What is your current install setup? Scikit-learn requires numpy, and if you install everything together, it should work fine.
At minimum, this could use a better error message.
Agreed.
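A minimal sketch of what a clearer failure could look like in a setup.py, assuming the build still needs numpy.distutils (which newer NumPy releases have deprecated). The helper name import_numpy_setup and the message wording are hypothetical, not scikit-learn's actual code, and the `raise ... from` syntax is Python 3 only:

```python
def import_numpy_setup():
    """Return numpy.distutils' setup(), or fail with an actionable message.

    Hypothetical setup.py helper: the name and wording are illustrative only.
    """
    try:
        from numpy.distutils.core import setup
    except ImportError as exc:
        # Replace the bare "No module named numpy.distutils.core" with a hint
        # about the required install order.
        raise ImportError(
            "scikit-learn needs numpy (numpy.distutils) at build time, but it "
            "could not be imported. Install numpy first (e.g. 'pip install "
            "numpy'), then install scikit-learn in a second step."
        ) from exc
    return setup
```

The original ImportError is kept as the chained cause, so the underlying reason is still visible in the traceback.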
Is it not possible to have a project that depends on sklearn and installs all its dependencies to a virtualenv in a single pass?
How are you installing things? If you are compiling from source, you do need numpy installed.
Must I do one pass to install numpy and everything else, and a second pass just to install sklearn?
I would be surprised if scipy behaved differently.
Scipy does behave differently: https://github.com/scipy/scipy/blob/master/setup.py#L190 (I think)
Hum, interesting.
There are pros and cons to doing it this way for numpy. The pro is that it probably solves the OP's problem. The con is that more people are going to be hand-compiling numpy instead of installing good packages, and thus be left with a horrible linear algebra package.
My hunch would be to leave it the way it is, but I see the point of the clever trick used in scipy's setup.py.
Demo:
rm -rf fresh
virtualenv fresh
. fresh/bin/activate
# pip 6 reverses order of "top-level" requirements -.-
# https://github.com/pypa/pip/issues/2260
pip install --upgrade pip
echo >requirements.txt 'numpy
scikit-learn'
pip install -r requirements.txt
output:
$ sh demo.sh
New python executable in fresh/bin/python
Installing setuptools, pip...done.
Downloading/unpacking pip from https://pypi.python.org/packages/py2.py3/p/pip/pip-6.0.6-py2.py3-none-any.whl#md5=0472d9dc76a0df6cc6ab545e40aef832
Downloading pip-6.0.6-py2.py3-none-any.whl (1.3MB): 1.3MB downloaded
Installing collected packages: pip
Found existing installation: pip 1.5.6
Uninstalling pip:
Successfully uninstalled pip
Successfully installed pip
Cleaning up...
Collecting numpy (from -r requirements.txt (line 1))
Using cached numpy-1.9.1.tar.gz
Running from numpy source directory.
Collecting scikit-learn (from -r requirements.txt (line 2))
Using cached scikit-learn-0.15.2.tar.gz
Partial import of sklearn during the build process.
Installing collected packages: scikit-learn, numpy
Running setup.py install for scikit-learn
Partial import of sklearn during the build process.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py", line 154, in <module>
setup_package()
File "/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py", line 146, in setup_package
from numpy.distutils.core import setup
ImportError: No module named numpy.distutils.core
Complete output from command /nail/home/buck/tmp/fresh/bin/python -c "import setuptools, tokenize;__file__='/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /nail/tmp/pip-x4l77V-record/install-record.txt --single-version-externally-managed --compile --install-headers /nail/home/buck/tmp/fresh/include/site/python2.6:
Partial import of sklearn during the build process.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py", line 154, in <module>
setup_package()
File "/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py", line 146, in setup_package
from numpy.distutils.core import setup
ImportError: No module named numpy.distutils.core
----------------------------------------
Command "/nail/home/buck/tmp/fresh/bin/python -c "import setuptools, tokenize;__file__='/nail/tmp/pip-build-oylnbV/scikit-learn/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /nail/tmp/pip-x4l77V-record/install-record.txt --single-version-externally-managed --compile --install-headers /nail/home/buck/tmp/fresh/include/site/python2.6" failed with error code 1 in /nail/tmp/pip-build-oylnbV/scikit-learn
Maybe we could make this demo work, but chances are that you would end up with a really lousy install of numpy and scikit-learn. The problem is that numpy wouldn't be linked against good linear algebra packages unless you know what you are doing and have installed their headers beforehand. And that last step is much harder than getting the pip/python part right.
Really, you should be building from source only if you know well what you are doing, in which case none of this should be a problem.
If you are not an expert, please use prebuilt packages. At least, that is my opinion.
You are right, installing this way really is a bad idea in most cases...
This is a serious issue, since virtualenv and pip are really becoming the standard to set up a test or development environment.
How can I run tox-based continuous integration tests for projects that depend on sklearn then?
I agree that using prebuilt packages provided by a decent Linux distribution is the way for production/real life applications, but CI and automatic tests are required tools too.
scikit-learn should not be installed using "pip install" or setuptools in general, but it must be installable that way.
David
Hum, continuous integration is a good point. However, the value of the integration is not that high if your production setup is quite different from your CI setup....
@douardda, can you check whether the scipy hack works for you? That is:
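try:
    import numpy
except ImportError:
    # numpy is missing, so ask setuptools to fetch it at build time
    build_requires = ['numpy>=1.6.2']
else:
    # numpy is already importable: nothing extra to request
    build_requires = []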
But I guess it will be more complicated than that...
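The idea behind the hack, sketched as a small function. This assumes the resulting list is then passed to setup() as both setup_requires and install_requires; the function name numpy_build_requires is hypothetical, and importlib.util.find_spec is swapped in for the bare import so numpy is only located, not imported, while probing:

```python
import importlib.util


def numpy_build_requires(min_version="1.6.2"):
    """Scipy-style probe: only request numpy when it is not importable yet.

    Returns a requirements list suitable for setup_requires/install_requires.
    """
    if importlib.util.find_spec("numpy") is None:
        # numpy is missing: ask setuptools to fetch it before building.
        return ["numpy>=" + min_version]
    # numpy is already present: do not trigger a (re)build of numpy.
    return []


build_requires = numpy_build_requires()
# In a setup.py this would feed something like:
#   setup(name=..., setup_requires=build_requires, install_requires=build_requires)
```

When numpy is already installed the list is empty, so an existing (hopefully well-linked) numpy is left alone.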
Hum, our setup.py seems nearly identical to the scipy one; I'm not sure what makes theirs work.
Humm, on my jessie laptop, in a fresh virtualenv, "pip install scipy" also fails with an "ImportError: No module named numpy.distutils.core".
@douardda Would you want to test out the patch in #4332?
Interesting, for me scipy works but scikit-learn fails.
@amueller Any ideas on this? I read through #4332 and don't see a solution being proposed that actually installs scikit-learn correctly like scipy does when both are in a requirements file (please correct me if I am mistaken). We're using pre-built wheel packages to avoid compile/configuration times taking forever, and I'm running into this same issue. I've got another project using scipy that works flawlessly when numpy is in the requirements file, but I get this error specifically for scikit-learn. I'll try adding scipy to the requirements.txt to see if that helps, but from the errors I'm seeing it's bombing out before even trying to install.
Edit: actually, it looks like @saketkc's work over in #4371 might address this. What's the timeline looking like on getting that merged in? I can work around it by installing numpy, THEN going back through the requirements.txt in my config management tool, but it's a pain and it increases build times.
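The two-pass workaround described above can be scripted. A minimal sketch, assuming requirements.txt holds one plain requirement per line and that installing numpy first is enough; the helper names are hypothetical:

```python
import re
import subprocess
import sys

# Leading project name of a requirement line, e.g. "numpy" in "numpy>=1.9".
_NAME_RE = re.compile(r"^[A-Za-z0-9._-]+")


def _project_name(req):
    match = _NAME_RE.match(req)
    return match.group(0).lower() if match else ""


def read_requirements(path):
    """Return the non-blank, non-comment lines of a simple requirements file."""
    with open(path) as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith("#")]


def two_pass_commands(path, first=("numpy",)):
    """Return two pip commands: install `first`, then the whole file."""
    pip = [sys.executable, "-m", "pip", "install"]
    wanted = {name.lower() for name in first}
    # Reuse the file's own pins for the first-pass packages when present.
    early = [req for req in read_requirements(path)
             if _project_name(req) in wanted]
    return [pip + (early or list(first)), pip + ["-r", path]]


def run(path="requirements.txt"):
    """Execute both passes, stopping on the first pip failure."""
    for cmd in two_pass_commands(path):
        subprocess.check_call(cmd)
```

Calling run() from a deploy or CI step executes both passes; the second pass simply finds numpy already satisfied.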
There are pros and cons for doing it this way for numpy. The pro is that it probably solves the OPs problem. The con is that more people are going to be hand compiling numpy instead of installing good packages, and thus be left with horrible linear algebra package.
It is not possible to build scipy from source without installing BLAS, LAPACK and gfortran. At that point, users will have to read the scipy docs to build from source and install an optimized BLAS / LAPACK.
+1 for using setup_requires=['numpy'] and install_requires=['numpy', 'scipy'] in scikit-learn to have "pip install scikit-learn" work by default, as long as the non-Python system build dependencies (gcc, gfortran, BLAS & LAPACK headers) are installed.
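A sketch of the proposed metadata, assuming a setuptools-based setup.py; the function name, package name and version are placeholders. One caveat: setup_requires is only processed once setup() is called, so it cannot help if setup.py imports numpy.distutils before that point, which is what the traceback in the demo output shows happening.

```python
def proposed_setup_kwargs():
    """Hypothetical keyword arguments for setuptools.setup()."""
    return dict(
        name="example-package",  # placeholder
        version="0.0.0",  # placeholder
        # Requested while setup() runs, i.e. at build time.
        setup_requires=["numpy"],
        # Requested for the installed package, i.e. at run time.
        install_requires=["numpy", "scipy"],
    )


# In setup.py this would be used as:
#   from setuptools import setup
#   setup(**proposed_setup_kwargs())
```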
Actually I changed my mind as explained in my last comment in #4371. Let's close this for now.
If I understand it, it's still not possible to have a project that installs numpy and sklearn in the same step, and you all don't plan to fix it.
Am I right?
From what @ogrisel said https://github.com/scikit-learn/scikit-learn/pull/4371#issuecomment-97015517 it is non-trivial to make this work with ubuntu stable pip. Is this for CI purposes for you? Do you really want to build scipy in your CI? Or are you installing wheels?
Didn't @GaelVaroquaux mention a flag to force installing dependencies?
Various processes at our company involve 'pip install -r requirements.txt'. This works fine for any group of packages not containing sklearn. I feel certain this isn't unique to me or my company.
On Fri, May 8, 2015, 4:29 PM Andreas Mueller wrote:
didn't @GaelVaroquaux mention a flag to force installing dependencies?
Various processes at our company involve 'pip install -r requirements.txt'.
For production or testing? If it's for production, I would be worried about inefficiencies in the binaries produced.
This works fine for any group of packages not containing sklearn.
Numerics and data analytics are a different beast than, say, web services. It is very hard to have a one-size-fits-all solution. The ecosystem is slowly evolving toward one, but it takes time.
I agree that if that is how you produce production binaries, then you are most certainly doing it wrong. Though I don't agree with @GaelVaroquaux that scikit-learn should tell you which scipy to use. I feel it's scipy's responsibility to make sure scipy is installed in a sane way.
What do you think of using extras_require to make available something like a scikit-learn[pip] that actually includes numpy and scipy under install_requires?
This would preserve the current behavior for most users, but would allow people using a standard virtualenv/pip toolchain to have a way to set up environments including scikit-learn without having to jump through a large number of hoops.
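A minimal sketch of the proposed extra as setuptools metadata. The package name is a placeholder, the empty install_requires merely stands in for whatever the package already declares, and the extra's name "pip" is taken from the proposal above:

```python
# Hypothetical setuptools metadata illustrating the proposed "pip" extra.
SETUP_KWARGS = dict(
    name="example-package",  # placeholder for scikit-learn
    # Default requirements stay as they are (left empty here for illustration),
    # so a plain install keeps the current behavior.
    install_requires=[],
    # Opt-in extra: installing "example-package[pip]" also pulls numpy and scipy.
    extras_require={
        "pip": ["numpy", "scipy"],
    },
)
```

Users who want pip to manage everything would then write example-package[pip] in their requirements file, or run pip install "example-package[pip]" (the quotes stop shells such as zsh from treating the brackets as a glob pattern).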
cc @cancan101
I think that would be fine.
from numpy.distutils.core import setup
ImportError: No module named 'numpy'
What has to be done to fix this error?
Any progress on this issue? It'd be really great if there was a way to use scikit-learn inside a virtualenv by including it (and whatever versions of numpy/scipy/etc. one wants) in a requirements.txt file. Virtualenvs are a very standard way to deploy applications these days, so this is a pretty common use case.
Since there was a solution under discussion as of a few months ago, maybe it makes sense to reopen the issue at least?
I never got a chance to work on this. I still think it should be possible and straightforward to set up a scikit-learn[pip] extra that has the right dependencies, especially now that pip-tools properly supports extras.
Seconding @timabbott.
PR up at https://github.com/scikit-learn/scikit-learn/pull/6990. It's a very straightforward change.
Temporary fix by overriding the install command in setup.py:
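import subprocess
import sys

from setuptools import setup
from setuptools.command.install import install


def read_requirements(path='./requirements.txt'):
    # pip.req.parse_requirements and pip.main are pip internals that are no
    # longer importable this way since pip 10, so parse the file directly:
    # one requirement per line, skipping blank lines and comments.
    with open(path) as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith('#')]


reqs = read_requirements()


class OverrideInstall(install):
    """
    Emulate sequential install of pip install -r requirements.txt
    To fix numpy bug in scipy, scikit in py2
    """
    def run(self):
        # One pip invocation per requirement, in file order, so numpy is
        # fully installed before scipy/scikit-learn are built.
        for req in reqs:
            subprocess.check_call([sys.executable, '-m', 'pip', 'install', req])
        # Then perform the regular install of this package itself.
        install.run(self)

# the setup
setup(
    # ... other setup() arguments ...
    cmdclass={'install': OverrideInstall},
)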
Then run python setup.py install as usual.