readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Bootstrap numpy installation in setup.py #174

Closed mgaitan closed 7 years ago

mgaitan commented 7 years ago

Currently, aeneas imports numpy in its setup.py, meaning numpy needs to be installed beforehand, in a separate step.

Setuptools handles this situation with the setup_requires argument, but an extra hack is needed for numpy: http://stackoverflow.com/a/21621689

readbeyond commented 7 years ago

Hi,

thank you for your feedback.

I quickly tried the workaround described in http://stackoverflow.com/a/21621689 , doing the following in an empty virtualenv:

  1. patch setup.py as described
  2. python setup.py sdist
  3. pip install aeneas-1.7.3.0.tar.gz

and indeed it successfully installs aeneas along with its dependencies and it compiles the aeneas C extensions, without requiring numpy to be installed first.
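For reference, a minimal sketch of the kind of patch meant in step 1, assuming the usual form of this workaround: a custom build_ext command that imports numpy only when the extensions are actually built, combined with setup_requires. The package name, extension name, and source path are hypothetical placeholders, not the real aeneas ones, and the linked answer may include further details that are left out here.

```python
"""Sketch of a setup.py that defers `import numpy` until build time."""


def numpy_include_dir():
    # Imported lazily: by the time build_ext runs, setup_requires has
    # already made numpy importable.
    import numpy
    return numpy.get_include()


def make_build_ext(base):
    """Return a build_ext subclass of `base` that adds the NumPy headers."""
    class build_ext(base):
        def finalize_options(self):
            base.finalize_options(self)
            # After the base call, include_dirs is a list we can extend.
            self.include_dirs.append(numpy_include_dir())
    return build_ext


def run_setup():
    # setuptools is imported here so the helpers above have no dependencies.
    from setuptools import setup, Extension
    from setuptools.command.build_ext import build_ext as _build_ext

    setup(
        name="example_pkg",  # hypothetical package name
        version="0.0.1",
        setup_requires=["numpy>=1.9"],    # fetched before the build starts
        install_requires=["numpy>=1.9"],  # still needed at run time
        cmdclass={"build_ext": make_build_ext(_build_ext)},
        ext_modules=[
            # hypothetical extension; its C source includes NumPy headers
            Extension("example_pkg.cext", sources=["example_pkg/cext.c"]),
        ],
    )

# A real setup.py would end with a module-level call: run_setup()
```

The point of the design is that nothing at module level touches numpy, so setup.py can be parsed in an empty virtualenv; the import happens only inside finalize_options.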

However, step 3 takes "a lot" of time (around 3 minutes on my fast laptop), because it seems to compile numpy first. (Clearly a couple of minutes is a relatively negligible amount of time, but it is "a lot" when compared to installing numpy from a pre-compiled wheel, which takes just a few seconds.)

A quick Google search reveals that this "trick" is quite popular (because of the StackOverflow Q&A?) among Python packages that depend on NumPy and ship their own C extensions including the NumPy C headers. On the other hand, I would be much more comfortable shipping it if I could find more substantial documentation, such as a tutorial/FAQ from the NumPy devs saying "this is the blessed way of doing this", but so far I have found nothing.

Do you have any pointers? Shall we ask the NumPy guys?

mgaitan commented 7 years ago

The point seems to be that if you install from source (a tar.gz generated via sdist), the requirements are also installed from source. So the "uncomfortable" part of compiling everything could be avoided (for most users) by generating wheels, as discussed in #157.

Example:

After a valid build/install via python setup.py develop (check #177), I generated a wheel package:

(aeneas) tin@morochita:~/lab/aeneas$ pip install -U pip wheel
Requirement already up-to-date: pip in /home/tin/.virtualenvs/aeneas/lib/python3.5/site-packages
Requirement already up-to-date: wheel in /home/tin/.virtualenvs/aeneas/lib/python3.5/site-packages
(aeneas) tin@morochita:~/lab/aeneas$ python setup.py bdist_wheel
running bdist_wheel
running build
running build_py
copying aeneas/globalconstants.py -> build/lib.linux-x86_64-3.5/aeneas
copying aeneas/hierarchytype.py -> build/lib.linux-x86_64-3.5/aeneas
copying aeneas/vad.py -> build/lib.linux-x86_64-3.5/aeneas
.... 

It was built successfully:

(aeneas) tin@morochita:~/lab/aeneas$ cd dist/
(aeneas) tin@morochita:~/lab/aeneas/dist$ ls
aeneas-1.7.3.0-cp35-cp35m-linux_x86_64.whl

And then I installed it in a fresh virtualenv:

(aeneas) tin@morochita:~/lab/aeneas/dist$ mkvirtualenv -p /usr/bin/python3.5 aeneas2
Running virtualenv with interpreter /usr/bin/python3.5
Using base prefix '/usr'
New python executable in /home/tin/.virtualenvs/aeneas2/bin/python3.5
Also creating executable in /home/tin/.virtualenvs/aeneas2/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
(aeneas2) tin@morochita:~/lab/aeneas/dist$ pip install ~/lab/aeneas/dist/aeneas-1.7.3.0-cp35-cp35m-linux_x86_64.whl
Processing ./aeneas-1.7.3.0-cp35-cp35m-linux_x86_64.whl
Collecting numpy>=1.9 (from aeneas==1.7.3.0)
  Using cached numpy-1.12.1-cp35-cp35m-manylinux1_x86_64.whl
Collecting lxml>=3.6.0 (from aeneas==1.7.3.0)
  Using cached lxml-3.7.3-cp35-cp35m-manylinux1_x86_64.whl
Collecting BeautifulSoup4>=4.5.1 (from aeneas==1.7.3.0)
  Using cached beautifulsoup4-4.6.0-py3-none-any.whl
Installing collected packages: numpy, lxml, BeautifulSoup4, aeneas
Successfully installed BeautifulSoup4-4.6.0 aeneas-1.7.3.0 lxml-3.7.3 numpy-1.12.1
(aeneas2) tin@morochita:~/lab/aeneas$ python -c 'import aeneas'
(aeneas2) tin@morochita:~/lab/aeneas$

readbeyond commented 7 years ago

Yes, I agree that providing wheels is the best solution.

However, if I create the wheel e.g. on my laptop, which has very up-to-date libraries, chances are that something will break if another user gets the wheel and has an "older" system. That's why PyPI suggests using their ancient CentOS 5 VMs to create wheels: the system is so old that if your Python wheel can be compiled there, there is a good chance that the wheel will work on any (more modern) Linux box. Unfortunately, as detailed in issue #157, aeneas requires eSpeak, and installing the latter on the CentOS 5 VM is not trivial. (BTW, this also means that Travis/AppVeyor will probably fail to create wheels as well, since you need to provide a setup script, which would contain something like yum install espeak ... .)
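The difference is visible in the wheel filenames from the session above: the locally built aeneas wheel carries the platform tag linux_x86_64, while the numpy and lxml wheels pulled by pip carry manylinux1_x86_64. A small sketch, assuming the standard wheel filename layout (name, version, optional build tag, python tag, ABI tag, platform tag); the helper names are made up for illustration, and the check only reads the tag, it does not verify that the wheel really is portable:

```python
from collections import namedtuple

WheelTags = namedtuple("WheelTags", "name version python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %r" % filename)
    # The last three fields are always python tag, ABI tag, platform tag;
    # an optional build tag may sit between the version and the python tag.
    python, abi, platform = parts[-3:]
    return WheelTags(parts[0], parts[1], python, abi, platform)


def claims_portability(filename):
    """True if the platform tag is 'any' or a manylinux variant."""
    platform = parse_wheel_filename(filename).platform
    return platform == "any" or platform.startswith("manylinux")


for wheel in (
    "aeneas-1.7.3.0-cp35-cp35m-linux_x86_64.whl",
    "numpy-1.12.1-cp35-cp35m-manylinux1_x86_64.whl",
    "beautifulsoup4-4.6.0-py3-none-any.whl",
):
    print(wheel, parse_wheel_filename(wheel).platform, claims_portability(wheel))
```

A plain linux_x86_64 tag makes no promise beyond "built on some Linux box", which is exactly the concern with a wheel built on an up-to-date laptop.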

So, let's assume for a moment that we do not have a reliable way of providing wheels.

As I stated above, I confirm that your suggestion solves the issue, automatically installing numpy before the rest of the aeneas setup, even if numpy is not installed already.

So, now I am facing a choice between: A. requiring the user to install numpy before aeneas (as it is now), or B. adopting the proposed approach using a custom build_ext class in the setup.py.

I am not against option B, but I need more information before embracing it, as it comes with uncertainties: what happens if the compilation of numpy fails on the user's system for any reason? (Note that, in contrast, option A, that is, pip install numpy; pip install aeneas, will probably pull a numpy wheel, which can be considered almost certainly installable.)

To sum up: before adopting option B, I would like to see evidence that it is indeed the "blessed" approach ("blessed" by the numpy developers, I guess). @mgaitan , do you have any pointers on this?

readbeyond commented 7 years ago

Closed with #177

sn-synth commented 11 months ago

A few years later, any plans to change this? Needing to install numpy first is a problem in a lot of applications (for example with CI when aeneas is used as a dependency)
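As long as numpy has to be installed first, a CI job can at least make the ordering explicit. A minimal sketch of the two-step install described earlier in this thread (pip install numpy, then pip install aeneas), driven from Python; the function names are made up for illustration, and by default the script only prints the commands instead of running them:

```python
import subprocess
import sys


def install_commands(python=sys.executable):
    """Return the two pip invocations, in the order they must run."""
    pip = [python, "-m", "pip", "install"]
    return [
        pip + ["numpy"],   # first: usually resolved from a pre-built wheel
        pip + ["aeneas"],  # second: its setup.py can now import numpy
    ]


def install(dry_run=True):
    for command in install_commands():
        if dry_run:
            print(" ".join(command))
        else:
            # check_call raises CalledProcessError if pip fails, so the
            # second step never runs on top of a broken numpy install.
            subprocess.check_call(command)


install(dry_run=True)
```

Listing both packages in a single pip install call or in one requirements file would not necessarily give the same result, since the aeneas setup.py needs numpy importable while it is being built.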