heroku-python / conda-buildpack

[DEPRECATED] Buildpack for Conda.
MIT License

mkl increasing slug size significantly #21

Open mszheng opened 8 years ago

mszheng commented 8 years ago

Since the upgrade described here (Feb 5 2016): https://www.continuum.io/blog/developer-blog/anaconda-25-release-now-mkl-optimizations, conda is defaulting to the mkl optimized numpy and scipy, which require the ~120 MB mkl package. This can easily bump the slug size over 300 MB. It's simple to work around this by specifying "nomkl" in conda-requirements.txt, but perhaps that should be the default for this buildpack.
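
As a rough sketch of that workaround, the helper below prepends nomkl to a requirements file if it is not already listed. The function name ensure_nomkl is hypothetical and not part of the buildpack, and the sketch assumes one package spec per line, as in conda-requirements.txt.

```shell
# Hypothetical helper, not part of the buildpack: make sure "nomkl" is the
# first entry of a conda requirements file (one package spec per line).
ensure_nomkl() {
  req_file="${1:-conda-requirements.txt}"
  [ -f "$req_file" ] || return 1            # nothing to do without a file
  if ! grep -qx 'nomkl' "$req_file"; then   # exact-line match, so it is never added twice
    tmp="$(mktemp)"
    # Write "nomkl" first, then the existing specs, and copy the result back.
    { echo 'nomkl'; cat "$req_file"; } > "$tmp" && cat "$tmp" > "$req_file"
    rm -f "$tmp"
  fi
}

# Usage (run from the app root before pushing):
#   ensure_nomkl conda-requirements.txt
```

As later comments in this thread show, listing nomkl alone may not be enough to keep mkl out of the slug.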

htylab commented 8 years ago

I recommend this: remove mkl from the default setting.

durkode commented 8 years ago

I agree with this. It took me a long while to work out how to reduce my slug size, as the lowest I could get it was 310MB. Adding nomkl instantly dropped it to 166MB. It would be great to either have this as the default or feature it in the readme to increase its visibility.

alexlouden commented 8 years ago

:+1: Adding nomkl reduced my slug from 420mb to 280mb. Using scikit-learn, scipy, numpy and opencv.

mrgordon commented 8 years ago

+1, this project is borderline unusable on Heroku without this. Thanks for the hard work as this buildpack is a pleasure other than the slug size issues!

evdoks commented 8 years ago

Putting nomkl in conda-requirements.txt does not seem to help in my case. Having the following packages in conda-requirements.txt

nomkl
scipy=0.17.0
scikit-learn=0.17.1
pandas=0.18.0
nltk=3.2.1
sqlalchemy=1.0.12
joblib=0.9.4

still leads to mkl being downloaded:

     The following packages will be downloaded:
remote:        
remote:            package                    |            build
remote:            ---------------------------|-----------------
remote:            libgcc-5.2.0               |                0         1.1 MB
remote:            libgfortran-3.0.0          |                1         281 KB
remote:            mkl-11.3.1                 |                0       121.2 MB
remote:            nomkl-1.0                  |                0          402 B
remote:            system-5.8                 |                2         170 KB
remote:            openblas-0.2.14            |                0         6.6 MB
remote:            joblib-0.9.4               |           py27_0         121 KB
remote:            nltk-3.2.1                 |           py27_0         1.7 MB
remote:            numpy-1.10.4               |     py27_nomkl_0         6.0 MB
remote:            pytz-2016.3                |           py27_0         178 KB
remote:            six-1.10.0                 |           py27_0          16 KB
remote:            sqlalchemy-1.0.12          |           py27_0         1.3 MB
remote:            python-dateutil-2.5.2      |           py27_0         236 KB
remote:            scipy-0.17.0               |      np110py27_3        31.3 MB
remote:            pandas-0.18.0              |      np110py27_0        12.0 MB
remote:            scikit-learn-0.17.1        |np110py27_nomkl_0         8.6 MB
remote:            ------------------------------------------------------------
remote:                                                   Total:       190.6 MB
remote:        
remote:        The following NEW packages will be INSTALLED:
remote:        
remote:            joblib:          0.9.4-py27_0            
remote:            libgcc:          5.2.0-0                 
remote:            libgfortran:     3.0.0-1                 
remote:            mkl:             11.3.1-0                
remote:            nltk:            3.2.1-py27_0            
remote:            nomkl:           1.0-0                   
remote:            numpy:           1.10.4-py27_nomkl_0      [nomkl]
remote:            openblas:        0.2.14-0                
remote:            pandas:          0.18.0-np110py27_0      
remote:            python-dateutil: 2.5.2-py27_0            
remote:            pytz:            2016.3-py27_0           
remote:            scikit-learn:    0.17.1-np110py27_nomkl_0 [nomkl]
remote:            scipy:           0.17.0-np110py27_3      
remote:            six:             1.10.0-py27_0           
remote:            sqlalchemy:      1.0.12-py27_0           
remote:            system:          5.8-2  

Am I missing something?

dtran320 commented 8 years ago

@evdoks For some reason this broke for me recently as well, perhaps due to a regression in Conda or a change in the way dependencies are handled. The workaround I finally stumbled on was to pass the --no-deps flag to conda install and explicitly list out all your packages in conda-requirements.txt. See https://github.com/conda/conda/issues/2032#issuecomment-197163898

This means, unfortunately, that you'll need to fork/edit this buildpack (which I was already doing to make it work well with my multi-buildpack setup). The line you need to change is in bin/steps/conda_compile:

It previously was:

conda install --file conda-requirements.txt --yes | indent

and should be changed to:

conda install --no-deps --file conda-requirements.txt --yes | indent
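
One way to check whether a change like this took effect is to scan the install plan for an mkl row. The sketch below is only an illustration: plan_includes_mkl is a hypothetical name, and it assumes the plan text looks like the logs pasted in this thread (optionally prefixed with "remote:").

```shell
# Hypothetical check: read a conda install plan on stdin and succeed (exit 0)
# if it contains a row for the mkl package itself. Rows for nomkl, or for
# builds such as py27_nomkl_0, do not count.
plan_includes_mkl() {
  # Drop the "remote:" prefix that git push adds, then match either a
  # download row ("mkl-11.3.1 | ...") or an install row ("mkl: 11.3.1-0").
  sed 's/^remote:[[:space:]]*//' | grep -Eq '^[[:space:]]*mkl(-[0-9]|:)'
}

# Usage, assuming conda is on PATH (--dry-run prints the plan without installing):
#   conda install --dry-run --file conda-requirements.txt | plan_includes_mkl \
#     && echo "mkl would be installed"
```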

evdoks commented 8 years ago

@dtran320 Thanks! Adding --no-deps solved the problem.

mcg1969 commented 8 years ago

That's a great workaround. The transition from mkl to nomkl has proven... difficult.

jake17007 commented 7 years ago

@dtran320, @evdoks

Instead of forking and adding --no-deps, I added nomkl and the highest-level package I needed, scikit-learn, pinned to its nomkl build.

My conda-requirements.txt:

nomkl
scikit-learn=0.18.1=np111py27_nomkl_0

That allowed me to use this conda buildpack and let the solver find the dependencies, so I did not have to specify lower-level packages like numpy, scipy, etc.

dtran320 commented 7 years ago

@jake17007 Ah, using --no-deps actually broke some things for us with the newest versions, so we've migrated away from that solution.

dtran320 commented 7 years ago

Hmm, did everyone's workarounds just break? For some reason, my buildpack went back to mkl:

       The following packages will be downloaded:

           package                    |            build
           ---------------------------|-----------------
           mkl-11.3.3                 |                0       122.1 MB
           openblas-0.2.19            |                0         3.0 MB
           numpy-1.11.2               |           py27_0         6.2 MB
           scipy-0.18.1               |      np111py27_0        30.9 MB
           scikit-learn-0.18.1        |      np111py27_0        10.9 MB
           ------------------------------------------------------------
                                                  Total:       173.1 MB

       The following NEW packages will be INSTALLED:

           mkl:          11.3.3-0                

       The following packages will be UPDATED:

           numpy:        1.11.2-py27_nomkl_0      [nomkl] --> 1.11.2-py27_0     
           openblas:     0.2.14-4                 --> 0.2.19-0          
           scikit-learn: 0.18.1-np111py27_nomkl_0 [nomkl] --> 0.18.1-np111py27_0
           scipy:        0.18.1-np111py27_nomkl_0 [nomkl] --> 0.18.1-np111py27_0
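
A regression like the one in this log could be caught at build time by looking for rows that move from a [nomkl] build to a build without nomkl in its name. The following is a minimal sketch under that assumption; list_nomkl_reverts is a hypothetical name, and the row format is taken from the log above.

```shell
# Hypothetical check: read a conda plan on stdin and print the name of every
# package that is being switched from a [nomkl] build to a non-nomkl build.
list_nomkl_reverts() {
  awk '{
         sub(/^remote:[ \t]*/, "")            # tolerate the git push prefix
         if ($0 ~ /\[nomkl\][ \t]*-->/) {     # row starts from a nomkl build
           target = $NF                       # last field is the new build
           if (target !~ /nomkl/) {
             name = $1
             sub(/:$/, "", name)              # "numpy:" -> "numpy"
             print name
           }
         }
       }'
}

# Usage, assuming the plan was saved to a file named plan.txt:
#   list_nomkl_reverts < plan.txt
```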

icoxfog417 commented 7 years ago

In my experience, we have to do the following steps.

Below is my repository, which deployed successfully to Heroku recently.

icoxfog417/machine_learning_in_application

And my patched buildpack is below (it also supports Python 3).

icoxfog417/conda-buildpack