googledatalab / datalab

Interactive tools and developer experiences for Big Data on Google Cloud Platform.
Apache License 2.0

The library installed into the custom docker image is not available. #2166

Open mikhail-khodorovskiy opened 4 years ago

mikhail-khodorovskiy commented 4 years ago

I used the example Dockerfile at https://github.com/googledatalab/datalab/blob/master/containers/datalab/Dockerfile-extended-example.in to install the xlrd library.

My Docker image build succeeds and the library is added to the image:

Status: Downloaded newer image for gcr.io/cloud-datalab/datalab:latest
 ---> 8674bc940119
Step 2/5 : COPY requirements.txt .
 ---> c7a85b8b0391
Step 3/5 : RUN pip install -r requirements.txt
 ---> Running in c2cfcca4fd73
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Collecting xlrd==1.2.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/b0/16/63576a1a001752e34bf8ea62e367997530dc553b689356b9879339cf45a4/xlrd-1.2.0-py2.py3-none-any.whl (103kB)
Installing collected packages: xlrd
Successfully installed xlrd-1.2.0
Removing intermediate container c2cfcca4fd73

However, when I try to use the library in my notebook, it is not available in either the Python 2 or the Python 3 kernel:

import xlrd
ImportError                               Traceback (most recent call last)
<ipython-input-2-2743bb67f6dd> in <module>()
----> 1 import xlrd
ImportError: No module named xlrd

mikhail-khodorovskiy commented 4 years ago

If I check inside the running Docker container, the package IS installed, but it's still not available in the notebook:

sa_117223812809297213569@hsq-dev-mikhail-datalab ~ $ docker exec -it b83fe2367636 bash
root@b83fe2367636:/# pip install xlrd
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Requirement already satisfied: xlrd in /usr/local/lib/python2.7/site-packages (1.2.0)
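
One way to see this mismatch from the notebook side is to ask the kernel which interpreter it is actually running. The following is a minimal sketch meant for a notebook cell, not part of the original report; the helper name `describe_interpreter` is hypothetical. If the reported paths are not under /usr/local/lib/python2.7, the pip on the default path is installing into a different environment than the one the kernel uses.

```python
import sys


def describe_interpreter():
    """Return the interpreter path, its prefix, and its module search path."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "search_path": list(sys.path),
    }


info = describe_interpreter()
# %-formatting with a single argument prints the same on Python 2 and Python 3.
print("executable: %s" % info["executable"])
print("prefix:     %s" % info["prefix"])
for entry in info["search_path"]:
    print("  %s" % entry)
```

Running the same cell in both the Python 2 and the Python 3 kernel shows which environment each one resolves imports from.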

mikhail-khodorovskiy commented 4 years ago

It turns out the pip on the default path of the Docker image IS NOT the pip used by the notebook environments, so both the documentation on adding custom libraries and the example Dockerfile are wrong: https://cloud.google.com/datalab/docs/how-to/adding-libraries

The Dockerfile should look like this:

FROM gcr.io/cloud-datalab/datalab:latest

# The following lines install the packages listed in requirements.txt
# (xlrd, as an example) into both the Python 2 and Python 3 notebook environments
COPY requirements.txt .

RUN /usr/local/envs/py2env/bin/pip install -r requirements.txt
RUN /usr/local/envs/py3env/bin/pip install -r requirements.txt
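
After rebuilding the image, the result could be checked from each kernel with something like the cell below. This is only a sketch: it assumes the package is importable under the name `xlrd`, and `module_location` is a hypothetical helper, not part of Datalab. With the Dockerfile above, the reported path would be expected to sit under /usr/local/envs/py2env or /usr/local/envs/py3env, depending on the kernel.

```python
import importlib


def module_location(name):
    """Return the file a module is loaded from.

    Returns None if the module cannot be imported, or if it has no
    __file__ attribute (as with some built-in modules).
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


location = module_location("xlrd")
if location is None:
    print("xlrd is not importable from this kernel")
else:
    print("xlrd loaded from %s" % location)
```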

Please update the documentation and the example Docker file.