okfn-brasil / rosie

🤖 Python application responsible for Serenata de Amor's intelligence

Using Docker alpine #96

Closed: caduvieira closed this pull request 7 years ago

caduvieira commented 7 years ago

What is the purpose of this Pull Request?

- Use Alpine as the base image, generating a smaller Docker image: from 707 MB to 381 MB (about 46% smaller).
- Add a Dockerfile linter.

What was done to achieve this purpose? Changed the order in which pip installs the requirements.

How to test if it really works?

    $ docker build -t rosie .
    $ docker run --rm -v /tmp/serenata-data:/tmp/serenata-data rosie test

Who can help reviewing it? Anyone with docker

coveralls commented 7 years ago


Coverage remained the same at 98.045% when pulling 58ccad07510124d60bae87a742519a030c4fabaa on caduvieira:docker_alpine into c599dc73298fdc5c1ceb4077fa1553c75358e90c on datasciencebr:master.

cuducos commented 7 years ago

> The other is the Dockerfile linter: how does it work? Can you give us more information on it, and on the advantages of using it?

Hell yeah, @caduvieira: I just did my homework and Dockerfile linters are awesome :) Just one minor change in requirements.txt and we're good to go!

anaschwendler commented 7 years ago

Hi @caduvieira and @cuducos. I've just merged the smaller Docker image (I was imagining that was better after this PR). I'll test this one locally and give you better feedback :)

anaschwendler commented 7 years ago

So, me again here:

It's working OK for me, @cuducos:

```console
➜ rosie git:(caduvieira-docker_alpine) docker build -t rosie .
Sending build context to Docker daemon 467.5kB
Step 1/9 : FROM alpine:3.6
3.6: Pulling from library/alpine
b56ae66c2937: Pull complete
Digest: sha256:b40e202395eaec699f2d0c5e01e6d6cb8e6b57d77c0e0221600cf0b5940cf3ab
Status: Downloaded newer image for alpine:3.6
 ---> 37eec16f1872
Step 2/9 : LABEL maintainer "https://github.com/datasciencebr/rosie"
 ---> Running in b822360e12a1
 ---> 70afa33d3278
Removing intermediate container b822360e12a1
Step 3/9 : COPY requirements.txt ./
 ---> fc5b807ad2b6
Step 4/9 : COPY setup ./
 ---> 086cf675f2a9
Step 5/9 : COPY rosie.py ./
 ---> 5ad42b0c78ba
Step 6/9 : COPY rosie ./rosie
 ---> 2f61dea68771
Step 7/9 : COPY config.ini.example ./
 ---> e8f7feb43826
Step 8/9 : RUN apk add --no-cache python3 libstdc++ lapack && python3 -m ensurepip && rm -r /usr/lib/python*/ensurepip && pip3 install --upgrade pip setuptools && if [ ! -e /usr/bin/pip ]; then ln -s pip3 /usr/bin/pip ; fi && apk add --no-cache --virtual=.build-dependencies g++ gfortran musl-dev lapack-dev python3-dev ca-certificates libxslt-dev libxml2-dev && ln -s locale.h /usr/include/xlocale.h && ln -s /usr/bin/python3 /usr/bin/python && ./setup && find /usr/lib/python3.*/ -name 'tests' -exec rm -r '{}' + && rm /usr/include/xlocale.h && rm -r /root/.cache && apk del --purge .build-dependencies
 ---> Running in b468b7617854
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
(1/16) Installing libgcc (6.3.0-r4)
(2/16) Installing libquadmath (6.3.0-r4)
(3/16) Installing libgfortran (6.3.0-r4)
(4/16) Installing lapack (3.7.0-r0)
(5/16) Installing libstdc++ (6.3.0-r4)
(6/16) Installing libbz2 (1.0.6-r5)
(7/16) Installing expat (2.2.0-r1)
(8/16) Installing libffi (3.2.1-r3)
(9/16) Installing gdbm (1.12-r0)
(10/16) Installing xz-libs (5.2.3-r0)
(11/16) Installing ncurses-terminfo-base (6.0_p20170930-r0)
(12/16) Installing ncurses-terminfo (6.0_p20170930-r0)
(13/16) Installing ncurses-libs (6.0_p20170930-r0)
(14/16) Installing readline (6.3.008-r5)
(15/16) Installing sqlite-libs (3.20.1-r0)
(16/16) Installing python3 (3.6.1-r3)
Executing busybox-1.26.2-r7.trigger
OK: 81 MiB in 27 packages
Requirement already satisfied: setuptools in /usr/lib/python3.6/site-packages
Requirement already satisfied: pip in /usr/lib/python3.6/site-packages
Requirement already up-to-date: pip in /usr/lib/python3.6/site-packages
Collecting setuptools
Downloading setuptools-36.6.0-py2.py3-none-any.whl (481kB)
Installing collected packages: setuptools
Found existing installation: setuptools 28.8.0
Uninstalling setuptools-28.8.0:
Successfully uninstalled setuptools-28.8.0
Successfully installed setuptools-36.6.0
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
(1/25) Installing binutils-libs (2.28-r3)
(2/25) Installing binutils (2.28-r3)
(3/25) Installing gmp (6.1.2-r0)
(4/25) Installing isl (0.17.1-r0)
(5/25) Installing libgomp (6.3.0-r4)
(6/25) Installing libatomic (6.3.0-r4)
(7/25) Installing pkgconf (1.3.7-r0)
(8/25) Installing mpfr3 (3.1.5-r0)
(9/25) Installing mpc1 (1.0.3-r0)
(10/25) Installing gcc (6.3.0-r4)
(11/25) Installing musl-dev (1.1.16-r14)
(12/25) Installing libc-dev (0.7.1-r0)
(13/25) Installing g++ (6.3.0-r4)
(14/25) Installing gfortran (6.3.0-r4)
(15/25) Installing lapack-dev (3.7.0-r0)
(16/25) Installing python3-dev (3.6.1-r3)
(17/25) Installing ca-certificates (20161130-r2)
(18/25) Installing libgpg-error (1.27-r0)
(19/25) Installing libgcrypt (1.7.9-r0)
(20/25) Installing libxml2 (2.9.4-r4)
(21/25) Installing libxslt (1.1.29-r3)
(22/25) Installing zlib-dev (1.2.11-r0)
(23/25) Installing libxml2-dev (2.9.4-r4)
(24/25) Installing libxslt-dev (1.1.29-r3)
(25/25) Installing .build-dependencies (0)
Executing busybox-1.26.2-r7.trigger
Executing ca-certificates-20161130-r2.trigger
OK: 275 MiB in 52 packages
Collecting geopy>=1.11.0 (from -r requirements.txt (line 1))
Downloading geopy-1.11.0-py2.py3-none-any.whl (66kB)
Collecting numpy==1.13.1 (from -r requirements.txt (line 2))
Downloading numpy-1.13.1.zip (5.0MB)
Collecting scipy==0.19.0 (from -r requirements.txt (line 3))
Downloading scipy-0.19.0.zip (15.3MB)
Collecting pycpfcnpj==1.0.2 (from -r requirements.txt (line 4))
Downloading pycpfcnpj-1.0.2.tar.gz
Collecting scikit-learn==0.18.1 (from -r requirements.txt (line 5))
Downloading scikit-learn-0.18.1.tar.gz (8.9MB)
Collecting serenata-toolbox (from -r requirements.txt (line 6))
Downloading serenata_toolbox-12.2.2-py3-none-any.whl
Collecting tqdm (from serenata-toolbox->-r requirements.txt (line 6))
Downloading tqdm-4.19.4-py2.py3-none-any.whl (50kB)
Collecting lxml>=3.6 (from serenata-toolbox->-r requirements.txt (line 6))
Downloading lxml-4.1.0.tar.gz (4.2MB)
Collecting pandas>=0.18 (from serenata-toolbox->-r requirements.txt (line 6))
Downloading pandas-0.21.0.tar.gz (11.3MB)
Collecting boto3 (from serenata-toolbox->-r requirements.txt (line 6))
Downloading boto3-1.4.7-py2.py3-none-any.whl (128kB)
Collecting aiofiles (from serenata-toolbox->-r requirements.txt (line 6))
Downloading aiofiles-0.3.2-py3-none-any.whl
Collecting beautifulsoup4>=4.4 (from serenata-toolbox->-r requirements.txt (line 6))
Downloading beautifulsoup4-4.6.0-py3-none-any.whl (86kB)
Collecting aiohttp (from serenata-toolbox->-r requirements.txt (line 6))
Downloading aiohttp-2.3.1.tar.gz (1.1MB)
Collecting python-dateutil>=2 (from pandas>=0.18->serenata-toolbox->-r requirements.txt (line 6))
Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
Collecting pytz>=2011k (from pandas>=0.18->serenata-toolbox->-r requirements.txt (line 6))
Downloading pytz-2017.3-py2.py3-none-any.whl (511kB)
Collecting botocore<1.8.0,>=1.7.0 (from boto3->serenata-toolbox->-r requirements.txt (line 6))
Downloading botocore-1.7.36-py2.py3-none-any.whl (3.7MB)
Collecting jmespath<1.0.0,>=0.7.1 (from boto3->serenata-toolbox->-r requirements.txt (line 6))
Downloading jmespath-0.9.3-py2.py3-none-any.whl
Collecting s3transfer<0.2.0,>=0.1.10 (from boto3->serenata-toolbox->-r requirements.txt (line 6))
Downloading s3transfer-0.1.11-py2.py3-none-any.whl (54kB)
Collecting chardet (from aiohttp->serenata-toolbox->-r requirements.txt (line 6))
Downloading chardet-3.0.4-py2.py3-none-any.whl (133kB)
Collecting multidict>=3.0.0 (from aiohttp->serenata-toolbox->-r requirements.txt (line 6))
Downloading multidict-3.3.0.tar.gz
Collecting async_timeout>=1.2.0 (from aiohttp->serenata-toolbox->-r requirements.txt (line 6))
Downloading async_timeout-2.0.0-py3-none-any.whl
Collecting yarl>=0.11 (from aiohttp->serenata-toolbox->-r requirements.txt (line 6))
Downloading yarl-0.13.0.tar.gz (136kB)
Collecting six>=1.5 (from python-dateutil>=2->pandas>=0.18->serenata-toolbox->-r requirements.txt (line 6))
Downloading six-1.11.0-py2.py3-none-any.whl
Collecting docutils>=0.10 (from botocore<1.8.0,>=1.7.0->boto3->serenata-toolbox->-r requirements.txt (line 6))
Downloading docutils-0.14-py3-none-any.whl (543kB)
Installing collected packages: geopy, numpy, scipy, pycpfcnpj, scikit-learn, tqdm, lxml, six, python-dateutil, pytz, pandas, jmespath, docutils, botocore, s3transfer, boto3, aiofiles, beautifulsoup4, chardet, multidict, async-timeout, yarl, aiohttp, serenata-toolbox
Running setup.py install for numpy: started
Running setup.py install for numpy: still running...
Running setup.py install for numpy: still running...
Running setup.py install for numpy: finished with status 'done'
Running setup.py install for scipy: started
Running setup.py install for scipy: still running...
Running setup.py install for scipy: still running...
Running setup.py install for scipy: still running...
Running setup.py install for scipy: still running...
Running setup.py install for scipy: still running...
Running setup.py install for scipy: still running...
Running setup.py install for scipy: still running...
Running setup.py install for scipy: still running...
Running setup.py install for scipy: still running...
Running setup.py install for scipy: still running...
Running setup.py install for scipy: finished with status 'done'
Running setup.py install for pycpfcnpj: started
Running setup.py install for pycpfcnpj: finished with status 'done'
Running setup.py install for scikit-learn: started
Running setup.py install for scikit-learn: still running...
Running setup.py install for scikit-learn: still running...
Running setup.py install for scikit-learn: finished with status 'done'
Running setup.py install for lxml: started
Running setup.py install for lxml: still running...
Running setup.py install for lxml: finished with status 'done'
Running setup.py install for pandas: started
Running setup.py install for pandas: still running...
Running setup.py install for pandas: still running...
Running setup.py install for pandas: still running...
Running setup.py install for pandas: still running...
Running setup.py install for pandas: still running...
Running setup.py install for pandas: finished with status 'done'
Running setup.py install for multidict: started
Running setup.py install for multidict: finished with status 'done'
Running setup.py install for yarl: started
Running setup.py install for yarl: finished with status 'done'
Running setup.py install for aiohttp: started
Running setup.py install for aiohttp: finished with status 'done'
Successfully installed aiofiles-0.3.2 aiohttp-2.3.1 async-timeout-2.0.0 beautifulsoup4-4.6.0 boto3-1.4.7 botocore-1.7.36 chardet-3.0.4 docutils-0.14 geopy-1.11.0 jmespath-0.9.3 lxml-4.1.0 multidict-3.3.0 numpy-1.13.1 pandas-0.21.0 pycpfcnpj-1.0.2 python-dateutil-2.6.1 pytz-2017.3 s3transfer-0.1.11 scikit-learn-0.18.1 scipy-0.19.0 serenata-toolbox-12.2.2 six-1.11.0 tqdm-4.19.4 yarl-0.13.0
WARNING: Ignoring APKINDEX.84815163.tar.gz: No such file or directory
WARNING: Ignoring APKINDEX.24d64ab1.tar.gz: No such file or directory
(1/25) Purging .build-dependencies (0)
(2/25) Purging g++ (6.3.0-r4)
(3/25) Purging libc-dev (0.7.1-r0)
(4/25) Purging gfortran (6.3.0-r4)
(5/25) Purging gcc (6.3.0-r4)
(6/25) Purging binutils (2.28-r3)
(7/25) Purging libatomic (6.3.0-r4)
(8/25) Purging libgomp (6.3.0-r4)
(9/25) Purging musl-dev (1.1.16-r14)
(10/25) Purging lapack-dev (3.7.0-r0)
(11/25) Purging python3-dev (3.6.1-r3)
(12/25) Purging ca-certificates (20161130-r2)
Executing ca-certificates-20161130-r2.post-deinstall
(13/25) Purging libxslt-dev (1.1.29-r3)
(14/25) Purging libxslt (1.1.29-r3)
(15/25) Purging libxml2-dev (2.9.4-r4)
(16/25) Purging zlib-dev (1.2.11-r0)
(17/25) Purging libxml2 (2.9.4-r4)
(18/25) Purging binutils-libs (2.28-r3)
(19/25) Purging mpc1 (1.0.3-r0)
(20/25) Purging mpfr3 (3.1.5-r0)
(21/25) Purging isl (0.17.1-r0)
(22/25) Purging gmp (6.1.2-r0)
(23/25) Purging pkgconf (1.3.7-r0)
(24/25) Purging libgcrypt (1.7.9-r0)
(25/25) Purging libgpg-error (1.27-r0)
Executing busybox-1.26.2-r7.trigger
OK: 81 MiB in 27 packages
 ---> 06ce7a17a7aa
Removing intermediate container b468b7617854
Step 9/9 : ENTRYPOINT python rosie.py
 ---> Running in 73cf659df460
 ---> 6b28182e950a
Removing intermediate container 73cf659df460
Successfully built 6b28182e950a
Successfully tagged rosie:latest
```
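The step 8/9 output in the build log above shows where the size savings come from: apk reports 81 MiB once the runtime packages (python3, libstdc++, lapack) are in, 275 MiB after the compilers and `-dev` headers are added as the `.build-dependencies` virtual package, and 81 MiB again after `apk del --purge .build-dependencies`. Because install, build and purge all happen inside one RUN instruction, the build tools are never committed to an image layer. A small sketch that pulls those figures out of the log; the three summary lines are copied verbatim, while the helper names are hypothetical:

```python
import re

# apk prints "OK: <size> MiB in <n> packages" after each add/del.
# These three lines are copied from step 8/9 of the build log.
APK_SUMMARIES = [
    "OK: 81 MiB in 27 packages",   # after python3, libstdc++, lapack
    "OK: 275 MiB in 52 packages",  # after adding the .build-dependencies virtual package
    "OK: 81 MiB in 27 packages",   # after apk del --purge .build-dependencies
]

SUMMARY = re.compile(r"OK: (\d+) MiB in (\d+) packages")


def parse_summary(line):
    """Return (MiB, package count) from one apk summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not an apk summary line: {line!r}")
    return int(match.group(1)), int(match.group(2))


def build_dependency_overhead(lines):
    """Return (MiB, packages) installed at the peak but gone by the end of the step."""
    sizes = [parse_summary(line) for line in lines]
    peak_mib, peak_pkgs = max(sizes)
    final_mib, final_pkgs = sizes[-1]
    return peak_mib - final_mib, peak_pkgs - final_pkgs


print(build_dependency_overhead(APK_SUMMARIES))  # → (194, 25)
```

apk only accounts for Alpine packages, so these numbers leave out the Python libraries pip compiles and do not add up to the final 381 MB image; they only show how much temporary tooling the single RUN step removes.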

and the tests:

```console
➜  rosie git:(caduvieira-docker_alpine) docker run --rm -v /tmp/serenata-data:/tmp/serenata-data rosie test
......................................................................
----------------------------------------------------------------------
Ran 70 tests in 2.768s

OK
```

@cuducos, can you tell me more about the problem?

caduvieira commented 7 years ago

@cuducos I couldn't reproduce that error

lipemorais commented 7 years ago

It took a long time to build the image but ran smoothly here too. :)

lipemorais commented 7 years ago

Hey @cuducos, what do you think about having this image built by Docker Hub?

lipemorais commented 7 years ago

> Hey @cuducos, what do you think about having this image built by Docker Hub?

It looks like it's already done here, so the part about how to run this image could be just the second line, with a small modification on README.md, something like `docker run --rm -v /tmp/serenata-data:/tmp/serenata-data datasciencebr/rosie test`, since Docker downloads the image if it's not there yet.

@caduvieira @cuducos @anaschwendler What do you think of this?

cuducos commented 7 years ago

OK, I just re-ran the build and it worked; it was probably just some minor internet instability or whatever. I'm sorry about that.

I think we need small tweaks before I'm able to merge it:

> During the scikit-learn wheel build, it was failing because of "missing scipy"

So this is an intentional exception to keeping the file sorted. Could you document it by adding a comment in the requirements.txt file? (BTW, I tested sorting it and it does fail! Thanks for the catch, @caduvieira!)
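For context on that exception: the build log earlier in this thread shows requirements.txt listing geopy, numpy, scipy, pycpfcnpj, scikit-learn and serenata-toolbox in that order, and the pip used in the image installing them in file order, so scipy is already built when scikit-learn compiles. Sorting the file alphabetically puts scikit-learn ahead of scipy. A minimal sketch of a check that could guard this ordering, e.g. in a test; the helper names and the comment line in the sample file are hypothetical, not something Rosie ships:

```python
import re

# requirements.txt as reported in the build log ("from -r requirements.txt (line N)").
# The comment line is a hypothetical example of the documentation asked for above.
REQUIREMENTS = """\
geopy>=1.11.0
numpy==1.13.1
# scipy must stay before scikit-learn: scikit-learn's build fails without it
scipy==0.19.0
pycpfcnpj==1.0.2
scikit-learn==0.18.1
serenata-toolbox
"""

# (must be installed first, installed later) pairs; only the one from this thread.
MUST_PRECEDE = [("scipy", "scikit-learn")]


def requirement_names(text):
    """Return lower-cased package names in file order, skipping comments and blanks."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Cut at the first specifier/extras/marker character: "numpy==1.13.1" -> "numpy"
        names.append(re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0].lower())
    return names


def order_violations(names, must_precede=MUST_PRECEDE):
    """Return the (first, later) pairs whose required order is broken in names."""
    position = {name: i for i, name in enumerate(names)}
    return [
        (first, later)
        for first, later in must_precede
        if first in position and later in position and position[first] > position[later]
    ]


names = requirement_names(REQUIREMENTS)
print(order_violations(names))          # → []
print(order_violations(sorted(names)))  # → [('scipy', 'scikit-learn')]
```

The second call reproduces what happened when the file was sorted: "scikit-learn" sorts before "scipy" ("k" < "p"), which breaks the required order.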

> a small modification on README.md, something like `docker run --rm -v /tmp/serenata-data:/tmp/serenata-data datasciencebr/rosie test`, since Docker downloads the image if it's not there yet.

And this edit suggested by @lipemorais, since people can use the image from Docker Hub instead of building their own ;)

coveralls commented 7 years ago


Coverage remained the same at 98.045% when pulling 00129b7e24b784dcd188f93f8a72aa54a2163afd on caduvieira:docker_alpine into c599dc73298fdc5c1ceb4077fa1553c75358e90c on datasciencebr:master.

caduvieira commented 7 years ago

Done and rebased

coveralls commented 7 years ago


Coverage remained the same at 98.045% when pulling cf817242c4e5d886fd3b60fdc0b80e8ab8659533 on caduvieira:docker_alpine into 49c069d674f3399f9b69c5c4c6b0ef81cd4fb21c on datasciencebr:master.

lipemorais commented 7 years ago

@caduvieira, in the future maybe we could split the new Alpine-based Dockerfile and the Dockerfile linter improvement into two different PRs, so one could be merged even if the other isn't ready yet ;)

anaschwendler commented 7 years ago

Last review!

  1. Clone the project:

    $ git clone git@github.com:datasciencebr/rosie.git
  2. Change to Rosie's folder:

    $ cd rosie
  3. Change to the tested branch:

    $ git checkout -b caduvieira-docker_alpine master
  4. Merge its content:

    $ git pull https://github.com/caduvieira/rosie.git docker_alpine
  5. Run the new Docker commands:

    $ docker build -t rosie .
    $ docker run --rm -v /tmp/serenata-data:/tmp/serenata-data rosie test

The result:

```console
➜  rosie git:(caduvieira-docker_alpine) docker run --rm -v /tmp/serenata-data:/tmp/serenata-data rosie test
......................................................................
----------------------------------------------------------------------
Ran 70 tests in 3.189s

OK
```

Done!
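The review checklist in the last comment can also be scripted. A minimal sketch using Python's standard `subprocess` module, assuming `git` and `docker` are on the PATH; the commands are the ones from the checklist, while the function and constant names are hypothetical. It only prints the steps unless `dry_run=False` is passed:

```python
import subprocess

# Values copied from the review checklist; the constant names are hypothetical.
UPSTREAM = "git@github.com:datasciencebr/rosie.git"
FORK = "https://github.com/caduvieira/rosie.git"
FORK_BRANCH = "docker_alpine"
LOCAL_BRANCH = "caduvieira-docker_alpine"
VOLUME = "/tmp/serenata-data:/tmp/serenata-data"


def review_steps(workdir="rosie"):
    """Return (command, cwd) pairs mirroring the checklist.

    Step 2 ("cd rosie") is expressed by running later commands with cwd=workdir.
    """
    return [
        (["git", "clone", UPSTREAM, workdir], None),                          # 1. clone
        (["git", "checkout", "-b", LOCAL_BRANCH, "master"], workdir),          # 3. branch
        (["git", "pull", FORK, FORK_BRANCH], workdir),                         # 4. merge PR
        (["docker", "build", "-t", "rosie", "."], workdir),                    # 5. build
        (["docker", "run", "--rm", "-v", VOLUME, "rosie", "test"], workdir),   # 5. test
    ]


def run_review(workdir="rosie", dry_run=True):
    """Print every step; execute it only when dry_run is False."""
    for command, cwd in review_steps(workdir):
        print(f"[{cwd or '.'}] $ {' '.join(command)}")
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    run_review(dry_run=True)
```

Stopping at the first failure (`check=True`) matches how the manual review works: a broken build makes the test run meaningless.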