typesense / typesense-docsearch-scraper

A fork of Algolia's awesome DocSearch Scraper, customized to index data in Typesense (an open source alternative to Algolia)
https://typesense.org/docs/guide/docsearch.html

feat: multi-arch #58

Open darkweaver87 opened 4 months ago

darkweaver87 commented 4 months ago

Change Summary

This PR uses Docker's multi-stage build feature and adds multi-arch builds. It also updates to Python 3.11 (the version shipped with Debian) and uses Chromium instead of Chrome to make multi-arch support easier. Fixes https://github.com/typesense/typesense-docsearch-scraper/pull/58.

PR Checklist

darkweaver87 commented 2 weeks ago

Sorry for the late answer, @jasonbosco. I can do it, but I don't quite get your point :-) The Dockerfile is a multi-stage one, so each stage can be built separately if needed, a change in the code won't trigger a full rebuild of the base image, and you can even specify which target to build. For example, building the test image:

$ docker buildx build -t typesense-docsearch-scraper:latest --platform=linux/amd64 --load . -f scraper/dev/docker/Dockerfile --target test
[+] Building 215.9s (31/31) FINISHED                                                                                                                                                 docker-container:nifty_pascal
 => [internal] booting buildkit                                                                                                                                                                              13.5s
 => => pulling image moby/buildkit:buildx-stable-1                                                                                                                                                           12.4s
 => => creating container buildx_buildkit_nifty_pascal0                                                                                                                                                       1.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                          0.0s
 => => transferring dockerfile: 1.79kB                                                                                                                                                                        0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.4                                                                                                                                   1.2s
 => [auth] docker/dockerfile:pull token for registry-1.docker.io                                                                                                                                              0.0s
 => docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                                                                    1.2s
 => => resolve docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                                                                        0.0s
 => => sha256:1328b32c40fca9bcf9d70d8eccb72eb873d1124d72dadce04db8badbe7b08546 9.94MB / 9.94MB                                                                                                                1.0s
 => => extracting sha256:1328b32c40fca9bcf9d70d8eccb72eb873d1124d72dadce04db8badbe7b08546                                                                                                                     0.1s
 => [internal] load .dockerignore                                                                                                                                                                             0.0s
 => => transferring context: 267B                                                                                                                                                                             0.0s
 => [internal] load metadata for docker.io/library/debian:12-slim                                                                                                                                             1.4s
 => [auth] library/debian:pull token for registry-1.docker.io                                                                                                                                                 0.0s
 => [internal] load build context                                                                                                                                                                             0.1s
 => => transferring context: 1.86MB                                                                                                                                                                           0.0s
 => [base  1/17] FROM docker.io/library/debian:12-slim@sha256:f528891ab1aa484bf7233dbcc84f3c806c3e427571d75510a9d74bb5ec535b33                                                                                3.3s
 => => resolve docker.io/library/debian:12-slim@sha256:f528891ab1aa484bf7233dbcc84f3c806c3e427571d75510a9d74bb5ec535b33                                                                                       0.0s
 => => sha256:f11c1adaa26e078479ccdd45312ea3b88476441b91be0ec898a7e07bfd05badc 29.13MB / 29.13MB                                                                                                              2.7s
 => => extracting sha256:f11c1adaa26e078479ccdd45312ea3b88476441b91be0ec898a7e07bfd05badc                                                                                                                     0.5s
 => [base  2/17] RUN useradd -d /home/seleuser -m seleuser                                                                                                                                                    0.2s
 => [base  3/17] RUN chown -R seleuser /home/seleuser                                                                                                                                                         0.1s
 => [base  4/17] RUN chgrp -R seleuser /home/seleuser                                                                                                                                                         0.1s
 => [base  5/17] WORKDIR /home/seleuser                                                                                                                                                                       0.0s
 => [base  6/17] RUN apt-get update -y && apt-get install -yq     software-properties-common    python3                                                                                                      23.2s
 => [base  7/17] RUN apt-get update -y && apt-get install -yq     curl     wget     sudo     gnupg     && curl -sL https://deb.nodesource.com/setup_18.x | sudo bash -                                        8.1s 
 => [base  8/17] RUN apt-get update -y && apt-get install -y     nodejs                                                                                                                                       5.8s 
 => [base  9/17] RUN apt-get update -y && apt-get install -yq   unzip   xvfb   libxi6   libgconf-2-4   default-jdk                                                                                           38.3s 
 => [base 10/17] RUN apt-get update -y && apt-get install -yq   chromium-driver                                                                                                                              29.7s 
 => [base 11/17] RUN wget -q https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.4.0/selenium-server-4.4.0.jar                                                                                3.0s 
 => [base 12/17] RUN wget -q https://repo1.maven.org/maven2/org/testng/testng/7.6.1/testng-7.6.1.jar                                                                                                          0.4s 
 => [base 13/17] COPY Pipfile .                                                                                                                                                                               0.1s 
 => [base 14/17] COPY Pipfile.lock .                                                                                                                                                                          0.1s 
 => [base 15/17] RUN apt-get update -y && apt-get install -yq     python3-pip                                                                                                                                20.9s 
 => [base 16/17] RUN pip3 install pipenv --break-system-packages                                                                                                                                              4.8s 
 => [base 17/17] RUN pipenv sync --python 3.11                                                                                                                                                               17.5s 
 => [test 1/3] WORKDIR /home/seleuser                                                                                                                                                                         0.1s 
 => [test 2/3] COPY . .                                                                                                                                                                                       0.1s 
 => [test 3/3] RUN touch .env                                                                                                                                                                                 0.1s 
 => exporting to docker image format                                                                                                                                                                         42.4s 
 => => exporting layers                                                                                                                                                                                      22.1s 
 => => exporting manifest sha256:47f5650797ce0c30a35d82381a955e8328bed8ca12a0b25974cf4bc151ed91e4                                                                                                             0.0s 
 => => exporting config sha256:e345418620cb7a94a7ecd1fb6e7c1e908e3626ef85b070d58ed6406410b4eae4                                                                                                               0.0s
 => => sending tarball                                                                                                                                                                                       20.3s
 => importing to docker                                                                                                                                                                                      15.9s
 => => loading layer 32148f9f6c5a 294.91kB / 29.13MB                                                                                                                                                         15.9s
 => => loading layer 408aeabd8ec4 3.31kB / 3.31kB                                                                                                                                                            15.1s
 => => loading layer 5f70bf18a086 32B / 32B                                                                                                                                                                  15.1s
 => => loading layer b3f64d5dd689 557.06kB / 68.94MB                                                                                                                                                         14.9s
 => => loading layer 0fa4a649ac15 131.07kB / 11.15MB                                                                                                                                                         13.6s
 => => loading layer 6a94ac0622d7 458.75kB / 45.35MB                                                                                                                                                         13.5s
 => => loading layer ea92554603cd 191.07MB / 238.09MB                                                                                                                                                        12.3s
 => => loading layer 1c59b50d0aed 155.42MB / 174.49MB                                                                                                                                                         8.5s
 => => loading layer c5a49a44b5a9 229.38kB / 21.94MB                                                                                                                                                          5.1s
 => => loading layer d7632a142344 32.77kB / 921.11kB                                                                                                                                                          4.8s
 => => loading layer 50d7f986270c 450B / 450B                                                                                                                                                                 4.7s
 => => loading layer d0a6277ae75e 20.11kB / 20.11kB                                                                                                                                                           4.7s
 => => loading layer 093373cbd025 557.06kB / 98.49MB                                                                                                                                                          4.6s
 => => loading layer 56623155e221 294.91kB / 26.72MB                                                                                                                                                          2.7s
 => => loading layer a26fa1e42ce8 557.06kB / 85.80MB                                                                                                                                                          2.1s
 => => loading layer 193ccc1a9401 32.77kB / 1.19MB                                                                                                                                                            0.3s
 => => loading layer 6360e74fdf2a 154B / 154B     

Now I change the code and rebuild:

$ echo '# test' >> scraper/__init__.py
$ docker buildx build -t typesense-docsearch-scraper:latest --platform=linux/amd64 --load . -f scraper/dev/docker/Dockerfile --target test
[+] Building 4.9s (28/28) FINISHED                                                                                                                                                   docker-container:nifty_pascal
 => [internal] load build definition from Dockerfile                                                                                                                                                          0.0s
 => => transferring dockerfile: 1.79kB                                                                                                                                                                        0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.4                                                                                                                                   0.6s
 => CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                                                             0.0s
 => => resolve docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                                                                        0.0s
 => [internal] load .dockerignore                                                                                                                                                                             0.0s
 => => transferring context: 267B                                                                                                                                                                             0.0s
 => [internal] load metadata for docker.io/library/debian:12-slim                                                                                                                                             0.2s
 => [internal] load build context                                                                                                                                                                             0.0s
 => => transferring context: 22.46kB                                                                                                                                                                          0.0s
 => [base  1/17] FROM docker.io/library/debian:12-slim@sha256:f528891ab1aa484bf7233dbcc84f3c806c3e427571d75510a9d74bb5ec535b33                                                                                0.0s
 => => resolve docker.io/library/debian:12-slim@sha256:f528891ab1aa484bf7233dbcc84f3c806c3e427571d75510a9d74bb5ec535b33                                                                                       0.0s
 => CACHED [base  2/17] RUN useradd -d /home/seleuser -m seleuser                                                                                                                                             0.0s
 => CACHED [base  3/17] RUN chown -R seleuser /home/seleuser                                                                                                                                                  0.0s
 => CACHED [base  4/17] RUN chgrp -R seleuser /home/seleuser                                                                                                                                                  0.0s
 => CACHED [base  5/17] WORKDIR /home/seleuser                                                                                                                                                                0.0s
 => CACHED [base  6/17] RUN apt-get update -y && apt-get install -yq     software-properties-common    python3                                                                                                0.0s
 => CACHED [base  7/17] RUN apt-get update -y && apt-get install -yq     curl     wget     sudo     gnupg     && curl -sL https://deb.nodesource.com/setup_18.x | sudo bash -                                 0.0s
 => CACHED [base  8/17] RUN apt-get update -y && apt-get install -y     nodejs                                                                                                                                0.0s
 => CACHED [base  9/17] RUN apt-get update -y && apt-get install -yq   unzip   xvfb   libxi6   libgconf-2-4   default-jdk                                                                                     0.0s
 => CACHED [base 10/17] RUN apt-get update -y && apt-get install -yq   chromium-driver                                                                                                                        0.0s
 => CACHED [base 11/17] RUN wget -q https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.4.0/selenium-server-4.4.0.jar                                                                         0.0s
 => CACHED [base 12/17] RUN wget -q https://repo1.maven.org/maven2/org/testng/testng/7.6.1/testng-7.6.1.jar                                                                                                   0.0s
 => CACHED [base 13/17] COPY Pipfile .                                                                                                                                                                        0.0s
 => CACHED [base 14/17] COPY Pipfile.lock .                                                                                                                                                                   0.0s
 => CACHED [base 15/17] RUN apt-get update -y && apt-get install -yq     python3-pip                                                                                                                          0.0s
 => CACHED [base 16/17] RUN pip3 install pipenv --break-system-packages                                                                                                                                       0.0s
 => CACHED [base 17/17] RUN pipenv sync --python 3.11                                                                                                                                                         0.0s
 => CACHED [test 1/3] WORKDIR /home/seleuser                                                                                                                                                                  0.0s
 => [test 2/3] COPY . .                                                                                                                                                                                       0.0s
 => [test 3/3] RUN touch .env                                                                                                                                                                                 0.1s
 => exporting to docker image format                                                                                                                                                                          3.8s
 => => exporting layers                                                                                                                                                                                       0.1s
 => => exporting manifest sha256:73f7dc3b3c1d8a32737692ed785e7b5c18fdc6f671f9a46549f4f420d089d125                                                                                                             0.0s
 => => exporting config sha256:3bc3081e931ed24936f3920a68700fdd27a4f558421d120c20ca6498d20e85fa                                                                                                               0.0s
 => => sending tarball                                                                                                                                                                                        3.6s
 => importing to docker                                                                                                                                                                                       0.2s
 => => loading layer f4a7588e8c11 32.77kB / 1.19MB                                                                                                                                                            0.2s
 => => loading layer 577e67c906ab 154B / 154B                                                                           

Look at the CACHED lines and the build time (4.9s instead of 215.9s) :-)
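
To make that comparison less of an eyeball exercise, here is a minimal sketch that summarizes a saved build log. It assumes the buildx output was copied into a file in the same format as the logs above; the helper names `count_cached` and `total_time` and the sample file are made up for this example and are not part of the PR.

```shell
#!/bin/sh
# Sketch: summarize a saved `docker buildx build` log.
# Assumption: the log has the same layout as the output pasted above.

count_cached() {
  # Count the steps that buildx reported as CACHED.
  grep -c '^ *=> CACHED ' "$1"
}

total_time() {
  # Pull the total out of the "[+] Building 4.9s (28/28) FINISHED" header line.
  sed -n 's/^\[+] Building \([0-9.]*s\) .*/\1/p' "$1"
}

# Demo on a few lines taken from the second build above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[+] Building 4.9s (28/28) FINISHED
 => CACHED [base 16/17] RUN pip3 install pipenv --break-system-packages   0.0s
 => CACHED [base 17/17] RUN pipenv sync --python 3.11   0.0s
 => CACHED [test 1/3] WORKDIR /home/seleuser   0.0s
 => [test 2/3] COPY . .   0.0s
 => [test 3/3] RUN touch .env   0.1s
EOF

echo "cached steps: $(count_cached "$sample")"
echo "total time: $(total_time "$sample")"
rm -f "$sample"
```

On a full log from a rebuild after a code-only change, every `base` step after the initial `FROM` should show up in the cached count, and only the `test` steps from `COPY . .` onward should run again.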