mamba-org / mamba

The Fast Cross-Platform Package Manager
https://mamba.readthedocs.io

Docker image size compared to alternative package manager pip #2390

Open LuchiLucs opened 1 year ago

LuchiLucs commented 1 year ago

Troubleshooting docs

Search tried in issue tracker

docker image size compared to pip

Latest version of Mamba

Tried in Conda?

Reproducible with Conda

Describe your issue

I have successfully built two Docker images of the same Python application, whose runtime dependencies are listed in a requirements.txt / enviroment.yaml file. The first image uses apt-get to install the system requirements and then pip to install the application's requirements. The second image uses only micromamba to resolve the requirements and their dependencies.

The first image is around 790MB. If I compile numpy and scipy from source, which avoids bundling two copies of the OpenBLAS library, I save another 40MB, for an image of 750MB. The second image, built with micromamba, is 2.2GB. Both images clear the caches of installed packages (apt-get cache, pip cache, micromamba cache).

What could cause this difference? In the first solution I use a multi-stage build: I install the required packages in one stage and copy only those into the second stage of the image. Could micromamba be bundling build-time and runtime dependencies together?

Would you mind helping me understand how to proceed and how to resolve this problem? I would like to use micromamba and still get a similar size for the two Docker images.

mamba info / micromamba info

            environment : base (active)
           env location : /opt/conda
      user config files : /home/mambauser/.mambarc
 populated config files :
       libmamba version : 1.3.1
     micromamba version : 1.3.1
           curl version : libcurl/7.87.0 OpenSSL/3.0.8 zlib/1.2.13 libssh2/1.10.0 nghttp2/1.47.0
     libarchive version : libarchive 3.6.2 zlib/1.2.13 bz2lib/1.0.8 libzstd/1.5.2
       virtual packages : __unix=0=0
                          __linux=5.15.90=0
                          __glibc=2.31=0
                          __archspec=1=x86_64
               channels :
       base environment : /opt/conda
               platform : linux-64

Logs

$ micromamba create --dry-run -f enviroment.yaml -n testdryrun

                                           __
          __  ______ ___  ____ _____ ___  / /_  ____ _
         / / / / __ `__ \/ __ `/ __ `__ \/ __ \/ __ `/
        / /_/ / / / / / / /_/ / / / / / / /_/ / /_/ /
       / .___/_/ /_/ /_/\__,_/_/ /_/ /_/_.___/\__,_/
      /_/

conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache

Transaction

  Prefix: /opt/conda/envs/testdryrun

  Updating specs:

   - python=3.9
   - azure-storage-blob
   - azure-identity
   - pyodbc
   - psycopg2
   - numpy
   - scipy
   - statsmodels
   - pandas
   - pyarrow
   - scikit-learn
   - pmdarima
   - matplotlib
   - seaborn
   - requests

  Package                           Version  Build                Channel                   Size
──────────────────────────────────────────────────────────────────────────────────────────────────
  Install:
──────────────────────────────────────────────────────────────────────────────────────────────────

  + _libgcc_mutex                       0.1  conda_forge          conda-forge/linux-64       3kB
  + _openmp_mutex                       4.5  2_gnu                conda-forge/linux-64      24kB
  + alsa-lib                          1.2.8  h166bdaf_0           conda-forge/linux-64     592kB
  + arrow-cpp                        11.0.0  ha770c72_12_cpu      conda-forge/linux-64      31kB
  + attr                              2.5.1  h166bdaf_1           conda-forge/linux-64      71kB
  + aws-c-auth                       0.6.26  hdca2abe_0           conda-forge/linux-64      96kB
  + aws-c-cal                        0.5.21  h48707d8_2           conda-forge/linux-64      44kB
  + aws-c-common                     0.8.14  h0b41bf4_0           conda-forge/linux-64     200kB
  + aws-c-compression                0.2.16  h03acc5a_5           conda-forge/linux-64      19kB
  + aws-c-event-stream               0.2.20  h00877a2_4           conda-forge/linux-64      54kB
  + aws-c-http                        0.7.5  hf342b9f_5           conda-forge/linux-64     192kB
  + aws-c-io                        0.13.19  hef0810e_1           conda-forge/linux-64     143kB
  + aws-c-mqtt                        0.8.6  h337b09f_11          conda-forge/linux-64     143kB
  + aws-c-s3                          0.2.7  hde0a405_0           conda-forge/linux-64      76kB
  + aws-c-sdkutils                    0.1.8  h03acc5a_0           conda-forge/linux-64      54kB
  + aws-checksums                    0.1.14  h03acc5a_5           conda-forge/linux-64      50kB
  + aws-crt-cpp                      0.19.8  h85f3b07_11          conda-forge/linux-64     319kB
  + aws-sdk-cpp                     1.10.57  h17c43bd_8           conda-forge/linux-64       4MB
  + azure-core                       1.26.3  pyhd8ed1ab_0         conda-forge/noarch        90kB
  + azure-identity                   1.12.0  pyhd8ed1ab_0         conda-forge/noarch        68kB
  + azure-storage-blob              12.15.0  pyhd8ed1ab_0         conda-forge/noarch       179kB
  + blinker                             1.5  pyhd8ed1ab_0         conda-forge/noarch        15kB
  + brotli                            1.0.9  h166bdaf_8           conda-forge/linux-64      19kB
  + brotli-bin                        1.0.9  h166bdaf_8           conda-forge/linux-64      20kB
  + brotlipy                          0.7.0  py39hb9d737c_1005    conda-forge/linux-64     351kB
  + bzip2                             1.0.8  h7f98852_4           conda-forge/linux-64     496kB
  + c-ares                           1.18.1  h7f98852_0           conda-forge/linux-64     115kB
  + ca-certificates               2022.12.7  ha878542_0           conda-forge/linux-64     146kB
  + cairo                            1.16.0  ha61ee94_1014        conda-forge/linux-64       2MB
  + certifi                       2022.12.7  pyhd8ed1ab_0         conda-forge/noarch       151kB
  + cffi                             1.15.1  py39he91dace_3       conda-forge/linux-64     235kB
  + charset-normalizer                2.1.1  pyhd8ed1ab_0         conda-forge/noarch        36kB
  + contourpy                         1.0.7  py39h4b4f3f3_0       conda-forge/linux-64     216kB
  + cryptography                     39.0.2  py39h079d5ae_0       conda-forge/linux-64       1MB
  + cycler                           0.11.0  pyhd8ed1ab_0         conda-forge/noarch        10kB
  + cython                          0.29.33  py39h227be39_0       conda-forge/linux-64       2MB
  + dbus                             1.13.6  h5008d03_3           conda-forge/linux-64     619kB
  + expat                             2.5.0  h27087fc_0           conda-forge/linux-64     194kB
  + fftw                             3.3.10  nompi_hf0379b8_106   conda-forge/linux-64       2MB
  + font-ttf-dejavu-sans-mono          2.37  hab24e00_0           conda-forge/noarch       397kB
  + font-ttf-inconsolata              3.000  h77eed37_0           conda-forge/noarch        97kB
  + font-ttf-source-code-pro          2.038  h77eed37_0           conda-forge/noarch       701kB
  + font-ttf-ubuntu                    0.83  hab24e00_0           conda-forge/noarch         2MB
  + fontconfig                       2.14.2  h14ed4e7_0           conda-forge/linux-64     272kB
  + fonts-conda-ecosystem                 1  0                    conda-forge/noarch         4kB
  + fonts-conda-forge                     1  0                    conda-forge/noarch         4kB
  + fonttools                        4.39.2  py39h72bdee0_0       conda-forge/linux-64       2MB
  + freetype                         2.12.1  hca18f0e_1           conda-forge/linux-64     626kB
  + gettext                          0.21.1  h27087fc_0           conda-forge/linux-64       4MB
  + gflags                            2.2.2  he1b5a44_1004        conda-forge/linux-64     117kB
  + glib                             2.74.1  h6239696_1           conda-forge/linux-64     486kB
  + glib-tools                       2.74.1  h6239696_1           conda-forge/linux-64     109kB
  + glog                              0.6.0  h6f12383_0           conda-forge/linux-64     114kB
  + graphite2                        1.3.13  h58526e2_1001        conda-forge/linux-64     105kB
  + gst-plugins-base                 1.22.0  h4243ec0_2           conda-forge/linux-64       3MB
  + gstreamer                        1.22.0  h25f0c4b_2           conda-forge/linux-64       2MB
  + gstreamer-orc                    0.4.33  h166bdaf_0           conda-forge/linux-64     306kB
  + harfbuzz                          6.0.0  h8e241bc_0           conda-forge/linux-64       1MB
  + icu                                70.1  h27087fc_0           conda-forge/linux-64      14MB
  + idna                                3.4  pyhd8ed1ab_0         conda-forge/noarch        57kB
  + importlib-resources              5.12.0  pyhd8ed1ab_0         conda-forge/noarch         9kB
  + importlib_resources              5.12.0  pyhd8ed1ab_0         conda-forge/noarch        31kB
  + isodate                           0.6.1  pyhd8ed1ab_0         conda-forge/noarch        29kB
  + jack                             1.9.22  h11f4161_0           conda-forge/linux-64     464kB
  + joblib                            1.2.0  pyhd8ed1ab_0         conda-forge/noarch       210kB
  + keyutils                          1.6.1  h166bdaf_0           conda-forge/linux-64     118kB
  + kiwisolver                        1.4.4  py39hf939315_1       conda-forge/linux-64      78kB
  + krb5                             1.20.1  h81ceb04_0           conda-forge/linux-64       1MB
  + lame                              3.100  h166bdaf_1003        conda-forge/linux-64     508kB
  + lcms2                              2.15  haa2dc70_1           conda-forge/linux-64     242kB
  + ld_impl_linux-64                   2.40  h41732ed_0           conda-forge/linux-64     705kB
  + lerc                              4.0.0  h27087fc_0           conda-forge/linux-64     282kB
  + libabseil                    20230125.0  cxx17_hcb278e6_1     conda-forge/linux-64       1MB
  + libarrow                         11.0.0  h51ec05e_12_cpu      conda-forge/linux-64      27MB
  + libblas                           3.9.0  16_linux64_openblas  conda-forge/linux-64      13kB
  + libbrotlicommon                   1.0.9  h166bdaf_8           conda-forge/linux-64      67kB
  + libbrotlidec                      1.0.9  h166bdaf_8           conda-forge/linux-64      34kB
  + libbrotlienc                      1.0.9  h166bdaf_8           conda-forge/linux-64     295kB
  + libcap                             2.66  ha37c62d_0           conda-forge/linux-64     100kB
  + libcblas                          3.9.0  16_linux64_openblas  conda-forge/linux-64      13kB
  + libclang                         15.0.7  default_had23c3d_1   conda-forge/linux-64     133kB
  + libclang13                       15.0.7  default_h3e3d535_1   conda-forge/linux-64      10MB
  + libcrc32c                         1.1.2  h9c3ff4c_0           conda-forge/linux-64      20kB
  + libcups                           2.3.3  h36d4200_3           conda-forge/linux-64       5MB
  + libcurl                          7.88.1  hdc1c0ab_0           conda-forge/linux-64     358kB
  + libdb                            6.2.32  h9c3ff4c_0           conda-forge/linux-64      24MB
  + libdeflate                         1.17  h0b41bf4_0           conda-forge/linux-64      65kB
  + libedit                    3.1.20191231  he28a2e2_2           conda-forge/linux-64     124kB
  + libev                              4.33  h516909a_1           conda-forge/linux-64     106kB
  + libevent                         2.1.10  h28343ad_4           conda-forge/linux-64       1MB
  + libffi                            3.4.2  h7f98852_5           conda-forge/linux-64      58kB
  + libflac                           1.4.2  h27087fc_0           conda-forge/linux-64     421kB
  + libgcc-ng                        12.2.0  h65d4601_19          conda-forge/linux-64     954kB
  + libgcrypt                        1.10.1  h166bdaf_0           conda-forge/linux-64     720kB
  + libgfortran-ng                   12.2.0  h69a702a_19          conda-forge/linux-64      23kB
  + libgfortran5                     12.2.0  h337968e_19          conda-forge/linux-64       2MB
  + libglib                          2.74.1  h606061b_1           conda-forge/linux-64       3MB
  + libgomp                          12.2.0  h65d4601_19          conda-forge/linux-64     466kB
  + libgoogle-cloud                   2.8.0  h0bc5f78_1           conda-forge/linux-64      38MB
  + libgpg-error                       1.46  h620e276_0           conda-forge/linux-64     258kB
  + libgrpc                          1.52.1  hcf146ea_1           conda-forge/linux-64       6MB
  + libiconv                           1.17  h166bdaf_0           conda-forge/linux-64       1MB
  + libjpeg-turbo                   2.1.5.1  h0b41bf4_0           conda-forge/linux-64     491kB
  + liblapack                         3.9.0  16_linux64_openblas  conda-forge/linux-64      13kB
  + libllvm15                        15.0.7  hadd5161_1           conda-forge/linux-64      33MB
  + libnghttp2                       1.52.0  h61bc06f_0           conda-forge/linux-64     622kB
  + libnsl                            2.0.0  h7f98852_0           conda-forge/linux-64      31kB
  + libogg                            1.3.4  h7f98852_1           conda-forge/linux-64     211kB
  + libopenblas                      0.3.21  pthreads_h78a6416_3  conda-forge/linux-64      11MB
  + libopus                           1.3.1  h7f98852_1           conda-forge/linux-64     261kB
  + libpng                           1.6.39  h753d276_0           conda-forge/linux-64     283kB
  + libpq                              15.2  hb675445_0           conda-forge/linux-64       2MB
  + libprotobuf                     3.21.12  h3eb15da_0           conda-forge/linux-64       2MB
  + libsndfile                        1.2.0  hb75c966_0           conda-forge/linux-64     350kB
  + libsqlite                        3.40.0  h753d276_0           conda-forge/linux-64     810kB
  + libssh2                          1.10.0  hf14f497_3           conda-forge/linux-64     239kB
  + libstdcxx-ng                     12.2.0  h46fd767_19          conda-forge/linux-64       4MB
  + libsystemd0                         252  h2a991cd_0           conda-forge/linux-64     393kB
  + libthrift                        0.18.1  h5e4af38_0           conda-forge/linux-64       4MB
  + libtiff                           4.5.0  hddfeb54_5           conda-forge/linux-64     407kB
  + libtool                           2.4.7  h27087fc_0           conda-forge/linux-64     412kB
  + libudev1                            253  h0b41bf4_0           conda-forge/linux-64     119kB
  + libutf8proc                       2.8.0  h166bdaf_0           conda-forge/linux-64     101kB
  + libuuid                          2.32.1  h7f98852_1000        conda-forge/linux-64      28kB
  + libvorbis                         1.3.7  h9c3ff4c_0           conda-forge/linux-64     286kB
  + libwebp-base                      1.3.0  h0b41bf4_0           conda-forge/linux-64     357kB
  + libxcb                             1.13  h7f98852_1004        conda-forge/linux-64     400kB
  + libxkbcommon                      1.5.0  h79f4944_1           conda-forge/linux-64     563kB
  + libxml2                          2.10.3  hca2bb57_3           conda-forge/linux-64     714kB
  + libzlib                          1.2.13  h166bdaf_4           conda-forge/linux-64      66kB
  + lz4-c                             1.9.4  hcb278e6_0           conda-forge/linux-64     143kB
  + matplotlib                        3.7.1  py39hf3d152e_0       conda-forge/linux-64       8kB
  + matplotlib-base                   3.7.1  py39he190548_0       conda-forge/linux-64       7MB
  + mpg123                           1.31.2  hcb278e6_0           conda-forge/linux-64     485kB
  + msal                             1.21.0  pyhd8ed1ab_0         conda-forge/noarch        73kB
  + msal_extensions                   1.0.0  pyhd8ed1ab_0         conda-forge/noarch        20kB
  + msrest                            0.7.1  pyhd8ed1ab_0         conda-forge/noarch        52kB
  + munkres                           1.1.4  pyh9f0ad1d_0         conda-forge/noarch        12kB
  + mysql-common                     8.0.32  ha901b37_0           conda-forge/linux-64     744kB
  + mysql-libs                       8.0.32  hd7da12d_0           conda-forge/linux-64       2MB
  + ncurses                             6.3  h27087fc_1           conda-forge/linux-64       1MB
  + nspr                               4.35  h27087fc_0           conda-forge/linux-64     227kB
  + nss                                3.89  he45b914_0           conda-forge/linux-64       2MB
  + numpy                            1.24.2  py39h7360e5f_0       conda-forge/linux-64       7MB
  + oauthlib                          3.2.2  pyhd8ed1ab_0         conda-forge/noarch        92kB
  + openjpeg                          2.5.0  hfec8fc6_2           conda-forge/linux-64     352kB
  + openssl                           3.1.0  h0b41bf4_0           conda-forge/linux-64       3MB
  + orc                               1.8.3  hfdbbad2_0           conda-forge/linux-64     909kB
  + packaging                          23.0  pyhd8ed1ab_0         conda-forge/noarch        41kB
  + pandas                            1.5.3  py39h2ad29b5_0       conda-forge/linux-64      12MB
  + parquet-cpp                       1.5.1  2                    conda-forge/noarch         3kB
  + patsy                             0.5.3  pyhd8ed1ab_0         conda-forge/noarch       194kB
  + pcre2                             10.40  hc3806b6_0           conda-forge/linux-64       2MB
  + pillow                            9.4.0  py39h7207d5c_2       conda-forge/linux-64      46MB
  + pip                              23.0.1  pyhd8ed1ab_0         conda-forge/noarch         1MB
  + pixman                           0.40.0  h36c2ea0_0           conda-forge/linux-64     643kB
  + platformdirs                      3.1.1  pyhd8ed1ab_0         conda-forge/noarch        18kB
  + ply                                3.11  py_1                 conda-forge/noarch        45kB
  + pmdarima                          2.0.2  py39hb9d737c_1       conda-forge/linux-64     569kB
  + pooch                             1.7.0  pyhd8ed1ab_0         conda-forge/noarch        51kB
  + portalocker                       2.7.0  py39hf3d152e_0       conda-forge/linux-64      31kB
  + psycopg2                          2.9.3  py39h24a400a_2       conda-forge/linux-64     175kB
  + pthread-stubs                       0.4  h36c2ea0_1001        conda-forge/linux-64       6kB
  + pulseaudio                         16.1  ha8d29e2_1           conda-forge/linux-64       2MB
  + pyarrow                          11.0.0  py39hf0ef2fd_12_cpu  conda-forge/linux-64       4MB
  + pycparser                          2.21  pyhd8ed1ab_0         conda-forge/noarch       103kB
  + pyjwt                             2.6.0  pyhd8ed1ab_0         conda-forge/noarch        21kB
  + pyodbc                           4.0.35  py39h5a03fae_0       conda-forge/linux-64      79kB
  + pyopenssl                        23.0.0  pyhd8ed1ab_0         conda-forge/noarch       127kB
  + pyparsing                         3.0.9  pyhd8ed1ab_0         conda-forge/noarch        81kB
  + pyqt                             5.15.7  py39h5c7b992_3       conda-forge/linux-64       5MB
  + pyqt5-sip                       12.11.0  py39h227be39_3       conda-forge/linux-64      85kB
  + pysocks                           1.7.1  pyha2e5f31_6         conda-forge/noarch        19kB
  + python                           3.9.16  h2782a2a_0_cpython   conda-forge/linux-64      24MB
  + python-dateutil                   2.8.2  pyhd8ed1ab_0         conda-forge/noarch       246kB
  + python_abi                          3.9  3_cp39               conda-forge/linux-64       6kB
  + pytz                           2022.7.1  pyhd8ed1ab_0         conda-forge/noarch       186kB
  + qt-main                          5.15.8  h67dfc38_7           conda-forge/linux-64      52MB
  + re2                          2023.02.02  hcb278e6_0           conda-forge/linux-64     201kB
  + readline                          8.1.2  h0f457ee_0           conda-forge/linux-64     298kB
  + requests                         2.28.2  pyhd8ed1ab_0         conda-forge/noarch        57kB
  + requests-oauthlib                 1.3.1  pyhd8ed1ab_0         conda-forge/noarch        22kB
  + s2n                              1.3.39  h3358134_0           conda-forge/linux-64     362kB
  + scikit-learn                      1.2.2  py39h86b2a18_0       conda-forge/linux-64       8MB
  + scipy                            1.10.1  py39h7360e5f_0       conda-forge/linux-64      25MB
  + seaborn                          0.12.2  hd8ed1ab_0           conda-forge/noarch         6kB
  + seaborn-base                     0.12.2  pyhd8ed1ab_0         conda-forge/noarch       232kB
  + setuptools                       67.6.0  pyhd8ed1ab_0         conda-forge/noarch       579kB
  + sip                               6.7.7  py39h227be39_0       conda-forge/linux-64     488kB
  + six                              1.16.0  pyh6c4a22f_0         conda-forge/noarch        14kB
  + snappy                           1.1.10  h9fff704_0           conda-forge/linux-64      39kB
  + statsmodels                      0.13.5  py39h2ae25f5_2       conda-forge/linux-64      12MB
  + threadpoolctl                     3.1.0  pyh8a188c0_0         conda-forge/noarch        18kB
  + tk                               8.6.12  h27826a3_0           conda-forge/linux-64       3MB
  + toml                             0.10.2  pyhd8ed1ab_0         conda-forge/noarch        18kB
  + tornado                             6.2  py39hb9d737c_1       conda-forge/linux-64     674kB
  + typing-extensions                 4.5.0  hd8ed1ab_0           conda-forge/noarch        10kB
  + typing_extensions                 4.5.0  pyha770c72_0         conda-forge/noarch        31kB
  + tzdata                            2022g  h191b570_0           conda-forge/noarch       108kB
  + unicodedata2                     15.0.0  py39hb9d737c_0       conda-forge/linux-64     512kB
  + unixodbc                         2.3.10  h583eb01_0           conda-forge/linux-64     303kB
  + urllib3                         1.26.15  pyhd8ed1ab_0         conda-forge/noarch       113kB
  + wheel                            0.40.0  pyhd8ed1ab_0         conda-forge/noarch        56kB
  + xcb-util                          0.4.0  h166bdaf_0           conda-forge/linux-64      21kB
  + xcb-util-image                    0.4.0  h166bdaf_0           conda-forge/linux-64      24kB
  + xcb-util-keysyms                  0.4.0  h166bdaf_0           conda-forge/linux-64      12kB
  + xcb-util-renderutil               0.3.9  h166bdaf_0           conda-forge/linux-64      16kB
  + xcb-util-wm                       0.4.1  h166bdaf_0           conda-forge/linux-64      57kB
  + xkeyboard-config                   2.38  h0b41bf4_0           conda-forge/linux-64     882kB
  + xorg-kbproto                      1.0.7  h7f98852_1002        conda-forge/linux-64      27kB
  + xorg-libice                      1.0.10  h7f98852_0           conda-forge/linux-64      59kB
  + xorg-libsm                        1.2.3  hd9c2040_1000        conda-forge/linux-64      26kB
  + xorg-libx11                       1.8.4  h0b41bf4_0           conda-forge/linux-64     830kB
  + xorg-libxau                       1.0.9  h7f98852_0           conda-forge/linux-64      13kB
  + xorg-libxdmcp                     1.1.3  h7f98852_0           conda-forge/linux-64      19kB
  + xorg-libxext                      1.3.4  h0b41bf4_2           conda-forge/linux-64      50kB
  + xorg-libxrender                  0.9.10  h7f98852_1003        conda-forge/linux-64      33kB
  + xorg-renderproto                 0.11.1  h7f98852_1002        conda-forge/linux-64      10kB
  + xorg-xextproto                    7.3.0  h0b41bf4_1003        conda-forge/linux-64      30kB
  + xorg-xf86vidmodeproto             2.3.1  h7f98852_1002        conda-forge/linux-64      24kB
  + xorg-xproto                      7.0.31  h7f98852_1007        conda-forge/linux-64      75kB
  + xz                                5.2.6  h166bdaf_0           conda-forge/linux-64     418kB
  + zipp                             3.15.0  pyhd8ed1ab_0         conda-forge/noarch        17kB
  + zlib                             1.2.13  h166bdaf_4           conda-forge/linux-64      94kB
  + zstd                              1.5.2  h3eb15da_6           conda-forge/linux-64     420kB

  Summary:

  Install: 225 packages

  Total download: 469MB

──────────────────────────────────────────────────────────────────────────────────────────────────

environment.yml

name: base
channels:
  - conda-forge
  - nodefaults
dependencies:
  - python=3.9
  - azure-storage-blob
  - azure-identity
  - pyodbc
  - psycopg2
  - numpy
  - scipy
  - statsmodels
  - pandas
  - pyarrow
  - scikit-learn
  - pmdarima
  - matplotlib
  - seaborn
  - requests

~/.condarc

/home/mambauser/.condarc: No such file or directory

/home/mambauser/.mambarc: No such file or directory

jonashaag commented 1 year ago

Please post the Dockerfiles :)

LuchiLucs commented 1 year ago

Here are the two Dockerfiles.

Using micromamba:

# To see micromamba --help through Docker:
# 1) docker run -it mambaorg/micromamba:1.3.1-bullseye-slim /bin/sh
# 2) $ micromamba --help
ARG BASE_IMAGE=mambaorg/micromamba:1.3.1-bullseye-slim
FROM ${BASE_IMAGE}
# Copy dependencies list
COPY --chown=$MAMBA_USER:$MAMBA_USER enviroment.yaml /tmp/enviroment.yaml

USER root
# tools needed to build the Microsoft ODBC driver for Microsoft SQL Server (requirements)
# https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server
RUN set -eux \
    && buildDeps=' \
            gnupg \
            curl \
            gcc \
        ' \
    && apt-get update \
    && apt-get install -y --no-install-recommends $buildDeps \
    && curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
    && curl https://packages.microsoft.com/config/debian/11/prod.list > /etc/apt/sources.list.d/mssql-release.list \
    && apt-get update \
    && ACCEPT_EULA=Y apt-get install -y --no-install-recommends msodbcsql17 \
    && apt-get install -y --no-install-recommends unixodbc-dev \
    # clean up
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get purge -y --auto-remove $buildDeps

# install app deps
USER $MAMBA_USER
RUN micromamba install --yes --name base --no-pyc --file /tmp/enviroment.yaml  && \
    micromamba clean --all --force-pkgs-dirs --yes
# Copy all app files
WORKDIR /app
COPY . .
CMD ["python", "app.py"]

Using pip:

ARG PYTHON_VERSION=3.9.16
ARG DEBIAN_VERSION=bullseye
ARG BASE_IMAGE=python:${PYTHON_VERSION}-slim-${DEBIAN_VERSION}
# Set virtual environment path
ARG VIRTUAL_ENV=/opt/venv

FROM ${BASE_IMAGE} as build
# Make the ARG variable defined outside a FROM statement available inside
# https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
ARG VIRTUAL_ENV
# tools needed to build psycopg from source:
# https://www.psycopg.org/docs/install.html#build-prerequisites
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc python3-dev libpq-dev apt-utils
# Copy dependencies list
COPY requirements.txt dependencies.txt
# Create virtual environment without bootstrapped pip
#!!! TODO: check if setuptools is bootstrapped, if yes then should be deleted to optimize image size
# https://docs.python.org/3/library/venv.html
RUN python -m venv --without-pip ${VIRTUAL_ENV}

# tools needed to build requirements from source:
# https://docs.scipy.org/doc//scipy-1.4.1/reference/building/linux.html
# https://numpy.org/doc/stable/user/building.html
RUN set -eux \
    && buildScietificPackagesDeps=' \
        build-essential \
        cmake \
        ninja-build \
        gfortran \
        pkg-config \
        python-dev \
        libopenblas-dev \
        liblapack-dev \
        autoconf \
        automake \
        libatlas-base-dev \
        # WIP: python-ply libffi-dev unixodbc-dev are needed to try building other pkgs from sources other than numpy and scipy
        python-ply \
        libffi-dev \
        unixodbc-dev \
    ' \
    && apt-get update \
    && apt-get install -y --no-install-recommends $buildScietificPackagesDeps \
    && pip install --upgrade --no-cache-dir pip wheel setuptools Cython meson-python pythran pybind11
# Use virtual environment (persistent in the final image):
ENV PATH=$VIRTUAL_ENV/bin:$PATH
ENV PYTHONHOME=
# Install dependencies list
# --prefix
#       used to install inside the virtual environment path
# --no-cache-dir
#       used to avoid using cache for packages (decrease image size)
# --no-compile
#       used to avoid compiling python .py files to bytecode .pyc (decrease image size - bytecode is generated at first run when importing modules)
#       # TODO: check if runtime performance is affected
#       # TODO: check if import just the functions that are needed in src is a good practice
# --use-pep517 --check-build-dependencies --no-build-isolation
#       used to solve https://github.com/pypa/pip/issues/8559
#       "# DEPRECATION: psycopg2 is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed"
# --compile --global-option=build_ext --global-option=-g0 --global-option=-Wl
#       used to pass flags to C compiler and compile to bytecode from source, see:
#       https://towardsdatascience.com/how-to-shrink-numpy-scipy-pandas-and-matplotlib-for-your-data-product-4ec8d7e86ee4
#       https://blog.mapbox.com/aws-lambda-python-magic-e0f6a407ffc6
# 
# https://pip.pypa.io/en/stable/cli/pip_install/#options
RUN CFLAGS="-g0 -Wl,--strip-all" \
    pip install --prefix=${VIRTUAL_ENV} --no-cache-dir --ignore-installed \
        --requirement dependencies.txt \
        --use-pep517 --no-build-isolation --config-settings="build_ext=-j4" \
        --no-binary numpy,scipy \
    && pip cache purge

FROM ${BASE_IMAGE} as runtime
# runtime requirements of psycopg:
# https://www.psycopg.org/docs/install.html#runtime-requirements
RUN set -eux \
    && apt-get update \
    && apt-get install -y --no-install-recommends libpq5 libopenblas0 liblapack3 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# tools needed to build the Microsoft ODBC driver for Microsoft SQL Server (requirements)
# https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server
RUN set -eux \
    && buildDeps=' \
            gnupg \
            curl \
            gcc \
        ' \
    && apt-get update \
    && apt-get install -y --no-install-recommends $buildDeps \
    && curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
    && curl https://packages.microsoft.com/config/debian/11/prod.list > /etc/apt/sources.list.d/mssql-release.list \
    && apt-get update \
    && ACCEPT_EULA=Y apt-get install -y --no-install-recommends msodbcsql17 \
    && apt-get install -y --no-install-recommends unixodbc-dev \
    # clean up
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get purge -y --auto-remove $buildDeps
ARG VIRTUAL_ENV
WORKDIR /app
COPY . .
COPY --from=build ${VIRTUAL_ENV} ${VIRTUAL_ENV}
# Use virtual environment (persistent in the final image):
ENV PATH=$VIRTUAL_ENV/bin:$PATH
ENV PYTHONHOME=
# Executable to be run
CMD ["python", "app.py"]

jonashaag commented 1 year ago

It looks like you're missing some packages in the pip build, e.g. psycopg2?

In any case, it may well be that the Conda environment pulls in more packages: Conda has no concept of optional dependencies, so most packages declare their optional dependencies as hard requirements.
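
As a rough illustration of this point, the dry-run log above lists a Qt and multimedia stack that a headless application may not need. The sketch below sums the download sizes of a few such packages, copied from that log. Which packages belong to this group is an assumption (they are presumed to arrive through matplotlib's Qt backend, which was not verified against the solver), and the sizes are compressed downloads, so the installed footprint is larger.

```python
# Download sizes in MB, copied from the dry-run transaction table in this issue.
# Assumption: these packages are pulled in via matplotlib's Qt backend; the
# grouping is illustrative and was not checked against the solver.
GUI_STACK_MB = {
    "qt-main": 52,
    "libllvm15": 33,
    "libclang13": 10,
    "pyqt": 5,
    "gst-plugins-base": 3,
    "gstreamer": 2,
    "pulseaudio": 2,
}

# Value of the "Total download" line in the same log.
TOTAL_DOWNLOAD_MB = 469


def group_share(group, total):
    """Return (subtotal, fraction of total) for a group of package sizes."""
    subtotal = sum(group.values())
    return subtotal, subtotal / total


if __name__ == "__main__":
    subtotal, share = group_share(GUI_STACK_MB, TOTAL_DOWNLOAD_MB)
    print(f"{subtotal} MB of {TOTAL_DOWNLOAD_MB} MB ({share:.0%})")
```

If the application only renders plots to files, depending on matplotlib-base and seaborn-base (both appear in the log) instead of matplotlib and seaborn might avoid this group. That is a suggestion to test, not something confirmed in this thread.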

LuchiLucs commented 1 year ago

No, psycopg2 is in the requirements.txt file; its build and runtime dependencies are installed separately through apt-get.

If there is no way to install only the required packages with conda/mamba/micromamba, my only option is to stick with pip, is that right?

jonashaag commented 1 year ago

Yes

jonashaag commented 1 year ago

I wonder if it's worth it; your Dockerfiles are already really, really complicated.

LuchiLucs commented 1 year ago

I value the simplicity of conda/micromamba a lot, which is why I wanted to give it a try, but the difference in image size is huge. Thanks anyway!

wolfv commented 1 year ago

It would be interesting to run something like dive (https://github.com/wagoodman/dive) to see what the large files are.
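
Before inspecting image layers, a quick first pass is to rank the packages in the dry-run log by download size. The following is a minimal sketch, assuming the log rows keep the "+ name version build channel size" layout shown above. The function names are hypothetical, and download sizes only approximate what a layer inspection would show on disk.

```python
import re

# One install row of the transaction table: "+ name version build channel size".
ROW = re.compile(
    r"^\s*\+\s+(\S+)\s+\S+\s+\S+\s+\S+\s+(\d+(?:\.\d+)?)\s*(kB|MB|GB)\s*$"
)
UNIT_BYTES = {"kB": 1e3, "MB": 1e6, "GB": 1e9}


def parse_sizes(log_text):
    """Map package name to download size in bytes for every install row."""
    sizes = {}
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            name, value, unit = match.groups()
            sizes[name] = float(value) * UNIT_BYTES[unit]
    return sizes


def largest(sizes, n=5):
    """Return the n largest (name, bytes) pairs, biggest first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]


# A few rows copied from the log in this issue; header lines are ignored.
SAMPLE = """\
  Package            Version  Build               Channel               Size
  Install:
  + libgoogle-cloud    2.8.0  h0bc5f78_1          conda-forge/linux-64  38MB
  + libllvm15         15.0.7  hadd5161_1          conda-forge/linux-64  33MB
  + pillow             9.4.0  py39h7207d5c_2      conda-forge/linux-64  46MB
  + qt-main           5.15.8  h67dfc38_7          conda-forge/linux-64  52MB
  + zlib              1.2.13  h166bdaf_4          conda-forge/linux-64  94kB
"""

if __name__ == "__main__":
    for name, size in largest(parse_sizes(SAMPLE), n=3):
        print(f"{name:20s} {size / 1e6:6.1f} MB")
```

To use it on the full log, the output of micromamba create --dry-run can be saved to a file and its contents passed to parse_sizes.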

LuchiLucs commented 1 year ago

@wolfv Thanks for the suggestion; this would be a good exercise. For my needs, however, the ideal tool would let you build from scratch, adding only what you need to the image, rather than the other way around, starting from everything and removing the extras. I think that in the long run the latter process can lead to more inconsistencies and errors.