Open declension opened 6 months ago
The __pycache__ files are Python source code compiled to bytecode. If you remove those files, Python will regenerate them every time it starts. You're essentially trading startup speed for the size of the Docker image. I think removing the .py files and leaving the bytecode could actually make more sense :) although debugging might then be hard.
As for removing them, I don't have an answer as I haven't done this myself, but perhaps you could use the postFixup phase and inject an rm command to remove them?
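To make the "remove the .py files and leave the bytecode" idea concrete, here is a minimal sketch using only the Python standard library. It is not something poetry2nix does; the module name `greet` and the temporary directory are hypothetical. One detail that is easy to miss: Python only imports sourceless bytecode when the .pyc sits where the .py used to be, not inside `__pycache__`, which is what `legacy=True` arranges.

```python
import compileall
import importlib
import pathlib
import sys
import tempfile


def build_sourceless_module(root: pathlib.Path) -> pathlib.Path:
    """Write a tiny module, compile it, and delete the .py source."""
    source = root / "greet.py"  # hypothetical module, for illustration only
    source.write_text("def hello():\n    return 'hello from bytecode'\n")
    # legacy=True writes greet.pyc next to the source instead of into
    # __pycache__/, the only layout Python accepts for sourceless imports.
    compileall.compile_dir(str(root), legacy=True, quiet=1)
    source.unlink()
    return root / "greet.pyc"


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    build_sourceless_module(root)
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        greet = importlib.import_module("greet")
        message = greet.hello()
        loaded_from = pathlib.Path(greet.__file__).name
    finally:
        sys.path.remove(str(root))
        sys.modules.pop("greet", None)

print(message, loaded_from)
```

Tracebacks from a module loaded this way cannot show source lines, which is the debugging cost mentioned above.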
@declension did you make any headway on this? I'm seeing a ~100 MB difference in size due to caches.
@jDmacD not really :disappointed:
At one point I doubted that it was even happening, but I'm pretty sure it is.
I then tried hacking (badly) various combinations of extraCommands / fakeRootCommands etc. (from https://ryantm.github.io/nixpkgs/builders/images/dockertools/#ssec-pkgs-dockerTools-buildLayeredImage) but couldn't even see the cache files at that point. I can't remember what my theory was, but it was guesswork anyway.
On re-testing I'm now seeing 83 MB of __pycache__ in the image in question, which is an API project with a medium-sized set of dependencies, i.e. nothing massive.
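For anyone wanting to put a number on this for their own tree (an unpacked image layer, a store path, a virtualenv), a small script can total the bytes sitting directly inside every `__pycache__` directory. This is only a sketch: `pycache_bytes` is a hypothetical helper, and the demo tree below is made up. Hard-linked files are counted once, since identical bytecode variants may share an inode.

```python
import os
import pathlib
import tempfile


def pycache_bytes(root):
    """Total size in bytes of files directly inside any __pycache__ below root.

    Files that are hard links to an already counted file are skipped.
    """
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) != "__pycache__":
            continue
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_size
    return total


# Demo on a made-up tree: 100 bytes of source, 10 + 20 bytes of bytecode.
with tempfile.TemporaryDirectory() as tmp:
    cache = pathlib.Path(tmp, "site-packages", "pkg", "__pycache__")
    cache.mkdir(parents=True)
    (cache.parent / "mod.py").write_bytes(b"x" * 100)  # source, not counted
    (cache / "mod.cpython-311.pyc").write_bytes(b"x" * 10)
    (cache / "mod.cpython-311.opt-1.pyc").write_bytes(b"x" * 20)
    total = pycache_bytes(tmp)

print(f"{total} bytes of bytecode cache")
```

Pointing `pycache_bytes` at the root of an extracted image gives a figure comparable to the 83 MB quoted above.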
It's definitely happening. This Dockerfile produces a 161 MB image:
# syntax=docker/dockerfile:latest
# https://medium.com/@albertazzir/blazing-fast-python-docker-builds-with-poetry-a78a66f5aed0
FROM python:3.11-buster as builder

RUN pip install poetry==1.8.3

ENV POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_VIRTUALENVS_CREATE=1 \
    POETRY_CACHE_DIR=/tmp/poetry_cache

WORKDIR /app

COPY pyproject.toml poetry.lock ./
RUN touch README.md

RUN --mount=type=cache,target=$POETRY_CACHE_DIR poetry install --without dev --no-root

FROM python:3.11-slim-buster as runtime

ENV VIRTUAL_ENV=/app/.venv \
    PATH="/app/.venv/bin:$PATH" \
    PYTHONPATH=.

COPY --from=builder ${VIRTUAL_ENV} ${VIRTUAL_ENV}

COPY bgp_operator ./bgp_operator

ENTRYPOINT ["kopf", "run", "--liveness=http://0.0.0.0:8080/healthz", "-A", "-m", "bgp_operator.main"]
This flake produces a 245 MB image:
{
  description = "Application packaged using poetry2nix";

  inputs = {
    flake-utils.url = "github:numtide/flake-utils";
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable-small";
    poetry2nix = {
      url = "github:nix-community/poetry2nix";
      inputs.nixpkgs.follows = "nixpkgs";
    };
  };

  outputs = { self, nixpkgs, flake-utils, poetry2nix }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
        inherit (poetry2nix.lib.mkPoetry2Nix { inherit pkgs; }) mkPoetryApplication cleanPythonSources;
      in
      {
        packages = {
          bgpOperator = mkPoetryApplication {
            projectDir = self;
            checkGroups = [ ];
          };

          # nix build .#packages.x86_64-linux.dockerImage
          # docker load < result
          dockerImage = pkgs.dockerTools.buildImage {
            name = "/bgp-operator-nix";
            tag = "latest";
            copyToRoot = pkgs.buildEnv {
              name = "bgp-operator-env";
              paths = [ self.packages.${system}.bgpOperator ];
            };
            runAsRoot = ''
              mkdir -p /etc/
              echo "root:x:0:0:root:/root:/bin/bash" > /etc/passwd
            '';
            config.Cmd = [ "/bin/bgp-operator" ];
            config.WorkingDir = "/";
          };

          default = self.packages.${system}.bgpOperator;
        };

        devShells.default = pkgs.mkShell {
          packages = [ pkgs.poetry pkgs.dive ];
          shellHook = ''
            export NSX_USERNAME=$(${pkgs.gum}/bin/gum input --placeholder "NSXT username")
            export NSX_PASSWORD=$(${pkgs.gum}/bin/gum input --placeholder "NSXT password" --password)
          '';
        };
      });
}
Digging around in the two images using dive (the Nix-built image first, then the Dockerfile-built one):
Permission UID:GID Size Filetree
dr-xr-xr-x 0:0 1.9 MB │ ├─⊕ y6hmqbmbwq0rmx1fzix5c5jszla2pzmp-tzdata-2024a
dr-xr-xr-x 0:0 41 MB │ ├── y7y3yvzlk2001hgqlzqxhz8aszxffdrx-python3.11-kubernetes-29.0.0
dr-xr-xr-x 0:0 41 MB │ │ ├── lib
dr-xr-xr-x 0:0 41 MB │ │ │ └── python3.11
dr-xr-xr-x 0:0 41 MB │ │ │ └── site-packages
dr-xr-xr-x 0:0 41 MB │ │ │ ├── kubernetes
-r--r--r-- 0:0 844 B │ │ │ │ ├── __init__.py
dr-xr-xr-x 0:0 1.2 kB │ │ │ │ ├─⊕ __pycache__
dr-xr-xr-x 0:0 40 MB │ │ │ │ ├── client
-r--r--r-- 0:0 52 kB │ │ │ │ │ ├── __init__.py
dr-xr-xr-x 0:0 266 kB │ │ │ │ │ ├─⊕ __pycache__
dr-xr-xr-x 0:0 24 MB │ │ │ │ │ ├── api
-r--r--r-- 0:0 4.2 kB │ │ │ │ │ │ ├── __init__.py
dr-xr-xr-x 0:0 15 MB │ │ │ │ │ │ ├─⊕ __pycache__
-r--r--r-- 0:0 5.2 kB │ │ │ │ │ │ ├── admissionregistration_api.py
-r--r--r-- 0:0 182 kB │ │ │ │ │ │ ├── admissionregistration_v1_api.py
-r--r--r-- 0:0 210 kB │ │ │ │ │ │ ├── admissionregistration_v1alpha1_api.py
-r--r--r-- 0:0 210 kB │ │ │ │ │ │ ├── admissionregistration_v1beta1_api.py
Permission UID:GID Size Filetree
drwxr-xr-x 0:0 22 kB │ │ ├─⊕ google_auth-2.29.0.dist-info
drwxr-xr-x 0:0 304 kB │ │ ├─⊕ idna
drwxr-xr-x 0:0 12 kB │ │ ├─⊕ idna-3.7.dist-info
drwxr-xr-x 0:0 16 kB │ │ ├─⊕ iso8601
drwxr-xr-x 0:0 5.5 kB │ │ ├─⊕ iso8601-2.1.0.dist-info
drwxr-xr-x 0:0 644 kB │ │ ├─⊕ kopf
drwxr-xr-x 0:0 19 kB │ │ ├─⊕ kopf-1.37.2.dist-info
drwxr-xr-x 0:0 13 MB │ │ ├── kubernetes
-rw-r--r-- 0:0 844 B │ │ │ ├── __init__.py
drwxr-xr-x 0:0 13 MB │ │ │ ├── client
-rw-r--r-- 0:0 52 kB │ │ │ │ ├── __init__.py
drwxr-xr-x 0:0 8.6 MB │ │ │ │ ├── api
-rw-r--r-- 0:0 4.2 kB │ │ │ │ │ ├── __init__.py
-rw-r--r-- 0:0 5.2 kB │ │ │ │ │ ├── admissionregistration_api.py
-rw-r--r-- 0:0 182 kB │ │ │ │ │ ├── admissionregistration_v1_api.py
-rw-r--r-- 0:0 210 kB │ │ │ │ │ ├── admissionregistration_v1alpha1_api.py
-rw-r--r-- 0:0 210 kB │ │ │ │ │ ├── admissionregistration_v1beta1_api.py
-rw-r--r-- 0:0 5.2 kB │ │ │ │ │ ├── apiextensions_api.py
-rw-r--r-- 0:0 121 kB │ │ │ │ │ ├── apiextensions_v1_api.py
-rw-r--r-- 0:0 5.2 kB │ │ │ │ │ ├── apiregistration_api.py
-rw-r--r-- 0:0 118 kB │ │ │ │ │ ├── apiregistration_v1_api.py
-rw-r--r-- 0:0 5.2 kB │ │ │ │ │ ├── apis_api.py
I suspect this is doing the damage, but I don't know how to avoid it:
copyToRoot = pkgs.buildEnv {
  name = "bgp-operator-env";
  paths = [ self.packages.${system}.bgpOperator ];
};
I came here trying to understand the same problem (I'm very new to nix).
I managed to remove __pycache__ from a single dependency with:
pkgOverrides = pkgs.poetry2nix.overrides.withDefaults (
  final: prev: {
    somedependency = prev.somedependency.overridePythonAttrs (old: {
      postFixup = ''
        for pycache in $(find $out -name __pycache__) ; do
          rm -fr ''${pycache}
        done
      '';
    });
  }
);
myEnv =
  (pkgs.poetry2nix.mkPoetryEnv {
    overrides = pkgOverrides;
    projectDir = ./.;
    ...
  });
I still have to figure out how to apply this to all dependencies without explicitly listing them. The joys of learning a new language :)
Ultimately, it looks like removing the cached files is a tradeoff between startup time of your container and size, as was mentioned above.
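Outside of Nix, the same cleanup can be applied to a whole tree at once rather than per dependency. The sketch below is a Python rendering of the `find ... rm` loop above, assuming a writable copy of the tree (a Nix store path itself is read-only); `remove_pycache` is a hypothetical helper and the demo tree is made up.

```python
import os
import shutil
import tempfile


def remove_pycache(root):
    """Delete every __pycache__ directory below root; return how many were removed."""
    removed = 0
    for dirpath, dirnames, _filenames in os.walk(root, topdown=True):
        if "__pycache__" in dirnames:
            shutil.rmtree(os.path.join(dirpath, "__pycache__"))
            dirnames.remove("__pycache__")  # do not descend into the deleted directory
            removed += 1
    return removed


# Demo on a made-up tree with two cache directories and one source file.
with tempfile.TemporaryDirectory() as tmp:
    for rel in (os.path.join("pkg", "__pycache__"),
                os.path.join("pkg", "sub", "__pycache__")):
        os.makedirs(os.path.join(tmp, rel))
        open(os.path.join(tmp, rel, "mod.cpython-311.pyc"), "wb").close()
    open(os.path.join(tmp, "pkg", "mod.py"), "w").close()

    removed = remove_pycache(tmp)
    leftovers = sorted(
        os.path.relpath(os.path.join(dirpath, name), tmp)
        for dirpath, _dirs, files in os.walk(tmp)
        for name in files
    )

print(removed, leftovers)
```

Only the bytecode caches go; the .py sources stay, so the interpreter falls back to compiling them at import time.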
Three small bits of info I learnt while going down this rabbit hole:
- mkPythonDerivation explicitly recompiles all files to bytecode for reproducibility. It should be possible to override that hook to prevent the creation of this bytecode (rather than creating the files and then deleting them).
- It creates three kinds of .pyc files, respectively .pyc, .opt-1.pyc and .opt-2.pyc, which correspond to the "normal", -O and -OO flags of python. When those differ, they are all present in the output. Otherwise they are hardlinks to the same file. It should be possible to keep only the ones that match your usage of python at runtime.
- Python can run from .pyc files only, which means it might be possible to build production docker images that do not include the source code, if one wants to shrink as much as possible.
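The three bytecode variants are easy to reproduce with the standard library. The following sketch compiles one made-up module (`sample.py`, hypothetical) at optimization levels 0, 1 and 2 and prints the resulting file names and sizes. It illustrates the naming scheme only and says nothing about how the Nix hook itself is implemented.

```python
import os
import py_compile
import tempfile

SOURCE = '''"""A module docstring that -OO strips."""


def check(value):
    """A function docstring that -OO strips."""
    assert value >= 0, "negative values are rejected"  # removed by -O and -OO
    return value
'''


def compile_variants(source_path):
    """Compile source_path at optimization levels 0, 1 and 2.

    Returns a list of (file name, size in bytes), one entry per level.
    """
    results = []
    for level in (0, 1, 2):
        pyc_path = py_compile.compile(source_path, optimize=level, doraise=True)
        results.append((os.path.basename(pyc_path), os.path.getsize(pyc_path)))
    return results


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.py")
    with open(path, "w") as handle:
        handle.write(SOURCE)
    variants = compile_variants(path)

for name, size in variants:
    print(f"{size:6d}  {name}")
```

An interpreter started without flags reads the plain .pyc, one started with -O reads .opt-1.pyc, and one started with -OO reads .opt-2.pyc, so an image that always runs Python the same way needs only one of the three.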
First of all, thanks for a great project! Really helped us and got me further into the Nix rabbit-hole (a long way to go still).
Describe the issue
I know this might not belong to poetry2nix, but thought this might still be a good place to talk about addressing it...
We're Dockerising various Poetry apps using poetry2nix and dockerTools.streamLayeredImage. Debugging why the image increased in size so much from the [minimalist, multi-stage] Dockerfile version, it seems that Python packages in the Nix store come with __pycache__ and *.pyc files, which add a lot of weight never present in the traditional images' layers (in fact people often add RUN steps to remove any straggling such files).
Is there a best practice here that I'm missing? Can poetry2nix clean these files out, at least in the locally built package(s)?