Open salrashid123 opened 1 year ago
Hi, we've investigated this - SOURCE_DATE_EPOCH is a promising direction, and we tried approaches that reset mtime for everything. Unfortunately, pip install is fundamentally irreproducible, because it generates pyc files that include the timestamp. Unzipping wheels without using pip might make this possible, or I think there are some PEPs in the works that might help with this. See https://github.com/pypa/pip/issues/5648
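To make the pyc point concrete, here is a minimal sketch (not taken from pip or cog) that compiles a tiny module with timestamp-based invalidation and decodes the header of the resulting pyc. The helper name `pyc_header` and the file names are made up for illustration, and the 16-byte header layout assumed here is the one used by CPython 3.7 and later.

```python
import os
import py_compile
import struct
import tempfile


def pyc_header(source_mtime):
    """Compile a one-line module whose mtime is forced to source_mtime and
    return the (flags, mtime, source_size) fields from the pyc header."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "wb") as f:
            f.write(b"x = 1\n")
        # Pretend the source file was last modified at source_mtime.
        os.utime(src, (source_mtime, source_mtime))
        pyc = py_compile.compile(
            src,
            cfile=os.path.join(tmp, "mod.pyc"),
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
            doraise=True,
        )
        with open(pyc, "rb") as f:
            header = f.read(16)
    # Bytes 0-3 are the magic number; the next three little-endian uint32
    # fields are flags, source mtime, and source size.
    return struct.unpack("<III", header[4:16])


if __name__ == "__main__":
    print(pyc_header(1_000_000_000))
    print(pyc_header(1_500_000_000))
```

The two calls compile byte-identical source, yet the embedded mtime field follows the source file's mtime at compile time. Resetting mtimes on the finished files afterwards does not touch what is already written inside the pyc.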
got it; i think esp. with python it'd be difficult to do with its own toolchains.
maybe generating the Dockerfile per https://github.com/replicate/cog/issues/1241#issuecomment-1660128528 and then chaining it to off-the-shelf kaniko would be a sufficient workaround, e.g.:
docker run \
-v `pwd`:/workspace -v $HOME/.docker/config_docker.json:/kaniko/.docker/config.json:ro \
-v /var/run/docker.sock:/var/run/docker.sock \
gcr.io/kaniko-project/executor@sha256:034f15e6fe235490e64a4173d02d0a41f61382450c314fffed9b8ca96dff66b2 \
--dockerfile=Dockerfile \
--reproducible \
--destination "docker.io/salrashid123/tpmds:server" --context dir:///workspace/
i realize now we're involving kaniko as well, but it may be easier to delegate it like this for now
Would that address the pyc timestamps?
i think so; as part of the kaniko reproducible builds, it sets up snapshots resetting all the file times.
tried it from the getting started guide using the generated Dockerfile; it seems to always reference a file like
COPY .cog/tmp/build1866459875/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
which doesn't exist and causes kaniko to fail
cog build
cog debug > Dockerfile
docker run -v `pwd`:/workspace -v $HOME/.docker/config_docker.json:/kaniko/.docker/config.json:ro -v /var/run/docker.sock:/var/run/docker.sock gcr.io/kaniko-project/executor@sha256:034f15e6fe235490e64a4173d02d0a41f61382450c314fffed9b8ca96dff66b2 --dockerfile=Dockerfile --reproducible --destination "docker.io/salrashid123/cogdemo:server" --context dir:///workspace/
oh, so python embeds the timestamp inside the file...then kaniko isn't gonna help out.
...and i can't sincerely recommend going all out and investing in python-bazel builds
as a stopgap for the debug issue, run cog build once and then interrupt it; it will place a cog wheel in .cog/tmp/whatever, and then you can edit the cog debug output
does python-bazel address pyc timestamps somehow? does it just strip pyc files?
it would be incredibly helpful for us to get reproducible builds for deduplication
yeah, i tried the interrupt trick suggested but each cog+kaniko build has a different hash (which is expected, i think)
i'm unsure exactly how bazel rules_python handles pyc files but i can say you need to precisely define everything upfront and bazel uses its own sandbox to canonicalize everything.
some examples with rules_python which may help answer the question though....once it works with rules_python, stitching it with rules_docker and containers would be easy
https://github.com/bazelbuild/rules_python/tree/main/examples
Y'all might also investigate Nix (which provides dockerTools, an alternate build tool for Docker images) towards this end.
Nix converts all timestamps to one second past epoch, btw.
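For reference, the "one second past epoch" normalization of filesystem timestamps can be sketched in a few lines. This is only an illustration of the idea, not how Nix or kaniko implement it, and `normalize_mtimes` is a hypothetical helper. As noted elsewhere in this thread, it only covers filesystem mtimes and does nothing about timestamps embedded inside pyc files.

```python
import os


def normalize_mtimes(root, timestamp=1):
    """Set atime and mtime of every file and directory under root
    (and root itself) to the given timestamp, in seconds since the epoch.

    Symlinks are followed by os.utime, so a broken symlink would raise."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            os.utime(os.path.join(dirpath, name), (timestamp, timestamp))
    os.utime(root, (timestamp, timestamp))
```

Running it on a build context before archiving makes the file metadata independent of when the files were written.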
Does rules_python generate pyc at all? https://github.com/bazelbuild/rules_python/issues/1761
Again, there's no issue with mtimes, the problem is the timestamps embedded in pyc files
The NixOS install CD is fully binary reproducible. I can't imagine it not including Python, so clearly they've got that licked somehow.
Indeed, quoting:
# Determinism: The interpreter is patched to write null timestamps when compiling Python files
# so Python doesn't try to update the bytecode when seeing frozen timestamps in Nix's store.
export DETERMINISTIC_BUILD=1;
then we would have to ship nix's patched interpreter, right? DETERMINISTIC_BUILD is not present in stock python
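For what it's worth, stock CPython 3.7+ does have hash-based pycs (PEP 552), which store a hash of the source in the header instead of its mtime, and `py_compile` uses that mode by default when SOURCE_DATE_EPOCH is set. Below is a minimal sketch of the difference, assuming CPython 3.7 or later. `compile_bytes` is a hypothetical helper, and the sketch says nothing about whether pip or cog can be made to emit pycs this way.

```python
import os
import py_compile
import tempfile


def compile_bytes(mode, mtimes=(1_000_000_000, 1_500_000_000)):
    """Compile the same source once per mtime and return the pyc bytes."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        pyc = os.path.join(tmp, "mod.pyc")
        with open(src, "wb") as f:
            f.write(b"x = 1\n")
        for mtime in mtimes:
            # Same source bytes and path, different modification time.
            os.utime(src, (mtime, mtime))
            py_compile.compile(src, cfile=pyc, invalidation_mode=mode, doraise=True)
            with open(pyc, "rb") as f:
                results.append(f.read())
    return results


Mode = py_compile.PycInvalidationMode
ts_a, ts_b = compile_bytes(Mode.TIMESTAMP)
hash_a, hash_b = compile_bytes(Mode.CHECKED_HASH)

if __name__ == "__main__":
    print("timestamp pycs identical:", ts_a == ts_b)
    print("hash-based pycs identical:", hash_a == hash_b)
```

The timestamp-based pair should differ because the header carries the source mtime, while the hash-based pair should match because the header depends only on the source bytes.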
here's an end-to-end example covering building an image with bazel and serving with cog.
if precise build steps are followed, you should end up with
sha256:3db6542dc746aeabaa39d902570430e1d50c416e7fc20b875c10578aa5e62875
(i verified it on two different clean vms)
as mentioned, using bazel is really tedious though toolchains like gazelle may help with python. (imo as-is in current state, the developer friction all this introduces negates the primary ease-of-use benefits of using/building w/ cog in the first place)
[tbh, i've never used or needed cog and try to not use bazel for deterministic builds (in go there are easier ways)...this issue with cog was something i noticed and then ratholed academically.]
I would like to add my +1 for supporting reproducible builds via Nix and NixOS as well.
https://github.com/datakami/cognix is a project that exists and kind of works but unfortunately isn't a priority for us at this time
Cog currently uses docker to build the images
however, docker based builds are not reproducible: you'll get different image hashes even with the identical config
this long-term feature request is to refactor the build system from docker to something like
buildah
bazel
some references building using kaniko and bazel
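A rough illustration of why identical inputs can still produce different image hashes: an image layer is a tar archive, its digest is computed over the archive bytes, and those bytes include each file's mtime. The sketch below is not how docker, kaniko, or cog build layers; `layer_digest` is a hypothetical helper that only shows the effect of file timestamps on the digest.

```python
import hashlib
import io
import tarfile


def layer_digest(files, mtime):
    """Build an in-memory tar from {name: bytes}, stamping every entry
    with the same mtime, and return the sha256 hex digest of the archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Sort names so entry order does not depend on dict ordering.
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(files[name]))
    return hashlib.sha256(buf.getvalue()).hexdigest()


if __name__ == "__main__":
    files = {"predict.py": b"print('hi')\n"}
    print(layer_digest(files, mtime=1) == layer_digest(files, mtime=1))
    print(layer_digest(files, mtime=1) == layer_digest(files, mtime=2))
```

With a pinned mtime the digest depends only on names and contents; with the build-time mtime it changes on every build, which is what a reproducible mode has to remove.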