google / containerregistry

A set of Python libraries and tools for interacting with a Docker Registry.
https://gcr.io
Apache License 2.0

Allow pushing when layers are from a remote location (RBE) #95

Open ittaiz opened 6 years ago

ittaiz commented 6 years ago

Hi, I’m not a container expert, so I might be way off, but I’d really like to be able to use the pusher from a machine that doesn’t have the layers on it.

Context: I’m using GCB+RBE+Bazel (rules_docker) to build and test our code. We’re currently integrating publishing from rules_docker, and its current assumption is that I’ve built locally, so the layers are available. I’d like to be able to run the pusher on different machines, so I can parallelize publishing and start before the build for the entire repo finishes. ResultStore gives us the ability to poll a Bazel build and know what’s ready, as well as the URLs on RBE.

Is there a technical way to use the pusher with URLs from RBE?

cc @nlopezgi since he often has very wise insights :)

ittaiz commented 6 years ago

@nlopezgi are you by any chance relevant? If not, do you know who might be?

nlopezgi commented 6 years ago

This sounds like a valid use case, but I don't know enough yet about how container registry works to know if it's feasible/simple to do this. Maybe @mattmoor or @dlorenc can comment about this?

ittaiz commented 6 years ago

@mattmoor @dlorenc any thoughts?

ittaiz commented 6 years ago

@nlopezgi I think @mattmoor is working on other stuff. From the contributors view, it seems @KaylaNguyen and @dekkagaijin are very active; any chance you can contribute here or point me to the relevant person? Thanks!

mattmoor commented 6 years ago

Sorry, GitHub notifications now get lost in the noise of my day job (Knative), so I missed this.

Wearing my idealist hat, IIUC what you are asking for is effectively distributed execution of the push, which is contrary to my mental model of Blaze's distributed execution, which must be hermetic and happen in a network jail.

Wearing my pragmatist hat, I probably wouldn't try to make a single push straddle multiple machines (actions); that seems like it's asking for trouble. Instead, what I'd probably do is leverage push incrementality to pre-push layers in a distributed fashion, so that the ultimate push never needs to download them because the existence checks succeed.
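For illustration, the existence check here is just the Registry V2 blob HEAD request. A minimal sketch, with placeholder names and the authentication that real registries such as gcr.io require omitted:

```python
# Minimal sketch of a Registry V2 blob existence check. The registry,
# repository, and digest are placeholders; an Authorization header
# (required by gcr.io and most registries) is omitted.
import httplib2


def blob_exists(registry, repository, digest):
    """Returns True if the registry already has this blob (HTTP 200 on HEAD)."""
    url = 'https://%s/v2/%s/blobs/%s' % (registry, repository, digest)
    resp, _ = httplib2.Http().request(url, 'HEAD')
    return resp.status == 200


# e.g. blob_exists('gcr.io', 'my-project/my-image', 'sha256:<hex digest>')
```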

The basic way of doing this would be to wrap individual layers in a dummy image and leverage that to get the layer published to the registry. The problem is knowing where (and when) it is appropriate to publish stuff: on every build?
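As a lower-level alternative to the dummy-image wrapper, the registry's blob upload protocol can also be driven directly. A hedged sketch of a monolithic upload (this is not how rules_docker or this library pushes; names are placeholders and auth is again omitted):

```python
# Sketch of a monolithic blob upload per the Registry V2 API: a POST opens
# an upload session, then a PUT to the returned Location finishes it with
# the digest. Placeholder names only; auth handling is omitted.
import httplib2


def upload_blob(registry, repository, digest, blob_bytes):
    http = httplib2.Http()
    resp, _ = http.request(
        'https://%s/v2/%s/blobs/uploads/' % (registry, repository), 'POST')
    if resp.status != 202:
        raise RuntimeError('could not start upload: %d' % resp.status)
    location = resp['location']  # the registry says where to send the bytes
    sep = '&' if '?' in location else '?'
    resp, _ = http.request(
        '%s%sdigest=%s' % (location, sep, digest), 'PUT', body=blob_bytes,
        headers={'Content-Type': 'application/octet-stream'})
    return resp.status == 201
```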

I never played with aspects, but it is possible that they might allow you to walk the action graph (when executing a push) and decorate builds with the kind of actions described above, including where to publish.

FWIW, we don't do anything special here internally. Basically, given build-to-build reproducibility and incremental uploads, this should only ever be a problem once per delta in the output. Granted, even those deltas can be big.

ittaiz commented 6 years ago

Thank you for replying! I think I didn’t convey my intent. I want to be able to push a container from outside a Bazel workspace, leveraging the layers in the RBE CAS, not by running the push on RBE workers inside the Bazel build.

The context is that we run a Bazel build with RBE, which populates the CAS with all of the inputs the container push needs. I want to be able to download the container_push script (Python binary?), run it from an arbitrary production machine, and have that script take its inputs from the CAS rather than from local files.

Is that clearer?
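To illustrate the "take its inputs from the CAS" half of this: layer bytes can be streamed out of the RBE CAS with the ByteStream Read RPC, given each layer's digest and size. A sketch, assuming gRPC stubs generated from google/bytestream/bytestream.proto (they are not part of this repo); the endpoint and instance name in the usage note are placeholders:

```python
# Sketch: read one layer blob out of the RBE CAS via ByteStream.Read.
# Assumes gRPC stubs generated from google/bytestream/bytestream.proto.
import grpc
from google.bytestream import bytestream_pb2, bytestream_pb2_grpc


def read_blob_from_cas(channel, instance_name, sha256_hex, size_bytes):
    """Returns the raw bytes of the CAS blob identified by (hash, size)."""
    stub = bytestream_pb2_grpc.ByteStreamStub(channel)
    # Remote Execution API v2 resource-name convention for CAS blobs.
    resource = '%s/blobs/%s/%d' % (instance_name, sha256_hex, size_bytes)
    chunks = [resp.data
              for resp in stub.Read(
                  bytestream_pb2.ReadRequest(resource_name=resource))]
    return b''.join(chunks)


# Placeholder usage (real RBE also needs per-call credentials):
# channel = grpc.secure_channel('remotebuildexecution.googleapis.com:443',
#                               grpc.ssl_channel_credentials())
# layer = read_blob_from_cas(channel,
#                            'projects/my-project/instances/default_instance',
#                            '<sha256 hex>', 12345)
```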


helenalt commented 6 years ago

@nlopezgi how do you recommend we proceed with this?

nlopezgi commented 6 years ago

It sounds like the work needed to support this use case is mostly about making the script in this repo work with contents that are in the CAS. I don't have enough expertise in how container_push or the CAS works to provide many meaningful insights (but I'll be happy to comment on any design someone produces for this feature). I think that if this is a use case Wix wants supported, and one we want to prioritize, Wix would need to work with the owners of container registry to figure out the effort required to build this feature (i.e., produce a design?).

ittaiz commented 6 years ago

I'd be happy to get that ball rolling. Who are the owners of container registry?


EricBurnett commented 6 years ago

Discussed this with Ittai today. Attempting to summarize my understanding (Ittai, correct anything I've messed up):

(All of these options have drawbacks; having written them out I'm not necessarily sure which is best for Ittai to pursue).

In any case, I consider the crux of this problem to be what information is passed and where; questions on tooling should follow after figuring out what model seems most reasonable. (E.g., it'd be relatively straightforward for someone to write a tool that pulls layers from the RBE CAS directly, if they knew which digests to pull and what to do with them afterwards.)
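A hypothetical outline of such a tool, reusing the helpers sketched earlier in this thread (blob_exists, read_blob_from_cas, and upload_blob are illustrative names from those sketches, not anything that exists in this repo):

```python
# Hypothetical glue: given the (digest, size) pairs a push needs, upload only
# the blobs the registry is missing, sourcing the bytes from the RBE CAS.
# blob_exists, read_blob_from_cas, and upload_blob are the sketches above.
def prepush_missing_layers(channel, instance, registry, repository, layers):
    """layers: iterable of (sha256_hex, size_bytes) pairs for the image."""
    for sha256_hex, size_bytes in layers:
        digest = 'sha256:' + sha256_hex
        if blob_exists(registry, repository, digest):
            continue  # incremental push: the registry already has this layer
        blob = read_blob_from_cas(channel, instance, sha256_hex, size_bytes)
        upload_blob(registry, repository, digest, blob)
```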

nlopezgi commented 6 years ago

About option iii: it should be feasible to provide a way to push with rules_docker using bazel build. Regarding authentication, please see https://github.com/bazelbuild/rules_docker/issues/526 for an open thread on better support for this kind of use case (@ittaiz, please comment there on whether a rule that reads secrets from the environment and outputs a file to override $HOME/.docker/config.json would work for your use case). Please let me know if option iii is what you think will work best so I can plan accordingly to work on the features. (But, IIUC, it should not be hard to expose the push script for execution with bazel build via an additional output of the push rule, so if anyone wants to volunteer a PR to rules_docker to do this, it would be great!)
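For reference, a minimal sketch of the kind of file such a rule could emit to override $HOME/.docker/config.json, assuming the credentials arrive in environment variables (the variable names here are made up):

```python
# Sketch only: write a docker config.json with a basic-auth entry for gcr.io.
# DOCKER_USER / DOCKER_PASSWORD are hypothetical environment variable names.
import base64
import json
import os

auth = base64.b64encode(
    ('%s:%s' % (os.environ['DOCKER_USER'],
                os.environ['DOCKER_PASSWORD'])).encode()).decode()

config = {'auths': {'gcr.io': {'auth': auth}}}

# The output would be pointed at via DOCKER_CONFIG or copied over
# $HOME/.docker/config.json before running the push.
with open('config.json', 'w') as f:
    json.dump(config, f)
```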