pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.29k stars 628 forks source link

Allow the publish goal for Docker images to push images in parallel #17613

Open Gaasmann opened 1 year ago

Gaasmann commented 1 year ago

I have a project creating several Docker images to be used by AWS Lambdas and we push those images to ECR. There is some lag each time we start pushing an image to ECR. Pants is currently pushing each image sequentially. Consequently, the publish goal takes much more time than it could take if it were pushing images in parallel.

Describe the solution you'd like I'd like Pants to be able to push Docker images (via the publish goal) in parallel rather than sequentially.

Describe alternatives you've considered No alternative (besides suffering) comes to mind. :-)

benjyw commented 1 year ago

Currently publish relies on InteractiveProcess, because publish side-effects (by changing state on the server being published to). InteractiveProcesses run in the foreground, and cannot run concurrently. So addressing this would require switching to a regular Process, and ensuring it's not cached. @stuhood do you see any problems with that?

(Also, order might actually matter here, e.g., not publishing a dist before all its requirements are published, but we don't actually ensure this today anyway, and that would require other coordination I think.)

Eric-Arellano commented 1 year ago

I wonder if it's somewhat backend dependent? Twine might be safe to parallelize, but backend X isn't? Maybe that suggests we allow our plugin API for publish either return a Process or InteractiveProcess?

stuhood commented 1 year ago

Yea. In some cases this will actually be interactive (in order for twine to sign wheels using gpg for example), so I don't think that this can universally be a Process.

ndellosa95 commented 1 year ago

Yea. In some cases this will actually be interactive (in order for twine to sign wheels using gpg for example), so I don't think that this can universally be a Process.

Many publishing tools (including twine and probably Docker?) provide an optional non-interactive mode. Could we add a flag like that to pants publish that will propagate non-interactive mode downstream so we can use a Process?

kaos commented 1 year ago

Yea. In some cases this will actually be interactive (in order for twine to sign wheels using gpg for example), so I don't think that this can universally be a Process.

Many publishing tools (including twine and probably Docker?) provide an optional non-interactive mode. Could we add a flag like that to pants publish that will propagate non-interactive mode downstream so we can use a Process?

That sounds feasible to me. If you know your tools and what your about to do, having a flag telling pants it may run them in the background would open up for parallelization.

ndellosa95 commented 11 months ago

Yea. In some cases this will actually be interactive (in order for twine to sign wheels using gpg for example), so I don't think that this can universally be a Process.

Many publishing tools (including twine and probably Docker?) provide an optional non-interactive mode. Could we add a flag like that to pants publish that will propagate non-interactive mode downstream so we can use a Process?

That sounds feasible to me. If you know your tools and what your about to do, having a flag telling pants it may run them in the background would open up for parallelization.

Would this require the creation of a MultiEffect class?

kaos commented 11 months ago

Yea. In some cases this will actually be interactive (in order for twine to sign wheels using gpg for example), so I don't think that this can universally be a Process.

Many publishing tools (including twine and probably Docker?) provide an optional non-interactive mode. Could we add a flag like that to pants publish that will propagate non-interactive mode downstream so we can use a Process?

That sounds feasible to me. If you know your tools and what your about to do, having a flag telling pants it may run them in the background would open up for parallelization.

Would this require the creation of a MultiEffect class?

I don't think so, as the whole idea with "running in the background" would be to do what you get with a regular Get, I think.

rhuanbarreto commented 2 months ago

@benjyw could you guide us through what it takes to make this update to use a Process? Can you point to the files in the repo?

ndellosa95 commented 2 months ago

@benjyw could you guide us through what it takes to make this update to use a Process? Can you point to the files in the repo?

I think what you should do is either add a flag or create another subclass to one of these classes to signify a non-interactive publish, then group all the non-interactive publishes together and run them in parallel using a MultiGet here.

benjyw commented 1 month ago

Yep, pretty much what @ndellosa95 said... Build up a list of ProcessRequest instances, and then MultiGet them