imjasonh opened this issue 5 years ago
I had two ideas for this. One is simple, the other requires a few fixes.
We create a registry client package that maps directly to the distribution spec. These would be lower-level operations, not like our higher-level remote package. The remote package would use this registry client instead of making HTTP requests directly. People who need the functionality that remote.CheckPushPermission provides could just call InitiateUpload directly.
TL;DR: we need to do everything lazily instead of eagerly. stream.Layer is a start down this path.
There are a lot of things we need to fix, that could be broken out into separate issues, but lumping them all in here will make it easier to understand the problem end-to-end, I think.
What needs fixing, in order:

1. stream.Layer's Size, Digest, and DiffID methods should block until the stream has been consumed, instead of returning stream.ErrNotComputed. stream.NewLayer should take the lock and compressedReader.Close should release it. This means that you'll basically always need to use a goroutine to use stream.Layer effectively (kind of like io.Pipe), which we should document (or at least provide examples for).
2. tarball.LayerFromOpener should use stream.Layer as most of its implementation. Right now, we compute the diffid and digest as soon as we read the layer, which we should be doing lazily. (For already-compressed layers, we will have to do something else -- computing the DiffID is a pain for those.)
3. mutate.image.Layers should first look through its Addendums to see if there are any streaming layers, instead of relying on compute() to return stream.ErrNotComputed.
4. remote.Write should always upload the config file in a separate goroutine, concurrently with the layers. Because we made stream.Layer's Size, Digest, and DiffID block instead of returning an error, the config upload will sit patiently in a goroutine waiting on any stream.Layers to complete their uploads before continuing. In the case where the image doesn't have any stream.Layers, it will just upload immediately. This simplifies the remote.Write code a lot.

Let's look at a code snippet from kaniko to see what's wrong and how this would be better: https://github.com/GoogleContainerTools/kaniko/blob/246cc92a33c29d76b811f7da605375d8bc4c2198/pkg/executor/push.go#L110-L144
Currently: we open the layer using tarball.LayerFromFile, which computes the Digest, Size, and DiffID immediately. After that, we initiate a push, which can fail if we don't have valid credentials.

With these changes: when we open the layer using tarball.LayerFromFile, it will use a stream.Layer instead, so the Digest/Size/DiffID computations will be deferred until the layer is consumed. We'll immediately start the image upload, which will fail when we try to initiate a blob upload -- and that happens before we even try to open the layer's io.ReadCloser. Thus, we never even try to open the file.
Most of the mutate operations are unfortunately synchronous: they update the image immediately instead of deferring the work until the image is consumed. It's possible that we could introduce a conceptual "thunk" here, so that all of these operations are deferred until the corresponding method is actually called -- i.e., we need to make all of them work with stream.Layer -- though that may be impossible. I'm not sure if we can do this without breaking clients, but I think it's worth doing even if it's hard.
This method employs a somewhat silly hack to determine reliably whether a keychain can be used to authorize pushes to a repo. This can be useful if you want to check whether you can push an image before you spend lots of effort constructing the layers for that image.

@jonjohnsonjr has some ideas about how we can make this unnecessary by allowing users to call remote.Write before all the layer contents are available and mutate.Appended in, by passing stream.Layers to it that are populated in goroutines and closed when they're done. This relies on a change to how remote.Write, mutate.Append, and stream.Layer fit together. I'll let him elaborate.