google / go-containerregistry

Go library and CLIs for working with container registries
Apache License 2.0

FR: crane upload #934

Open jonjohnsonjr opened 3 years ago

jonjohnsonjr commented 3 years ago

I'd like a new subcommand that uploads blobs.

I need to write up more about this, but crane hits a really sweet spot for most things where we're doing as little as possible to implement container-specific details, then getting out of the way. crane export and crane auth are great examples of this. So are crane {manifest,blob,ls,catalog,delete,digest,tag}.

There are also higher-level things that are often useful, but I really like the lower-level things, because they make it possible to do things as a bash one-liner that would otherwise require loading up an IDE and pulling in a ton of dependencies.

Uploading blobs is a huge missing piece.

strawman

$ crane upload reg.example.com/my-repo < some-file
sha256:db98fc6f11f08950985a203e07755c3262c680d00084f601e7304b768c83b3b1

The output is the digest of the uploaded thing.
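For cross-checking, the printed digest would presumably be the SHA-256 of the uploaded bytes in the usual `sha256:<hex>` form, so it can be reproduced locally with standard tools. A minimal sketch, assuming GNU coreutils `sha256sum` is available; `blob_digest` is a hypothetical helper name, not something crane provides:

```shell
# Hypothetical helper (not part of crane): print the registry-style
# digest of whatever arrives on stdin.
blob_digest() {
  # sha256sum prints "<hex>  -" for stdin; keep only the hex field.
  printf 'sha256:%s\n' "$(sha256sum | cut -d' ' -f1)"
}

printf 'hello' | blob_digest
# prints sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

Running `blob_digest < some-file` before uploading gives a value to compare against whatever the upload command prints.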

output

Several modes of output would be useful:

A mode that prints the digest as a ref (maybe -q for "qualify?"), for use with crane blob or perhaps crane append:

$ crane upload -q reg.example.com/my-repo < some-file
reg.example.com/my-repo@sha256:db98fc6f11f08950985a203e07755c3262c680d00084f601e7304b768c83b3b1

# Should be a no-op (assuming this doesn't overwrite some-file before we read it):
$ crane blob $(crane upload -q reg.example.com/my-repo < some-file) > some-file

A mode that prints a descriptor, which should include the size:

$ crane upload reg.example.com/my-repo < some-file | jq .
{
  "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
  "size": 843,
  "digest": "sha256:db98fc6f11f08950985a203e07755c3262c680d00084f601e7304b768c83b3b1"
}

(Note the | jq . to pretty-print it -- this should be one line of output, normally.)
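To make the descriptor shape concrete, the same three fields can be derived from a local file with standard tools. This is only a sketch of the output format described above, not of how crane would produce it; `describe_blob` is a hypothetical name, and the default media type is an assumption taken from the example:

```shell
# Hypothetical helper (not part of crane): print a one-line descriptor
# for a local file.
# Usage: describe_blob <file> [media-type]
describe_blob() {
  file=$1
  # Assumed default: the Docker gzipped-layer media type from the example.
  media_type=${2:-application/vnd.docker.image.rootfs.diff.tar.gzip}
  # wc -c may pad with spaces on some platforms; strip them.
  size=$(wc -c < "$file" | tr -d ' ')
  digest=$(sha256sum < "$file" | cut -d' ' -f1)
  printf '{"mediaType":"%s","size":%s,"digest":"sha256:%s"}\n' \
    "$media_type" "$size" "$digest"
}

# Example: describe_blob some-file | jq .
```

The media type is interpolated as-is, so this sketch assumes it contains no characters that need JSON escaping.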

flags

There is plenty of opportunity for flags and bikeshedding of those flags.

For example, should compression be on or off by default? Given that it's possible to do this:

$ gzip some-file -c | crane upload reg.example.com/my-repo

I don't see a reason for it? Perhaps a -z flag as a convenience, but if you want to set a specific compression level, that's up to you.

unsolved

diffid

I'm not sure how to represent diffid here, for things that get compressed.

The problem is two-fold:

  1. We don't really have a way to persist metadata like this. If we want to be able to do something interesting with a blob, it would be nice if we didn't have to download the whole thing, check to see if it's gzipped, and hash it to compute the diffid.
  2. How do we actually expose this in a composable way?

Some ideas...

If we are printing a descriptor and we're compressing it, add the diffid as an annotation. We can have other tools understand this annotation when reconstructing a config file (like in pkg/v1/mutate).

$ crane upload -z reg.example.com/my-repo < some-file | jq .
{
  "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
  "size": 843,
  "digest": "sha256:db98fc6f11f08950985a203e07755c3262c680d00084f601e7304b768c83b3b1",
  "annotations": {
    "dev.ggcr.crane.diffid": "sha256:dbf2c0f42a39b60301f6d3936f7f8adb59bb97d31ec11cc4a049ce81155fef89"
  }
}

If we're not printing a descriptor, we could just output two lines, where the first is digest and the second diffid:

$ crane upload reg.example.com/my-repo -z < some-file
sha256:db98fc6f11f08950985a203e07755c3262c680d00084f601e7304b768c83b3b1
sha256:dbf2c0f42a39b60301f6d3936f7f8adb59bb97d31ec11cc4a049ce81155fef89

Or we could have an output file?

$ crane upload --diffid some-file-diffid.sha256 reg.example.com/my-repo -z < some-file
sha256:db98fc6f11f08950985a203e07755c3262c680d00084f601e7304b768c83b3b1

Or just tell people to manually run sha256sum before uploading? That seems reasonable enough to me...
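The manual route could look roughly like this: hash the uncompressed bytes for the diffid, then hash the gzipped bytes for the digest. A sketch assuming GNU `sha256sum` and `gzip`; `layer_hashes` is a hypothetical helper, and the labelled output is just one way to keep the two values apart:

```shell
# Hypothetical helper (not part of crane): print the diffid and digest
# for a file that will be gzipped before upload.
layer_hashes() {
  # diffid: hash of the uncompressed contents.
  printf 'diffid sha256:%s\n' "$(sha256sum < "$1" | cut -d' ' -f1)"
  # digest: hash of the compressed contents. -n leaves the file name and
  # timestamp out of the gzip header, so repeated runs give the same bytes.
  printf 'digest sha256:%s\n' "$(gzip -n -c "$1" | sha256sum | cut -d' ' -f1)"
}

# Example: layer_hashes some-file
```

The digest only matches the upload if exactly the same compressed bytes are sent, so in practice it is probably safer to write the gzipped output to a file once and hash and upload that file.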

media type

If we're printing a descriptor, I think it's safe to assume a default of the docker layer media types, compressed or not (we should probably sniff the contents to see whether they're compressed).
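Sniffing could be as simple as checking for the two-byte gzip magic number (`1f 8b`) at the start of the blob. A sketch assuming `head -c` and `od` are available; `is_gzipped` and `layer_media_type` are hypothetical helper names, and the uncompressed media type is the Docker uncompressed-layer type:

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes.
is_gzipped() {
  # od prints the first two bytes as hex, e.g. " 1f 8b"; strip the spacing.
  magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# Hypothetical helper: pick a default Docker layer media type for a file.
layer_media_type() {
  if is_gzipped "$1"; then
    echo application/vnd.docker.image.rootfs.diff.tar.gzip
  else
    echo application/vnd.docker.image.rootfs.diff.tar
  fi
}

# Example: layer_media_type some-file
```

This only detects gzip; other compression formats would need their own magic-number checks.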

However, we might want to set an arbitrary media type, so that should perhaps just be a flag? Let's say I want to upload some json. The registry doesn't care here, actually, but if we want to compose this with another tool, it would be useful to have an accurate descriptor.

$ crane upload -m "application/json" reg.example.com/my-repo < config.json | jq .
{
  "mediaType": "application/json",
  "size": 809,
  "digest": "sha256:246c399c7f7b05e2a241ce0771456bee9eaa61d5015997237c920d69fd024443"
}

imjasonh commented 3 years ago

A mode that prints the digest as a ref (maybe -q for "qualify?"), for use with crane blob or perhaps crane append:

-q always means --quiet to me, maybe --full (no shorthand)?

I don't see a reason for [a zip flag]?

Neither do I, at least not right away. Documenting how to compress before uploading seems fine, and frankly more flexible for the user.

the first is digest and the second diffid:

I can tell you already I won't remember which is which without looking it up. 😄

Or just tell people to manually run sha256sum before uploading?

When in doubt, making users take extra steps (that we document) using standard tools feels better than adding in behavior we're not sure we'll need.

media type

Having a flag for this makes sense. It doesn't have to be included in the initial release, so long as we have a TODO/issue to track adding it so we remember.


This surface would add a lot of new workflows to crane, which is great, but it means we need good docs and examples (incl. e2e tests?) that demonstrate how it works, when you might want to use it, etc. Crane docs are a bit light now, mostly focused on generated CLI docs, and the top-level README.

jonjohnsonjr commented 3 years ago

but it means we need good docs and examples

I've started working on this and will try to send a PR soon-ish. My goal is to just have a collection of one-liners that demonstrate crane's flexibility, because that's the most effective form of documentation for me, personally.

jonjohnsonjr commented 3 years ago

maybe --full (no shorthand)?

Perhaps just --ref? Short and explicit.

When in doubt...

Cool, I think we're on the same page. I mostly want this for just uploading arbitrary bytes and not for appending stuff to an image, so I don't actually need the diffid for what I care about.

tianon commented 1 year ago

I sent Jon "crane push-blob that gives me back a descriptor" and he sent me back this issue link (so here I am, dutifully commenting that such a thing is interesting to at least one other person besides Jon 😂)

tianon commented 1 year ago

On the questions posed, I'd suggest starting with no flags: if you want compression or a diffid, you do or calculate them yourself beforehand, and if the command defaults to no compression it's "easy" to add more optional behavior later without affecting any existing users.