RFC: server-sent warnings

imjasonh commented 1 year ago

edit: the PR for this is now open! --> https://github.com/opencontainers/distribution-spec/pull/393

What

Prior art: Kubernetes added support for server-sent warning messages to be sent out-of-band of regular API responses in K8s v1.19, Sept 2020: https://kubernetes.io/blog/2020/09/03/warnings/

tl;dr: K8s webhook admission controllers can respond with a Warning: header that indicates some non-critical informational piece of data. Clients like kubectl print those warnings to stderr in addition to their regular output.

OCI registries could do the same thing: emit a Warning response header when the registry has some information to convey to the client without constituing a full-blown error.

What registries decide to use this side channel for is out of scope of OCI. OCI should not specify a standard set of warning messages for instance.

If enough of us happen to agree on the value and (importantly) semantics of this, I'd like to see if we can get this into OCI 1.1. 🤞

Proof of concept

As a quick proof of concept: https://github.com/google/go-containerregistry/compare/main...imjasonh:go-containerregistry:warn

This adds random warning messages when pulling manifests from a simple in-memory registry implementation, and shows those warnings when the crane client sees them.

$ crane digest localhost:1338/alpine
2023/03/13 13:55:37 [[WARNING]]: Your auth token will expire in 30 seconds.
sha256:ff6bdca1701f3a8a67e328815ff2346b0e4067d32ec36b7992c1fdc001dc8517

(The warning message is written to to stderr, the command output is undisturbed on stdout)

Nitty gritty details

The HTTP Warning header is technically deprecated. However, K8s decided to use it anyway, and I think we'd be better off following their example than inventing our own new warning header.

K8s only uses the warn-code 299, and warn-agent -, which side-steps many of the concerns about the Warning header that led to its deprecation AIUI. I think we should also specify that warn-date not be set, for simplicity.

An example Warning header:

Warning: 299 - "Your auth token will expire in 30 seconds."

K8s also limits the warning headers returned by the server. No more than 4KB of warnings are passed on, and they truncate warnings until they fit under the limit. I think we could have a similar effect in OCI-land, while giving registry implementations flexibility to decide which warnings they want to emit vs truncate.

Spec language

As a first crack at spec language:

registry implementations MAY respond with a Warning header (see https://tools.ietf.org/html/rfc7234#section-5.5)
- if set, the warn-code MUST be 299, and warn-agent MUST be -
- registry implementations MUST NOT emit more than 4KB of Warning headers
client implementations SHOULD surface any Warning headers encountered from the registry in an unobtrusive way (for example, stderr)

Good question. Following K8s' example, I believe kubectl aggregates warnings if a single CLI invocation resulted in many HTTP requests which each may have included multiple warnings. The main K8s client of note is kubectl, whose implementation was also controlled by that KEP. Being OCI, we don't have the luxury of specifying exact client behaviors like this.

I think we could add spec language to the effect of:

client implementations SHOULD attempt to deduplicate or aggregate warnings to avoid obtrusive output

...and leave it up to them to decide what's unobtrusive in their context.

Practically speaking, with my client-implementor hat on (🤠), I'd probably interpret this to mean deduplicating identical warnings received among N individual HTTP requests and only showing them once per image per CLI invocation.

So pulling registry.example.com/deprecated:v1 might perform a HEAD, maybe two manifest GETs, and N blob GETs, each of which might include the same deprecation warning, but I being a good and kind and spec-fearing client would deduplicate them all into one deprecation warning that I show the user. If a blob GET also warned that my creds were about to expire, I would log that warning separately.

(None of this is captured in the POC currently, but it could be relatively easily I think, if folks think this is worth POCing)

BenTheElder commented 1 year ago

Something like this would be an amazing option for moves where a registry is being phased out like: https://kubernetes.io/blog/2023/02/06/k8s-gcr-io-freeze-announcement/

(Granted, given more time I imagine we could also hack this with gradual error messages or something, but a non-breaking Warning would be nice)

The kubectl warnings have been extremely useful, though as you mentioned Kubernetes gets to cheat a little bit with widespread usage of a particular interactive client that we control.

cc @liggitt who implemented the kubectl warnings

liggitt commented 1 year ago

I'm +1 on the capability, and basically copying the safeguards about exceeding message header size from k8s

I'd also recommend giving pretty opinionated guidance about deduplicating, being brief, and limiting warnings to things that are actionable and relevant to the entity making the request (as best as the warner can tell)

From https://kubernetes.io/blog/2020/09/03/warnings/#admission-webhooks

If you are implementing a webhook that returns a warning message, here are some tips:

Don't include a "Warning:" prefix in the message (that is added by clients on output)

Use warning messages to describe problems the client making the API request should correct or be aware of

Be brief; limit warnings to 120 characters if possible

liggitt commented 1 year ago

One more side note: a key provision of the warnings RFC is that for warnings returned with code 299: "A system receiving this warning MUST NOT take any automated action." Kubernetes leans heavily on this to assume it is always safe to return a warning or stop returning a warning without backwards compatibility implications. You would think it goes without saying, but warning header contents are not APIs.

imjasonh commented 1 year ago

Something like this would be an amazing option for moves where a registry is being phased out like: https://kubernetes.io/blog/2023/02/06/k8s-gcr-io-freeze-announcement/

Would you believe that this exact use case was one we had in mind when thinking about proposing this? 😁

Thanks for the feedback and additional context, and for sharing K8s' experience. I'll ping you both on the eventual PR to add this to the spec itself, in case there's any specific wording or guidance you'd recommend.

opencontainers / distribution-spec