opencontainers / distribution-spec

OCI Distribution Specification
https://opencontainers.org
Apache License 2.0
828 stars 205 forks source link

Streamed Blob Upload not defined by spec #303

Open tmkontra opened 3 years ago

tmkontra commented 3 years ago
  1. The spec declares that there are two modes of blob upload: monolithic and chunked. [0]
  2. It goes on to describe that a chunked upload is done via PATCH. It states that the PATCH will include the Content-Range header.
  3. However, the conformance tests seem to suggest a third type of blob upload: streamed [1]
    • This test case issues a single PATCH without Content-Range, followed by a closing PUT

It is unclear what this conformance test is for. I see the existence of a "streamed" upload specification previously[2], but I am not sure when it was removed. If "streamed" upload is indeed part of v1.0.0, the spec is not clear how it should be conducted.

Furthermore, whether or not "streamed" upload is part of the spec: I think the "chunked" section is ambiguous about whether the three request headers MUST be included (see quoted text below). It puts some requirements on the contents of these headers, but not their presence (at least not explicitly, anyway).

To upload a chunk, issue a PATCH request to a URL path in the following format, and with the following headers and body:

URL path: <location>

Content-Type: application/octet-stream
Content-Range: <range>
Content-Length: <length>

I apologize that I am not in a position to produce a PR to disambiguate this situation, but I am hoping someone can clear up my confusion, and I'd be happy to contribute to improving the spec (or conformance suite) if possible.

[0] https://github.com/opencontainers/distribution-spec/blob/main/spec.md?plain=1#L197 [1] https://github.com/opencontainers/distribution-spec/blob/main/conformance/02_push_test.go#L21 [2] https://github.com/opencontainers/distribution-spec/commit/92e1994a4f13cff06c03f11d86b40ade4ed92730#

jdolitsky commented 2 years ago

Hello, you're right. I think this is actually just a chunked upload, which is tested later on..

I summon @jonjohnsonjr - this test OK to remove?

jonjohnsonjr commented 2 years ago

This seems ubiquitous enough that it should probably be documented?

mpreu commented 1 year ago

Furthermore, whether or not "streamed" upload is part of the spec: I think the "chunked" section is ambiguous about whether the three request headers MUST be included (see quoted text below). It puts some requirements on the contents of these headers, but not their presence (at least not explicitly, anyway).

I stumbled over this today. Just reading the spec for the PATCH request, I expected the headers to be required. But I noticed, that common tooling like podman also just sends Transfer-Encoding: chunked requests (at least in some cases) or requests with Content-Length but without Content-Range. At least for me the wording in the spec could be more precise, especially as there are tests allowing this behavior.

haampie commented 1 year ago

I don't understand why the OCI spec created its own "chunked" upload, using non-standard (at least to the HTTP RFCs) Content-Range values and still requiring ordered upload. What's the point then, given that Transfer-Encoding: chunked exists... I wonder if this can be simplified in a future version of the specification.

In practice, barely anybody uses chunked uploads? I see mostly POST + monolithic PATCH + PUT. Doing multiple PATCH requests is slower and less robust. Also more work to implement: e.g. python's builtin urllib abstracts Transfer-Encoding: chunked away -- it happens transparently on a request with data, whereas this custom OCI chunked upload has to be implemented by hand (typically at the cost of performance, since urllib doesn't keep the connection alive between requests).

@mpreu do you know why Transfer-Encoding: chunked worked? Maybe it's not implemented by the registry explicitly, the registry just runs read() and some http lib in the background assembles the chunks transparently?