opencontainers / artifacts

OCI Artifacts
https://opencontainers.org
Apache License 2.0

Expose blobs on URLs with proper mime type for browser consumption #34

Closed awakecoding closed 1 year ago

awakecoding commented 3 years ago

I want to use OCI Artifacts to store video files (remote desktop session recordings) efficiently, and one thing I would need is a way to play the video files directly in a browser, without downloading them locally first. Since we can already declare a proper mime type for the artifacts (mediaType field), all that would be required is a blob URL that responds with the proper mime type, allowing the browser to recognize the content and play it directly as it is being downloaded.

Using the proper mime type should make video files playable in the browser either by opening the URL directly, or by embedding the blob URL inside the HTML5 video tag.

My primary use case here is video files, but the same applies to audio files and image files, or anything that can be opened directly in a browser given the correct mime type.

SteveLasker commented 3 years ago

Riffing a bit more: one of the values of an OCI distribution based service is the named/content addressable storage. The blobs are details; the “human” interaction model is the named reference: registry.io/namespace/artifact:tag

What would be interesting is how these named elements can be combined.

If a curl client wants to pull a binary, can it pass registry.io/namespace/binary:version?

If the notation client is invoked as a gate to a curl, it uses the same named reference: notation verify registry.io/namespace/binary:version

If a remote desktop video client wishes to get metadata, it can also use the same named reference.
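As a rough illustration of the shared named reference, here is a minimal sketch that splits a reference into its parts and maps it to the distribution spec manifest endpoint (`/v2/<name>/manifests/<reference>`). The helper names are made up for this example, and the fallback to `latest` when no tag is given is an assumption borrowed from common tooling, not something stated in this thread.

```python
from typing import NamedTuple


class Reference(NamedTuple):
    registry: str
    repository: str
    reference: str  # a tag or a digest


def parse_reference(ref: str) -> Reference:
    """Split 'registry/namespace/name:tag' (or '...@digest') into parts."""
    registry, _, remainder = ref.partition("/")
    if not remainder:
        raise ValueError(f"missing repository in {ref!r}")
    if "@" in remainder:
        # Digest form: everything after '@' is the content address.
        repository, _, reference = remainder.partition("@")
    else:
        # Tag form: the tag follows the last colon. A registry port such as
        # 'host:5000' was already split off above, so it is not mistaken
        # for a tag.
        repository, sep, reference = remainder.rpartition(":")
        if not sep:
            # Assumption: default to 'latest' when no tag is given.
            repository, reference = remainder, "latest"
    return Reference(registry, repository, reference)


def manifest_url(ref: Reference) -> str:
    """Build the manifest URL that every client would start from."""
    return f"https://{ref.registry}/v2/{ref.repository}/manifests/{ref.reference}"


ref = parse_reference("registry.io/namespace/binary:version")
print(manifest_url(ref))
```

Whatever the client is (curl, notation, a video player), it would start from the same parsed reference; only what it asks the registry for differs.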

The registry would serve a redirect to the content (blob) based on the type of request.

This model could preserve the common named reference, pointing at a manifest to describe the artifact. But also support the other scenarios like signatures, sboms and metadata.

Thoughts?

awakecoding commented 3 years ago

The human-readable tag notation is not flexible enough for this. It's probably better if we simply parse the manifests to find the blob URLs, and then construct associative "file URLs" that use the content address (digest) combined with a "human address" (file name + type) to pull the blob as a regular file that will be recognized by the browser.

The spec uses the following URL structure to pull blobs: /v2/<name>/blobs/<digest>

Here is what I suggest: /v2/<name>/files/<digest>/<filename>
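To make the idea concrete, here is a minimal sketch of building such file URLs from a parsed manifest. The `/files/` route is only the proposal above, not part of the distribution spec. The sketch also assumes the file name is stored in the `org.opencontainers.image.title` annotation of each layer descriptor; the thread does not say where the name would come from, so treat that as one possible convention.

```python
from urllib.parse import quote

# Assumption: the file name lives in this predefined OCI annotation.
TITLE_ANNOTATION = "org.opencontainers.image.title"


def file_urls(name: str, manifest: dict) -> list[str]:
    """Build proposed '/v2/<name>/files/<digest>/<filename>' paths."""
    urls = []
    for descriptor in manifest.get("layers", []):
        annotations = descriptor.get("annotations") or {}
        filename = annotations.get(TITLE_ANNOTATION)
        if filename is None:
            # Without a file name there is no "human address" to append.
            continue
        # Percent-encode the file name so it stays a single path segment.
        urls.append(
            f"/v2/{name}/files/{descriptor['digest']}/{quote(filename, safe='')}"
        )
    return urls


example_manifest = {
    "layers": [
        {
            "mediaType": "application/pdf",
            "digest": "sha256:abc",  # shortened placeholder digest
            "annotations": {TITLE_ANNOTATION: "presentation.pdf"},
        }
    ]
}
print(file_urls("demo", example_manifest))
```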

So let's say you have a manifest that refers to a PDF presentation for which you now have the digest. You could pull the blob and open it in a PDF viewer, but if you try reading the blob directly in a browser, it will likely just download it instead of launching the built-in PDF viewer of the browser, and that's because it didn't have the correct mime type (application/pdf). Let's fix this with my suggestion:

/v2/<name>/files/<digest>/presentation.pdf

The OCI registry would use the last element of the URL as the file name, but serve the contents of the corresponding blob. With automatic mime types, ".pdf" can be served with "application/pdf" as the mime type, which should make it work inside the built-in PDF viewer of most browsers.
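On the registry side, this could look roughly like the sketch below: parse the proposed path, map it back to the existing blob path, and guess the content type from the file extension with Python's standard `mimetypes` module. The route pattern and the function name are hypothetical, and the fallback to `application/octet-stream` for unknown extensions is an assumption.

```python
import mimetypes
import re

# Hypothetical route for the proposed '/v2/<name>/files/<digest>/<filename>'.
FILES_ROUTE = re.compile(
    r"^/v2/(?P<name>.+)/files/"
    r"(?P<digest>[a-z0-9]+:[a-zA-Z0-9=_-]+)/"
    r"(?P<filename>[^/]+)$"
)


def resolve_file_request(path: str) -> tuple[str, str]:
    """Return (blob path to serve, Content-Type to send) for a files URL."""
    match = FILES_ROUTE.match(path)
    if match is None:
        raise ValueError(f"not a files URL: {path!r}")
    # The bytes come from the ordinary blob endpoint; only the headers differ.
    blob_path = f"/v2/{match['name']}/blobs/{match['digest']}"
    content_type, _encoding = mimetypes.guess_type(match["filename"])
    # Assumption: fall back to a generic binary type for unknown extensions.
    return blob_path, content_type or "application/octet-stream"


print(resolve_file_request("/v2/demo/files/sha256:abc/presentation.pdf"))
```

Note that the file name never affects which bytes are served; the digest alone selects the blob, so content addressing is preserved.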

The last improvement to discuss is how to explicitly specify a different mime type instead of leaving it to default mime type detection based on the file name. We could add a query parameter, or a request header for this.
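A minimal sketch of that override order, assuming a query parameter named `mediaType` and a request header named `X-Blob-Media-Type`. Both names are invented for this example; nothing in the spec or this thread defines them.

```python
import mimetypes
from urllib.parse import parse_qs, urlsplit

# Hypothetical names, chosen only for illustration.
QUERY_PARAM = "mediaType"
HEADER_NAME = "x-blob-media-type"


def choose_content_type(url: str, headers: dict[str, str]) -> str:
    """Pick a Content-Type: query parameter, then header, then file name."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if query.get(QUERY_PARAM):
        return query[QUERY_PARAM][0]
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    if HEADER_NAME in lowered:
        return lowered[HEADER_NAME]
    # Default: detect the type from the last path segment (the file name).
    filename = parts.path.rsplit("/", 1)[-1]
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


print(choose_content_type(
    "/v2/demo/files/sha256:abc/recording.bin?mediaType=video/webm", {}
))
```

A query parameter has the practical advantage that it works in a plain link or a video tag, where the page cannot set request headers. A real registry would probably also want to restrict which types a caller may request.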

What do you think?

SteveLasker commented 3 years ago

It’s getting closer.

We do have this general request for curling urls from a few folks, including some internal Azure teams. I’m trying to find a way to meet the url requirements while staying aligned to some principles around the distribution spec. Some of them aren’t written in the specs but are standard implementation details.

The premise of how far the distribution spec diverges from these core concepts has been the source of much discussion, so I do want to recognize these challenges and try to keep expanding the capabilities to continue on the vision that distribution can be the base for most new package managers, while maintaining some core principles.

Blob URLs are neither fixed over time nor tied to the same domain or URL as the artifact reference

A distribution instance has two endpoints:

A user references: wabbit-networks.io/net-monitor:v1

The blob content (layers for container images) may be served from: 1234567.blobs.core.cloud.io

See an example of ACR Dedicated Data endpoints

Distribution clients know how to negotiate this series of requests. A standard and simplified “happy path” would be:

  1. A client requests an artifact (by tag or digest)
  2. The registry responds with a manifest
  3. The client evaluates the blobs defined in the manifest. The blobs are wrapped in descriptors.
  4. Using the digests in the descriptors, the client checks which blobs it already has locally.
  5. The client identifies the missing blobs and sends a list of requests to the server for urls for each blob.
  6. Based on various factors, different blob urls are returned. Two requests for the same manifest, or different manifests, even in the same repo, may return different blob urls.
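The steps above can be sketched as follows. `FakeRegistry` is an in-memory stand-in written for this example, not a real client library, and the blob host is purely illustrative. The sketch only looks at the `layers` list and ignores the config blob, authentication, and redirects.

```python
class FakeRegistry:
    """In-memory stand-in for a registry, used only to make the flow runnable."""

    def __init__(self, manifests: dict, blob_host: str):
        self.manifests = manifests
        self.blob_host = blob_host

    def get_manifest(self, name: str, reference: str) -> dict:
        return self.manifests[(name, reference)]

    def blob_url(self, name: str, digest: str) -> str:
        # A real registry may hand out a different URL on every request.
        return f"https://{self.blob_host}/{name}/{digest}"


def plan_pull(registry, name: str, reference: str, local_digests: set) -> dict:
    """Return {digest: url} for the blobs the client still has to download."""
    # Steps 1-2: request the artifact and receive its manifest.
    manifest = registry.get_manifest(name, reference)
    # Step 3: the blobs are listed as descriptors in the manifest.
    descriptors = manifest.get("layers", [])
    # Step 4: skip blobs that are already present on the client.
    missing = [d["digest"] for d in descriptors if d["digest"] not in local_digests]
    # Steps 5-6: ask the registry for a URL for each missing blob.
    return {digest: registry.blob_url(name, digest) for digest in missing}


registry = FakeRegistry(
    manifests={
        ("net-monitor", "v1"): {
            "layers": [{"digest": "sha256:aaa"}, {"digest": "sha256:bbb"}]
        }
    },
    blob_host="1234567.blobs.core.cloud.io",
)
print(plan_pull(registry, "net-monitor", "v1", local_digests={"sha256:aaa"}))
```

Because the URLs are only produced at the end of this negotiation, a client cannot assume a stable blob URL ahead of time, which is the difficulty with linking to blobs directly.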

Reasons for differences:

What you're asking for is something I'm hoping we can solve. I’m just searching for a solution that gives the benefits above for using the same url to get supporting artifact types (signature, sboms, scan results) and stays true to the core capabilities of the distribution spec that has scaled to many scenarios.

What you, and others, are asking for is a way for the registry to redirect a request, rather than the client having to negotiate the manifest content.

Perhaps @sajayantony, @stevvooe or @jonjohnsonjr might have some ideas on how we can redirect requests, based on the mediaType in the header.
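One possible shape for such a redirect, as a sketch only: the registry compares the media ranges in the request's `Accept` header with the `mediaType` of each blob descriptor in the manifest, and redirects to the first match. The function names are hypothetical, and ignoring `*/*` (so that clients sending a catch-all still receive the manifest) is an assumption of this sketch, not agreed behavior.

```python
from fnmatch import fnmatchcase
from typing import Optional


def accepted_ranges(accept_header: str) -> list[str]:
    """Return the media ranges of an Accept header, ignoring q-values."""
    ranges = []
    for item in accept_header.split(","):
        media_range = item.split(";", 1)[0].strip().lower()
        # Assumption: '*/*' should not trigger a redirect, so skip it.
        if media_range and media_range != "*/*":
            ranges.append(media_range)
    return ranges


def redirect_target(name: str, manifest: dict, accept_header: str) -> Optional[str]:
    """Return the blob path to redirect to, or None to serve the manifest."""
    for media_range in accepted_ranges(accept_header):
        for descriptor in manifest.get("layers", []):
            # fnmatchcase lets a range such as 'video/*' match 'video/mp4'.
            if fnmatchcase(descriptor["mediaType"].lower(), media_range):
                return f"/v2/{name}/blobs/{descriptor['digest']}"
    return None


recording = {"layers": [{"mediaType": "video/mp4", "digest": "sha256:abc"}]}
print(redirect_target("recordings/session", recording, "video/*;q=0.9, */*;q=0.1"))
```

A client that asks for a manifest media type gets `None` here, so existing distribution clients would keep receiving the manifest unchanged.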

mikebrow commented 1 year ago

We are archiving this repo, so this issue will become read-only. Would you like to move it to the distribution spec repo?

mikebrow commented 1 year ago

Closing for now due to the pending archive action. Please reopen if the archive is not completed and/or if you believe this close to be in error.