ndeloof opened this issue 2 years ago
Looks super interesting. Please open a PR for the proposal.
This is an interesting one. How would you handle multiple files?
restore a data cache
Hi @ndeloof, could you please elaborate more on your use case?
Actually, ORAS CLI v0.15 will provide `oras manifest fetch` and `oras blob fetch`, which might meet your need. You can check out this doc for details.
Seems like the output of this should be a tgz so that multiple files can be supported.
One UX could be: `oras pull localhost:5000/json-artifact:v1 --output - | jq`, where oras returns an error if there are multiple blobs associated with the target manifest.
Are there any plans to implement this for the input as well, so that something like the following would be possible:
command-a | command-b | oras push localhost:5000/json-artifact:v1 -
...
oras pull localhost:5000/json-artifact:v1 --output - | jq
@ProbstDJakob There is a plan to provide a piped-command user experience in v1.2.0 by standardizing output, see https://github.com/oras-project/oras/issues/638
Still, I have some questions about the commands below:
command-a | command-b | oras push localhost:5000/json-artifact:v1 -
Since the layer content comes from stdin rather than a file:
1) What is the file name of the generated layer?
2) How should we name the layer if the user runs `oras pull localhost:5000/json-artifact:v1`?
oras pull localhost:5000/json-artifact:v1 --output - | jq
What if `localhost:5000/json-artifact:v1` contains multiple layers?
`oras push` could receive an additional option as follows:
--from-stdin[=file-path[:type]]
oras will read data from stdin and write it to `file-path` within the image. If `file-path` has not
been supplied, it defaults to `./stdin.blob` with the type `application/octet-stream`. This option can be
used in conjunction with other files supplied via `<file>[:type] [...]`, but does not need to be. The only
restriction is that no additional `-` file may be supplied.
...
<file>[:type] [...]
The files to include within the image. The special file `-[:type]` is equivalent to using the option
`--from-stdin=./stdin.blob[:type]`, where the type `application/octet-stream` will be used if no type
has been supplied.
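A usage sketch of the proposed option could look like this (the `--from-stdin` flag does not exist in oras today, and the file names and types are made up for illustration):

# Hypothetical: push the piped data as result.json plus a regular file from disk
command-a | command-b \
  | oras push localhost:5000/json-artifact:v1 \
      --from-stdin=result.json:application/json \
      config.yaml:application/yaml
# Hypothetical shorthand using the special "-" file (stored as ./stdin.blob per the proposal)
command-a | command-b | oras push localhost:5000/json-artifact:v1 -:application/json config.yaml:application/yaml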
Regarding your second question, I am not that familiar with how OCI images work, thus I am currently unable to answer it, but I am willing to study the docs and elaborate further if the answer above doesn't solve it implicitly.
For `oras pull` there might be a similar option:
--to-stdout[:<single|tar>][=file-path,...]
Instead of writing the content of the image to a directory, the content will be written to stdout.
When supplying `--to-stdout:single[=file-path]`, the file found at `file-path` within the image will be
written to stdout without converting it to an archive. If no `file-path` has been supplied and the image
contains exactly one file, that file will be written out; otherwise the command will fail. If more than
one `file-path` has been supplied, the command will also fail.
When supplying `--to-stdout:tar[=file-path,...]`, the files found at `file-path,...` will be written to
stdout combined into an uncompressed tar archive. If no files have been supplied, all files within the
image will be included in the archive.
Aliases:
`--to-stdout=<file-path>` => `--to-stdout:single=<file-path>`
`--to-stdout=<file-path,...>` => `--to-stdout:tar=<file-path,...>`
`--to-stdout` => `--to-stdout:single`
This option is mutually exclusive with the `--output` option.
Regarding the penultimate line, I am not quite confident whether this is the right choice (defaulting to `single`), but I think most users would try to pipe a single file instead of a whole archive.
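A usage sketch for the pull side (again, the `--to-stdout` flag is hypothetical and the file names are illustrative):

# Single-file mode: fails if the image contains more than one file
oras pull localhost:5000/json-artifact:v1 --to-stdout | jq
# Single-file mode with an explicit file
oras pull localhost:5000/json-artifact:v1 --to-stdout=result.json | jq
# Tar mode: stream all files combined into an uncompressed tar archive
oras pull localhost:5000/json-artifact:v1 --to-stdout:tar | tar -x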
Just for the record, I found this solution to stream the content of an artifact to stdout:
oras blob fetch -o- ghcr.io/foo/test@$(oras manifest fetch ghcr.io/foo/test:0.0.1 | yq '.layers[0].digest')
I pushed the tgz like this:
oras push ghcr.io/foo/test:0.0.1 --artifact-type application/vnd.foo.machine-image.v1 image.tgz
This solves my use case, but it would be great to do that without `yq` (in a single `oras` call).
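For reference, the same workaround split into two steps; `jq` is used here instead of `yq` since the manifest is plain JSON, and the reference is the one from the example above:

digest="$(oras manifest fetch ghcr.io/foo/test:0.0.1 | jq -r '.layers[0].digest')"
oras blob fetch --output - "ghcr.io/foo/test@${digest}" | tar -xz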
I pushed the tgz like this:
oras push ghcr.io/foo/test:0.0.1 --artifact-type application/vnd.foo.machine-image.v1 image.tgz
@guettli This is very interesting. May I know what's stored inside the `image.tgz` and how it is generated?
If you provide a folder instead of a single file, `oras push` can help pack it and `oras pull` can unpack it automatically. If your end-to-end scenario fits into this, you may try:
oras push ghcr.io/foo/test:0.0.1 --artifact-type application/vnd.foo.machine-image.v1 image # pack and push all files in folder image
oras pull ghcr.io/foo/test:0.0.1 -o pulled # pull and unpack files into folder pulled/image
@qweeah thank you for asking. The tgz contains a Linux root file system. We booted Ubuntu on a VM, installed some tools and applied some configuration, and then created a tgz, so that we have a constant custom image. The image is about 1.8 GB and contains 100k files.
I am happy to store the tgz as a blob in an artifact. Nice to know that you could use oras for tar/untar, too. But at the moment I don't see a big benefit.
One drawback of the current method: We can't create the artifact via streaming. AFAIK something like this is not supported yet:
tar -czf- .... | oras push ...
@qweeah what benefit would we have if we would use oras instead of tar/untar?
Before uploading any blob to a registry, the digest must be specified.
Unless you can get the digest before archiving is done, it's not possible to do streamed uploading.
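In practice that means the archive has to be fully written out before the push can start, along the lines of the following (repository, artifact type and file name taken from the examples in this thread; the `rootfs` directory is illustrative):

tar -czf image.tgz -C rootfs .   # the digest and size are only known once tar has finished
oras push ghcr.io/foo/test:0.0.1 --artifact-type application/vnd.foo.machine-image.v1 image.tgz
# whereas the streamed variant asked for in this issue would be:
# tar -czf- -C rootfs . | oras push ghcr.io/foo/test:0.0.1 ...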
@qweeah what benefit would we have if we would use oras instead of tar/untar?
Well, rather than using `oras manifest fetch` + `oras blob fetch`, you can use only one command, `oras pull`, to do the pulling.
Before uploading any blob to a registry, the digest must be specified.
Unless you can get the digest before archiving is done, it's not possible to do streamed uploading.
In order to circumvent this, oras could buffer the input stream in memory up to, for example, 64 MiB. If this threshold is reached, oras pauses reading new input, first writes the 64 MiB into a temporary file with narrow access rights, and then resumes reading from the input, piping it directly into the file. After reaching EOF, oras could calculate the digest either from the in-memory buffer or, if the content was too large, from the file, then pack it and upload the image.
The buffering in memory would only be for performance (and security) reasons and would mostly be a nice-to-have feature.
It's not something oras can circumvent; you cannot get the checksum of the blob before `tar` finishes writing.
I know, that is why I proposed the solution with buffering/writing to a temporary file. That way the calculation of the digest can be done after `tar` finishes, without the need to create/delete a temporary file oneself, and streaming is therefore supported.
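As a rough user-level approximation of that idea, the spooling can already be done manually in the shell today (the temporary directory, file name and media type below are illustrative):

umask 077                      # narrow access rights for the spool file
tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir"' EXIT
cat > "$tmpdir/stdin.blob"     # spool the whole stream, e.g. from tar -czf- ...
cd "$tmpdir"
oras push localhost:5000/json-artifact:v1 stdin.blob:application/octet-stream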
Yes, the digest calculation can be done while packing, and this optimization has already been applied in oras-go. The question is that after getting the digest, the oras CLI still needs to go through the archive file again to upload it.
Sorry for the late response. Maybe I do not know enough about how oras works, but wouldn't the proposed solution be equivalent to supplying files as arguments, except that the only file to include is the buffer/temporary file instead of the files from the arguments?
Maybe the following pseudo script will help you understand my suggestion:
uint8[64MiB] buffer;
read(into=buffer, from=stdin);

Readable inputData;
if (peek(stdin) == EOF) {
    # the whole input fit into the 64 MiB buffer
    inputData = buffer;
} else {
    # spill the buffer into a temporary file and stream the rest of stdin into it
    File tmpFile = tmpFileCreate();
    write(to=tmpFile, from=buffer);
    readAll(into=tmpFile, from=stdin);
    seek(origin=START, offset=0, file=tmpFile);
    inputData = tmpFile;
}

call oras push registry.example:5000 inputData # Yes, the CLI is not able to accept buffers, but I hope you get what I intend to say
@ProbstDJakob Apart from the seek operation, what you described is already implemented here.
P.S. I think this discussion has gone too far from this issue and I have created https://github.com/oras-project/oras/issues/1200 so we can continue there.
The following scripts show a real-world example where streaming could come in handy.
We fully manage the life cycle of an OpenShift cluster via a GitLab pipeline. When creating a cluster with the `openshift-install` tool, some files like the Terraform state and kube-configs are created. Those files are needed during the whole life cycle of the cluster (not only in the current pipeline), so they need to be stored persistently. In our case we use the existing GitLab registry and oras to create an image.
What we do today, using a temporary directory:
#!/usr/bin/env sh
set -eu
# [...] some preparations
tempDir="$(mktemp -d)"
oras pull --output "$tempDir" "$ENCRYPTED_OPENSHIFT_INSTALL_ARTIFACTS_IMAGE"
sops --decrypt --input-type binary --output-type binary "$tempDir/openshift-install-artifacts.tar.gz.enc" \
| tar -xzC "$CI_PROJECT_DIR"
rm -rf "$tempDir"
With support for streaming to stdout, this could be reduced to:
#!/usr/bin/env sh
set -eu
# [...] some preparations
oras pull --output - "$ENCRYPTED_OPENSHIFT_INSTALL_ARTIFACTS_IMAGE" \
| sops --decrypt --input-type binary --output-type binary /dev/stdin \
| tar -xzC "$CI_PROJECT_DIR"
This way there is no need to create a temporary directory or to know what the file is called within the image (not a problem for us, since we named it within the same repo).
See https://github.com/oras-project/oras/issues/1200#issuecomment-1849733309
My use case is to rely on the oras CLI to restore a data cache stored as a tar.gz. On pull, I'd like to pipe the downloaded artifact directly to `tar xz`.