oras-project / oras

OCI registry client - managing content like artifacts, images, packages
https://oras.land
Apache License 2.0
Have an option to verify the locally pulled files #1368

Closed stmlange closed 1 month ago

stmlange commented 5 months ago

What is the version of your ORAS CLI

1.1.0

What would you like to be added?

Maybe I'm missing it, but assume I have a local copy of some pulled data. Is there an option to run a sha256 check or something similar to verify that the local copy matches the artifacts listed in the manifest?

Why is this needed for ORAS?

The manifest can use multiple digest encodings, which makes it very tricky to manually verify that the local copy equals the remote artifact.

Are you willing to submit PRs to contribute to this feature?

qweeah commented 4 months ago

@stmlange Can you kindly explain your scenario and why the verification is needed?

stmlange commented 4 months ago

@qweeah Thanks for reaching out. The main reason I created this ticket is that there doesn't seem to be an (easy) option to verify that the locally pulled artifacts are "equal" to the remote artifact.

I am trying to answer the question: is what I have locally really what was published to the remote?

Consider for example maven/gradle that publish dedicated sha1 and md5 hashsums so one can download the hashsum and verify somehow that the published thing is "correct"/"equal" to the local variant.

With oras one can have multiple digest variants encoded (https://github.com/opencontainers/image-spec/blob/main/descriptor.md#digests).

A digest can be sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b or sha512:401b09eab3c013d4ca54922bb802bec8fd5318192b0a75f201d8b372742... or whatever else ORAS supports. Because multiple hash algorithms are supported, it is not trivial to manually check whether the downloaded artifact is actually what was published.

Running oras pull multiple times actually seems to re-download the artifact. So oras pull may well check those digests internally (and potentially fail if the download was not successful), but re-downloading is a waste of network resources.

qweeah commented 4 months ago

Thanks @stmlange for the detailed explanation. You can use oras manifest fetch to generate a checksum file and then run shasum -c $FILE to check it.

/cc @FeynmanZhou To validate if it's something we should add to ORAS CLI.

qweeah commented 4 months ago

@stmlange It is also worth mentioning that if you want to copy an artifact in a trusted way, why not use an OCI image layout?


# 1. copy an artifact to a local folder mcr.microsoft.com/oss/kubernetes/kubectl
> oras cp mcr.microsoft.com/oss/kubernetes/kubectl:v1.28.1 -r --to-oci-layout mcr.microsoft.com/oss/kubernetes/kubectl
✓ Copied  application/vnd.docker.container.image.v1+json                               1.93/1.93 kB 100.00%  571µs
  └─ sha256:919d96c9446db8f5c6cf76d98abd4c79ccfe9af241f977d87188ef3e9f6f09de
...
Copied [registry] mcr.microsoft.com/oss/kubernetes/kubectl:v1.28.1 => [oci-layout] mcr.microsoft.com/oss/kubernetes/kubectl
Digest: sha256:a01b2873f41c65aa9157baf5ec0e0beaf80e9e84bb7dfa94b081cd230b534418

# 2. cp the OCI image layout folder to some air-gap environment

# 3. pull provenance file from the copied OCI image layout folder, checksum will be verified during the pull
> oras pull --oci-layout mcr.microsoft.com/oss/kubernetes/kubectl@sha256:30019e253ab74eb3e38abae7b8997e8e60c420169044ca9bfaf9665f54ad18bc -o in-toto
✓ Pulled      provenance.json                                                          14.9/14.9 kB 100.00%  717µs
  └─ sha256:f4740e5a3adde42224679263c7b4e76985411cb7a9504615cf1421d8afb078b5
✓ Pulled      application/vnd.oci.image.manifest.v1+json                                 682/682  B 100.00%  608µs
  └─ sha256:30019e253ab74eb3e38abae7b8997e8e60c420169044ca9bfaf9665f54ad18bc
Pulled [oci-layout] mcr.microsoft.com/oss/kubernetes/kubectl@sha256:30019e253ab74eb3e38abae7b8997e8e60c420169044ca9bfaf9665f54ad18bc
Digest: sha256:30019e253ab74eb3e38abae7b8997e8e60c420169044ca9bfaf9665f54ad18bc

stmlange commented 4 months ago

Indeed, with ORAS the manual way would be to download the manifest (e.g. oras manifest fetch). The reason I filed this issue is that I believe you cannot assume a check via shasum -a 256 -c $FILE or sha256sum will work, as this assumes the digest algorithm is sha256.

I believe that, per https://github.com/opencontainers/image-spec/blob/main/descriptor.md#digests, a manifest could also carry a sha512 digest, e.g. sha512:401b09eab3c013d4ca54922bb802bec8fd5318192b0a75f201d8b372742.

qweeah commented 4 months ago

I am not an expert on checksum files, but doesn't the length of the checksum string already imply the algorithm?

qweeah commented 4 months ago

Yes, I tested on my Linux VM, and checksums from different algorithms can coexist in the same checksum file:

> cat a
123
> shasum -a 512 a >> sum
> shasum -a 256 a >> sum
> cat sum
ea2fe56bb8c1fb5ada84963b42ed71b764a74b092d75755173ade06f2f4aada9c00d6c302e185035cbe85fdff31698bca93e8661f0cbcef52cf2ff65864fd742  a
181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b  a
> shasum -c sum
a: OK
a: OK

qweeah commented 4 months ago

What I mean is that, although ORAS doesn't support sha512, you can still split the digest at the : and keep only the latter part as the checksum. The shasum utility can auto-detect the algorithm from the length of the checksum string.
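The splitting step could look roughly like the sketch below. It assumes the input is a list of "<algorithm>:<hex> <filename>" lines on stdin; both that input format and the helper name digests_to_checksums are made up for illustration and are not part of ORAS.

```shell
# Sketch: convert "<algorithm>:<hex> <filename>" lines on stdin into the
# "<hex>  <filename>" lines that a checksum file uses.
# digests_to_checksums is a hypothetical helper name, not an ORAS command.
digests_to_checksums() {
  while read -r digest file; do
    # Skip blank lines.
    [ -n "$digest" ] || continue
    # Keep only the part after the first ':' (the encoded hash) and
    # separate it from the file name with two spaces.
    printf '%s  %s\n' "${digest#*:}" "$file"
  done
}
```

For example, piping a line such as sha256:181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b a through digests_to_checksums and redirecting the result to sum should give a file that shasum -c sum can read.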

stmlange commented 4 months ago

Yes, in general the length of the hash string could be used to determine the algorithm. Per https://github.com/opencontainers/image-spec/blob/main/descriptor.md#digests, the digest format makes it even a bit easier, as it encodes the algorithm as a prefix: sha256:..., sha512:....

However, the digest (https://github.com/opencontainers/image-spec/blob/main/descriptor.md#digests) can be more than just sha256:... or sha512:.... How should I tell checksum to verify a multihash+base58:... or sha256+b64u:..., or whatever other algos that are supported by ORAS?

qweeah commented 4 months ago

How should I tell checksum to verify a multihash+base58:... or sha256+b64u:..., or whatever other algos that are supported by ORAS?

The demo I gave writes both a sha256 and a sha512 checksum into one sum file, and shasum is able to detect the algorithm automatically.

qweeah commented 4 months ago

@stmlange You don't need to use shasum -a 256; just shasum -c is enough, so the checking script doesn't need to know the algorithm. (I have amended my earlier post and removed -a 256 from it.)

stmlange commented 4 months ago

The problem remains that ORAS can encode the hash as sha256+b64u: in the manifest. There is no guarantee that everything encoded in the manifest is supported as a hash by shasum.

Consider multihash+base58:... or sha256+b64u:..., which can't easily be verified with shasum. Hence, if we really need to go down the manual validation route, it would be very tedious, as one needs to do different things depending on the digest used in the manifest.

Consider:

$ echo "123" > a
$ shasum -a 256 a >> sum
$ sha256sum a | cut -d ' ' -f 1 | xxd -r -p | base64 >> sum

$ sha256sum -c sum
a: OK
sha256sum: WARNING: 1 line is improperly formatted

qweeah commented 4 months ago

multihash+base58:... and sha256+b64u:... are not registered in the OCI spec and are not supported (see a related test case in the OCI digest library).

stmlange commented 4 months ago

OK, I see that only sha256:... and sha512:... are actually registered and supported algorithms: https://github.com/opencontainers/image-spec/blob/main/descriptor.md#registered-algorithms.
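With just those two algorithms, checking a single file whose name is already known could be sketched as below. This is only an illustration: verify_oci_digest is a hypothetical helper, not an ORAS command, and it assumes the coreutils tools sha256sum and sha512sum are installed.

```shell
# Sketch: verify one local file against an OCI digest string.
# verify_oci_digest is a hypothetical helper; it assumes sha256sum and
# sha512sum (coreutils) are available on the PATH.
verify_oci_digest() {
  digest=$1   # e.g. sha256:<hex>
  file=$2     # path to the local file
  if [ ! -r "$file" ]; then
    echo "$file: cannot read file" >&2
    return 1
  fi
  algo=${digest%%:*}      # text before the first ':'
  expected=${digest#*:}   # text after the first ':'
  # Dispatch on the algorithm prefix; only the two registered ones are handled.
  case $algo in
    sha256) actual=$(sha256sum < "$file" | cut -d ' ' -f 1) ;;
    sha512) actual=$(sha512sum < "$file" | cut -d ' ' -f 1) ;;
    *)
      echo "$file: unsupported algorithm '$algo'" >&2
      return 2
      ;;
  esac
  if [ "$actual" = "$expected" ]; then
    echo "$file: OK"
  else
    echo "$file: FAILED"
    return 1
  fi
}
```

Calling verify_oci_digest "sha256:<hex>" a prints a: OK on a match, prints a: FAILED and returns 1 on a mismatch, and returns 2 for any prefix other than sha256 or sha512.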

However, I still think it is not that easy (I guess sometimes even impossible) to run shasum with just the manifest. Take the example manifest from https://github.com/opencontainers/image-spec/blob/main/manifest.md#example-image-manifest.

It just tells us the "digest", but we don't know the filename. E.g. try

oras manifest fetch --pretty ..... | grep -o '"digest": "[^"]*' | grep -o '[^:]*$' | shasum -c --

For a shasum to work we need both filename and digest:

$ cat a
123
$ shasum -a 512 a >> sum
$ shasum -a 256 a >> sum
$ cat sum
ea2fe56bb8c1fb5ada84963b42ed71b764a74b092d75755173ade06f2f4aada9c00d6c302e185035cbe85fdff31698bca93e8661f0cbcef52cf2ff65864fd742  a
181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b  a

In theory one could work around the issue by attaching the filenames to the manifest as annotations, like:

   {
     "mediaType": "application/vnd.oci.image.layer.v1.tar",
     "size": 14189,
     "digest": "sha256:181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b",
     "annotations": {
       "org.opencontainers.image.title": "blah.blah"
     }
   }

I still think and feel that manual validation is not the way to go :-)

qweeah commented 4 months ago

Generating the checksum file is not easy with v1.1.0, but this will be improved in v1.2.0. You can try the main build container, e.g. to generate a checksum file for mcr.microsoft.com/oss/kubernetes/kubectl@sha256:30019e253ab74eb3e38abae7b8997e8e60c420169044ca9bfaf9665f54ad18bc:


> docker run ghcr.io/oras-project/oras:main manifest fetch mcr.microsoft.com/oss/kubernetes/kubectl@sha256:30019e253ab74eb3e38abae7b8997e8e60c420169044ca9bfaf9665f54ad18bc --format '{{range .content.layers}}{{if index .annotations "org.opencontainers.image.title"}}{{.digest}} {{index .annotations "org.opencontainers.image.title"}}{{println}}{{end}}{{end}}'
sha256:f4740e5a3adde42224679263c7b4e76985411cb7a9504615cf1421d8afb078b5 provenance.json
github-actions[bot] commented 2 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 30 days with no activity.