prominence-eosc / prominence

PROMINENCE server
Apache License 2.0

Return image digest in prominence describe #157

Open fcasson opened 2 years ago

fcasson commented 2 years ago

Useful for provenance tracking

Determine the sha256 checksum when a job is run and make that available from "prominence describe". This will have to be done in a way that works with images from (1) registries, (2) tarballs and (3) SIF files... Out of these (1) will be easy, (2) is hopefully possible but I'm not sure yet, (3) not sure if it's possible as it's not the Docker format.

fcasson commented 2 years ago

For singularity runs, an alternative identifier could be equally useful?

alahiff commented 2 years ago

Even for Docker this isn't completely trivial, see e.g. https://blog.aquasec.com/docker-image-tags

In Docker there are two sha256 values:

When you pull an image from a registry using Docker you get the digest, e.g.

# docker pull centos:centos7
centos7: Pulling from library/centos
Digest: sha256:c73f515d06b0fa07bb18d8202035e739a494ce760aa73129f60f4bf2bd22b407
Status: Image is up to date for centos:centos7
docker.io/library/centos:centos7

Using docker inspect centos:centos7 you can see the digest (as above) as well as the Id:

[
    {
        "Id": "sha256:eeb6ee3f44bd0b5103bb561b4c16bcb82328cfe5809ab675bb17ab3a16c517c9",
        "RepoTags": [
            "centos:7",
            "centos:centos7"
        ],
        "RepoDigests": [
            "centos@sha256:9d4bcbbb213dfd745b58be38b13b996ebb5ac315fe75711bd618426a630e0987",
            "centos@sha256:c73f515d06b0fa07bb18d8202035e739a494ce760aa73129f60f4bf2bd22b407"
        ],

However, when an image is saved as a tarball the Id can be easily found from the JSON files but not the digest. That information is lost.
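As a rough sketch of the tarball case, the Id can be recomputed from the JSON files inside the archive. This assumes the tarball was written by docker save and contains a manifest.json whose entries have a "Config" field naming the image config file; the Id is then the sha256 of that file's raw bytes. The function name image_ids_from_tarball is made up for illustration and is not part of PROMINENCE.

```python
import hashlib
import json
import tarfile


def image_ids_from_tarball(path):
    """Return the image Id(s) found in a `docker save` tarball (hypothetical helper)."""
    ids = []
    with tarfile.open(path) as tar:
        # manifest.json holds one entry per image stored in the archive
        manifest = json.load(tar.extractfile("manifest.json"))
        for entry in manifest:
            # "Config" is the path of the image config JSON inside the archive
            config_bytes = tar.extractfile(entry["Config"]).read()
            # Assumption: the Id is the sha256 of the raw config bytes
            ids.append("sha256:" + hashlib.sha256(config_bytes).hexdigest())
    return ids
```

Note that this only recovers the Id. The registry digest is not stored in the tarball, so it cannot be recovered this way.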

fcasson commented 2 years ago

If we accept a different kind of ID for the cases 1,2,3 (since they each have a different provenance chain anyway), then

1 - dockerhub -> image ID
2 - tarball -> checksum of the tarball?
3 - .sif -> another file checksum, or something else?

alahiff commented 2 years ago

I agree with that. For 2 and 3 use sha256sum or even sha512sum I guess.

Using the checksum of the Docker tarball and the checksum of the SIF file is probably safer than using the image Id for case 1, as the author of the above article mentions in the comments section at the end that relying on the Id and digest doesn't actually guarantee anything.
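For cases 2 and 3 the checksum could be computed along these lines. This is only a minimal sketch of what sha256sum does, not necessarily how PROMINENCE does it; file_sha256 and chunk_size are illustrative names. The file is read in chunks so that large image files do not have to fit in memory.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex sha256 of a file, read in chunks (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Switching to sha512 would only mean replacing hashlib.sha256() with hashlib.sha512().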

alahiff commented 2 years ago

Each task in the execution section of the output from prominence describe (or via the REST API) now contains imageSha256 for tarball or Singularity format images.

This is not yet on the production API. Example:

[
   {
      "events":{
         "createTime":1649223722,
         "endTime":1649223737,
         "startTime":1649223723
      },
      "execution":{
         "cpu":{
            "clock":"2200.026",
            "model":"AMD Opteron 23xx (Gen 3 Class Opteron)",
            "vendor":"AuthenticAMD"
         },
         "maxMemoryUsageKB":4506476,
         "provisionedResources":{
            "cpus":1,
            "memory":1,
            "nodes":1
         },
         "site":"OpenStack-TUBITAK",
         "tasks":[
            {
               "cpuTimeUsage":0.352388,
               "exitCode":0,
               "imagePullStatus":"completed",
               "imagePullTime":10.515493154525757,
               "imageSha256":"418bc62b604b9bc3504db95324ec1e1198edb2d20d008b01f9d71ffae91eb5f1",
               "maxResidentSetSizeKB":38072,
               "retries":0,
               "wallTimeUsage":0.397615909576416
            }
         ]
      },
      "id":79884,
      "name":"",
      "resources":{
         "cpus":1,
         "disk":10,
         "memory":1,
         "nodes":1
      },
      "status":"completed",
      "tasks":[
         {
            "cmd":"hostname",
            "image":"lammps-openmpi_latest.sif",
            "runtime":"singularity"
         }
      ]
   }
]
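For provenance tracking, the checksums can be pulled out of output like the above with a few lines. This sketch assumes the JSON has the shape shown in the example (a list of jobs, each with execution.tasks) and that imageSha256 may be missing, for example for images pulled from registries. image_checksums is a made-up name.

```python
import json


def image_checksums(describe_output):
    """Map each job id to the imageSha256 of its executed tasks (hypothetical helper)."""
    jobs = json.loads(describe_output)
    result = {}
    for job in jobs:
        # Jobs that have not run yet may have no execution section
        tasks = job.get("execution", {}).get("tasks", [])
        # None marks a task without a recorded checksum
        result[job["id"]] = [task.get("imageSha256") for task in tasks]
    return result
```

The entries in execution.tasks appear to line up by position with the top-level tasks list, so the checksum can be matched to the image name from there, though that ordering is an assumption.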

Still need to deal with images pulled from registries. Interestingly the sha256 digests reported by udocker bear no resemblance to those reported by Docker...