project-machine / puzzlefs

Apache License 2.0

Implement delete for a given tag #116

Open ariel-miculas opened 9 months ago

ariel-miculas commented 9 months ago

Deleting a tag requires reference counting all the data blobs referenced by this particular tag. A blob can only be deleted if the tag to be deleted has the last reference to it, similar to a garbage collection algorithm. The reference counting can be done entirely in software, but if a puzzlefs store has too many tags, then it might be too slow. An alternative approach would be to keep the reference count on the disk, say in the puzzlefs image manifest. Another thing to consider is integration with zot, which has its own garbage collection mechanism.
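A minimal sketch of the in-memory variant of this idea, assuming the store can be summarized as a map from tag name to the set of blob digests that tag references. The `TagIndex` type and both function names are hypothetical and are not part of the puzzlefs API; how the digests are collected from the metadata is left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory view of a store: tag name -> digests of the blobs it references.
type TagIndex = HashMap<String, HashSet<String>>;

/// Count how many tags reference each blob digest.
fn reference_counts(tags: &TagIndex) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for blobs in tags.values() {
        for digest in blobs {
            *counts.entry(digest.as_str()).or_insert(0) += 1;
        }
    }
    counts
}

/// Blobs that may be removed together with `tag`: those whose only
/// reference comes from `tag` itself. Returns a sorted list; an unknown
/// tag frees nothing.
fn blobs_freed_by_deleting(tags: &TagIndex, tag: &str) -> Vec<String> {
    let counts = reference_counts(tags);
    let mut freed: Vec<String> = match tags.get(tag) {
        Some(blobs) => blobs
            .iter()
            .filter(|d| counts.get(d.as_str()) == Some(&1))
            .cloned()
            .collect(),
        None => Vec::new(),
    };
    freed.sort();
    freed
}
```

Building the counts touches every tag in the store on each delete, which is the cost concern raised above; persisting the counts on disk would trade that scan for extra bookkeeping on every build and delete.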

hallyn commented 9 months ago

By "The reference counting can be done entirely in software," I assume you mean in an external program?

In my opinion, leaving the garbage collection to an external program is ok. The external program can periodically scan the list of images for all the blobs they contain, to update the reference counts. Yes, that could lead to races between GC and adding a new image, so as you say zot may be the best place for the blob GC to happen, while anything (e.g. skopeo) can delete a tag.
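The periodic scan could look roughly like the mark-and-sweep sketch below. It assumes the external program has already listed the blobs referenced by each remaining image and the digests present in the blob directory; both function names are hypothetical, and this is not how zot or umoci implement their GC.

```rust
use std::collections::{HashMap, HashSet};

/// Mark phase: the union of all blob digests referenced by any remaining tag.
fn live_blobs(tags: &HashMap<String, HashSet<String>>) -> HashSet<String> {
    tags.values().flatten().cloned().collect()
}

/// Sweep phase: every blob present in the store that no tag references.
/// Returns a sorted list of digests that are candidates for deletion.
fn unreferenced_blobs(
    tags: &HashMap<String, HashSet<String>>,
    blobs_on_disk: &HashSet<String>,
) -> Vec<String> {
    let live = live_blobs(tags);
    let mut garbage: Vec<String> = blobs_on_disk.difference(&live).cloned().collect();
    garbage.sort();
    garbage
}
```

The sketch does nothing about the race mentioned above: an image added between the mark and the sweep could have its freshly written blobs reported as garbage, so a real collector would need locking or a grace period.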

ariel-miculas commented 9 months ago

"I assume you mean in an external program?"

No, I meant doing it in puzzlefs, but with no extra data attached to the blobs: each time a tag is deleted, the entire puzzlefs repository is scanned to see whether the data blobs to be deleted are referenced by other tags. Maybe what we need to do instead is integrate the Cap'n Proto format into OCI/zot/skopeo, so that we could offload the delete to another program.

hallyn commented 9 months ago

Yes, this is why I didn't want the manifest itself to be in a different format - but changing that is just not really an option.

But people will use other ways of deleting images, so when you say "doing it in puzzlefs itself", if you mean adding a "puzzlefs gc", that is ok, but if you mean trying to force someone to use 'puzzlefs delete', that's just not going to fly.

ariel-miculas commented 9 months ago

You mean a different format than JSON?

hallyn commented 9 months ago

serge@jerom ~/src/puzzlefs/sample$ ../target/debug/puzzlefs build in oci zzzz
puzzlefs image manifest digest: b15f01069f03a54c68201f63099e2c52212870899106d1e473cb2643a5145e14
serge@jerom ~/src/puzzlefs/sample$ jq . < oci/index.json
{
...
    {
      "digest": "sha256:6d674fd3c8b8797cdff0e3cc3f50f707b6a9251757d568397a12b6fc887d93a5",
      "size": 272,
      "media_type": "application/vnd.puzzlefs.image.rootfs.v1",
      "annotations": {
        "org.opencontainers.image.ref.name": "zzzz"
      }
    }
...
}
serge@jerom ~/src/puzzlefs/sample$ cat oci/blobs/sha256/6d674fd3c8b8797cdff0e3cc3f50f707b6a9251757d568397a12b6fc887d93a5
%1a�B[&�i��ʂ�;�O$9��F�Kh8u�
           Pb��ΧVT�׊�<��1�1/��_6�{]rUJ�V��L9\��*}[���?\���oʂ�;�O$9��F�Kh8u�ڴWC���\B�A�Ea��}�2Ψ��7�0serge@jerom ~/src/puzzlefs/sample$

that's not json

hallyn commented 9 months ago

And no, even if it was json that wouldn't suffice - for umoci gc to work, it would have to be a proper opencontainers image-spec manifest, with []layers.

ariel-miculas commented 9 months ago

Yeah, we need to figure out the integration with OCI and also reopen https://github.com/project-machine/puzzlefs/issues/55 when we have new ideas.