Open ariel-miculas opened 1 year ago
The OCIv1 manifest format is specified at https://github.com/opencontainers/image-spec/blob/main/manifest.md . I think we should stick to something closer to that.
Perhaps:
$ cat oci/index.json | jq "."
{
"schemaVersion": 3,
"manifests": [
{
"digest": "sha256:6b7980a6390ed4614465ec87388856583313cf0125deab02be0256c23a3cb006",
"size": 55,
"mediaType": "application/vnd.puzzlefs.image.manifest.v1",
"annotations": {
"org.opencontainers.image.ref.name": "firstimage"
}
}
],
"annotations": {}
}
$ cat oci/blobs/sha256/6b7980a6390ed4614465ec87388856583313cf0125deab02be0256c23a3cb0 | jq "."
{
"schemaVersion": 3,
"mediaType": "application/vnd.puzzlefs.image.manifest.v1",
"config": {
"mediaType": "application/vnd.oci.image.config.v1+json",
"digest": "sha256:c45108df90c1fceb9ce8b0d9b8aa3f09f1e7e34d29ae44928ae26e259c0282ce",
"size": 1222
},
"config": {
"mediaType": "application/vnd.puzzlefs.image.metadata.v1",
"digest": "sha256:c45108df90c1fceb9ce8b0d9b8aa3f09f1e7e34d29ae44928ae26e259c0282ce",
"size": 55
},
"files": [
{
"mediaType": "application/vnd.puzzlefs.image.filedata.v1",
"digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
"size": 403
},
{
"mediaType": "application/vnd.puzzlefs.image.filedata.v1",
"digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
"size": 797
},
{
"mediaType": "application/vnd.puzzlefs.image.filedata.v1",
"digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
"size": 679
},
{
"mediaType": "application/vnd.puzzlefs.image.filedata.v1",
"digest": "sha256:a38ced06f7cf1d2b235ffa81f165924cecddac544c0d915d13cffbe47ea29b56",
"size": 561
}
]
}
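To make the lookup between the two files above concrete, here is a minimal Python sketch of resolving a tagged entry in `index.json` to the path of its manifest blob. It assumes the usual OCI image-layout convention that a digest `sha256:<hex>` is stored at `blobs/sha256/<hex>`; the function names `blob_path` and `find_manifest` are hypothetical and not part of PuzzleFS.

```python
import json
from pathlib import Path

# Annotation key used in index.json to name (tag) a manifest.
REF_NAME = "org.opencontainers.image.ref.name"


def blob_path(layout: Path, digest: str) -> Path:
    """Map a digest such as 'sha256:abc...' to its path inside an OCI layout."""
    algorithm, _, encoded = digest.partition(":")
    if not algorithm or not encoded:
        raise ValueError(f"malformed digest: {digest!r}")
    return layout / "blobs" / algorithm / encoded


def find_manifest(layout: Path, tag: str) -> dict:
    """Return the descriptor in index.json whose ref.name annotation equals `tag`."""
    index = json.loads((layout / "index.json").read_text())
    for descriptor in index.get("manifests", []):
        if descriptor.get("annotations", {}).get(REF_NAME) == tag:
            return descriptor
    raise KeyError(f"no manifest tagged {tag!r}")
```

With the example above, `blob_path(Path("oci"), find_manifest(Path("oci"), "firstimage")["digest"])` would point at the manifest blob under `oci/blobs/sha256/`.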
Explanation:
It does seem a little weird to duplicate the information in both a JSON format and a custom capnproto format. What's more, the notion of a layer in OCIv1, which is contained in a single file, doesn't map well onto the PuzzleFS concept of having a metadata file and multiple data files for a single layer.
We could abuse the format and make each metadata/data file a single layer. That may work for getting the existing tools to copy the files, but it doesn't seem like a good design decision.
@hallyn what do you think?
Well, failing a good idea for an alternative, let's leave it as is for now and re-open if we come up with something.
Now that I'm working on the stacker support for building PuzzleFS images, I think it's time to revisit this issue and the delta generation. Should we stick to the original OCI image manifest specification? We would need new image media types, but I'm wondering whether being close to the OCI spec would make it easier for existing tools to work with PuzzleFS images.

For mounting the PuzzleFS image in the kernel, we would still need to have the manifest and layers in capnp format; maybe the two could coexist. We could add a new media type for the PuzzleFS layer which would point to the PuzzleFS metadata, and then add support for parsing this new media type (e.g. making sure we add all the chunks pointed to by the metadata layer to the OCI data store, i.e. blobs/sha256). Not sure how well the existing tools would deal with this, since there would be no references to these chunks/blobs from the usual json content descriptors, all the references would be only stored in the PuzzleFS metadata file, which is in capnp format. This approach is also hinted at by Aleksa Sarai at the end of his blog post. Or we could try this model proposed by Serge.

Another reason why we would want to stick close to the OCI format is to keep the OCI configuration format, which holds information such as architecture, OS, environment variables, etc., none of which changes if we generate a PuzzleFS image.

On the other hand, the OCI format is tightly coupled with the notion of layering, which we don't want to do with PuzzleFS. Deduplication is achieved by splitting the filesystem into chunks with the CDC algorithm, and sufficiently similar images should end up sharing most of the chunks. Since PuzzleFS doesn't fit the OCI model, we might as well not care about being compatible with it. This would, however, complicate the addition of other features, such as support for running a PuzzleFS container. Besides, we would need to take care of generating all the relevant OCI (or OCI-inspired) metadata bits and pieces.
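As a rough illustration of why content-defined chunking (CDC) lets similar images share chunks, here is a small Python sketch. It is not PuzzleFS's chunker: the gear-style rolling hash, the `MASK` and `MIN_SIZE` constants, and the function names are all assumptions chosen for brevity. The point is only that cut points depend on nearby content, so an edit disturbs the chunks around it and leaves the rest untouched.

```python
import hashlib
import random

MASK = (1 << 11) - 1  # cut when the low 11 bits are zero (roughly 2 KiB spacing)
MIN_SIZE = 256        # skip cut points that would produce very small chunks

# Fixed pseudo-random table so that boundaries are reproducible across runs.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list:
    """Split `data` at content-defined boundaries using a rolling gear hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out of the low bits, so the cut decision
        # depends only on the last few bytes seen.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if (h & MASK) == 0 and i + 1 - start >= MIN_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def digests(data: bytes) -> set:
    """Return the set of SHA-256 digests of the chunks of `data`."""
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}
```

Comparing `digests(a) & digests(b)` for two byte streams that differ by a small insertion shows most chunk digests in common, which is the property a shared blob store relies on.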
@tych0, @hallyn do you have any thoughts on this?
Hey, sorry for the delay.
> Not sure how well the existing tools would deal with this, since there would be no references to these chunks/blobs from the usual json content descriptors, all the references would be only stored in the PuzzleFS metadata file, which is in capnp format.

I ran into this problem a bunch with tools when I did stacker's squashfs support, and filed stuff like https://github.com/opencontainers/image-spec/pull/816 in support of it. I got it all plumbed through, and hopefully did it in a way that future-proofed it for puzzlefs, so I think a new mime type is a good path forward, especially since stuff like storage and hosting (i.e. the distribution spec) makes it so that you don't have to build tooling for those parts.
> Or we could try this https://github.com/project-machine/puzzlefs/issues/55#issuecomment-1355484466.

I think it's reasonable in a vacuum, but you would have to teach other tools (skopeo, dist spec) about this new format, which is kind of annoying.
> On the other hand, the OCI format is tightly coupled with the notion of layering

There are two explicit mentions of layering: descriptors and history.
I think that for History, we'll still have this concept: users will build puzzlefs images by individual mutations to them (`apt-get install python3`, `curl https://sh.rustup.rs | sh`, `cargo build myapp`, etc.), which are still "layers". It's just that the underlying fs representation won't be 1:1 with that any more, because it's more efficient. But this idea of "here's the step that generated this delta" is still reasonable, IMO.
So what's left is Descriptors, which, while called Layers in the manifest, could be "just" a list of BlobRefs. Admittedly they're not layers, but the delta is so small, and the amount of work to generate the rest of the tooling is so great, that I would lean towards just re-using the OCI spec here. Maybe we can send some clarifying PRs that "not all OCI images need be layer based" or something?
Thank you for continuing to push on this, it's awesome!
Thanks for your input, @tych0. So what you're saying is that we should abuse the OCI image manifest specification so that existing tools will copy the BlobRefs we need for PuzzleFS. It would look something like this:
"layers": [
{
"mediaType": "application/vnd.puzzlefs.image.rootfs.v1",
"digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
"size": 83518086
},
{
"mediaType": "application/vnd.puzzlefs.image.inodes.v1",
"digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
"size": 83518086
},
{
"mediaType": "application/vnd.puzzlefs.image.filedata.v1",
"digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
"size": 83518086
}
],
where
When mounting the image, PuzzleFS will parse the list of layers, extract the application/vnd.puzzlefs.image.rootfs.v1 manifest, and then use the information provided there to mount the image. Optionally, it could compare the list of BlobRefs from the OCI image manifest to the list of BlobRefs from the PuzzleFS manifest and metadata layers.
The main advantages would be compatibility with existing tools and decoupling the PuzzleFS merkle tree structure from the OCI image manifest. The disadvantage is that we are duplicating the information in two places and formats: once in the OCI image manifest, and once in the PuzzleFS manifest and PuzzleFS metadata layers.
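The mount-time steps described above (pick out the rootfs entry, then optionally cross-check the two lists of BlobRefs) could be sketched in Python as follows. This is an illustration only: `rootfs_descriptor` and `compare_blob_refs` are hypothetical names, and `puzzlefs_digests` stands in for the digests that would be decoded from the capnp metadata, which is not attempted here.

```python
ROOTFS = "application/vnd.puzzlefs.image.rootfs.v1"


def rootfs_descriptor(manifest: dict) -> dict:
    """Return the single rootfs entry from the manifest's `layers` list."""
    matches = [d for d in manifest.get("layers", []) if d.get("mediaType") == ROOTFS]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one rootfs layer, found {len(matches)}")
    return matches[0]


def compare_blob_refs(manifest: dict, puzzlefs_digests: set) -> tuple:
    """Cross-check digests listed in the OCI manifest against PuzzleFS's own list.

    Returns (missing_from_manifest, unreferenced_by_puzzlefs). The rootfs blob
    itself is excluded, since it is the entry point and not something the
    PuzzleFS metadata refers back to (an assumption of this sketch).
    """
    root = rootfs_descriptor(manifest)
    listed = {d["digest"] for d in manifest["layers"]} - {root["digest"]}
    return puzzlefs_digests - listed, listed - puzzlefs_digests
```

A non-empty first set would mean a generic copy tool could leave required chunks behind; a non-empty second set would only mean the manifest lists blobs that PuzzleFS never reads.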
Did I get this right? @mikemccracken @raharper @rchincha any thoughts on this?
> Did I get this right?

Heh, I don't think I quite got it right, I had forgotten that you needed mime types for the layers. It seems like a bit of a hack, but yes, that's what I had in mind.
(Is there a reason inodes is not part of rootfs?)
I think this was the original design even when we had cbor serialization. And we do have layers in PuzzleFS right now, and that's another thing to consider when designing the OCI format of PuzzleFS.
We could include the entire PuzzleFS metadata in one single capnp file, that way we'll only have application/vnd.puzzlefs.image.rootfs.v1 and application/vnd.puzzlefs.image.filedata.v1.
> I think this was the original design even when we had cbor serialization.

Definitely a mistake then :).
> And we do have layers in PuzzleFS right now, and that's another thing to consider when designing the OCI format of PuzzleFS.

Yeah, it's a good point. It's almost as if OCI's "layers" is just transport for bits, and we want to allow images to have more than just the OCI's version of Metadata, Config, and Layers.
I suppose another option is that we could add pointers as Annotations on metadata, but then tools will not automatically transport them. IMO the way you have it above is probably the best because we can use existing tooling, even if it is slightly confusing.
> We could include the entire PuzzleFS metadata in one single capnp file, that way we'll only have application/vnd.puzzlefs.image.rootfs.v1 and application/vnd.puzzlefs.image.filedata.v1.

That sounds reasonable to me.
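Under that simplification, building the `layers` list becomes mechanical: one rootfs descriptor followed by one descriptor per data chunk. The Python sketch below shows the idea under stated assumptions. The helper names are hypothetical, the enclosing manifest uses the standard OCI manifest media type and `schemaVersion` 2, and the config descriptor is passed in unchanged from an existing OCI image.

```python
import hashlib

ROOTFS = "application/vnd.puzzlefs.image.rootfs.v1"
FILEDATA = "application/vnd.puzzlefs.image.filedata.v1"


def descriptor(media_type: str, blob: bytes) -> dict:
    """Build an OCI content descriptor (mediaType, digest, size) for a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def build_manifest(config: dict, rootfs: bytes, chunks: list) -> dict:
    """Assemble an OCI-style manifest whose layers are PuzzleFS blobs.

    `config` is an existing OCI config descriptor; `rootfs` is the serialized
    PuzzleFS metadata (a single capnp file); `chunks` are the data chunks.
    """
    layers = [descriptor(ROOTFS, rootfs)]
    layers += [descriptor(FILEDATA, c) for c in chunks]
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": config,
        "layers": layers,
    }
```

Because every chunk appears as an ordinary descriptor, a tool that copies all `layers` blobs would carry the whole image without understanding the capnp metadata.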
We should add a skopeo copy integration test and then we can close this issue.
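One way such a test could check its result is sketched below in Python. It covers only the verification step: after copying an image between two OCI layouts (for example with `skopeo copy` and its `oci:` transport; the exact invocation is left out here), every blob referenced by the manifest should exist in the destination and hash to its digest. The function name is hypothetical and the sketch assumes SHA-256 digests throughout.

```python
import hashlib
import json
from pathlib import Path


def missing_or_corrupt_blobs(layout: Path, manifest_digest: str) -> list:
    """List digests referenced by the manifest that are absent or do not verify."""

    def path_for(digest: str) -> Path:
        algorithm, _, encoded = digest.partition(":")
        return layout / "blobs" / algorithm / encoded

    manifest = json.loads(path_for(manifest_digest).read_bytes())
    problems = []
    for desc in [manifest["config"], *manifest.get("layers", [])]:
        blob = path_for(desc["digest"])
        if not blob.is_file():
            problems.append(desc["digest"])
            continue
        data = blob.read_bytes()
        actual = "sha256:" + hashlib.sha256(data).hexdigest()
        # Both the content hash and the recorded size must match.
        if actual != desc["digest"] or len(data) != desc["size"]:
            problems.append(desc["digest"])
    return problems
```

An empty return value for the destination layout would indicate that the copy transported every PuzzleFS blob listed in the manifest.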
The current puzzlefs manifest format is as follows:
Whereas for oci v1, the manifest has the following format: