opencontainers / runtime-spec

OCI Runtime Specification
http://www.opencontainers.org
Apache License 2.0
3.19k stars 541 forks source link

config: add idmap and ridmap mount options #1222

Closed cyphar closed 1 year ago

cyphar commented 1 year ago

Adding new fields for MOUNT_ATTR_IDMAP had the flaw that users specifying these fields with older runtimes would result in the fields being ignored and incorrect mounts being configured. In addition, there is no text in the specification indicating whether MOUNT_ATTR_IDMAP should be applied with AT_RECURSIVE (which matters for rbind idmapped mounts).

In retrospect, the addition of the fields should've included new (dummy) mount options that would cause errors on older runtimes. Unfortunately, we have had a runtime-spec release since then so we cannot MUST these new mount options, but we can SHOULD them.

Fixes #1216 Resolves the main issue motivating the urgency of #1219 (though that feature is still something we should have).

Fixes: https://github.com/opencontainers/runtime-spec/commit/9d1130dc3bec2d2aabeb166f12e0ca027c3f404d ("IDMapping field for mount point") Signed-off-by: Aleksa Sarai cyphar@cyphar.com

cyphar commented 1 year ago

@rata The error message is fairly ugly, but the mount will fail because no filesystem supports the flag "idmap" nor "ridmap".

... Oh actually, now that I think about it, I think this flag will be ignored for bind-mounts due to how they are implemented in runc (extra mount options for bind-mounts are ignored because that's what the kernel does). Damn. This is going to be ugly... I will need to think about this tomorrow (at least in the case of runc, I suspect we should actually error out if the mount has extra fs options that the kernel will ignore).

cyphar commented 1 year ago

This is the error you get if you add it to a regular mount (for runc):

ERRO[0000] runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount src=cgroup, dst=/sys/fs/cgroup, dstFd=/proc/self/fd/8, flags=0x20000f, data=idmap: invalid argument

But for bind-mounts there's no error.

rata commented 1 year ago

Then this is not useful as-is, at least :(

cyphar commented 1 year ago

I think we need ridmap anyway (because of #1216). I've opened a PR against runc to detect this properly (opencontainers/runc#3990):

ERRO[0000] runc run failed: invalid mount &{Source:/tmp Destination:foo Device:bind Flags:2101263 ClearedFlags:0 PropagationFlags:[] Data:idmap Relabel: RecAttr:<nil> Extensions:0 UIDMappings:[] GIDMappings:[]}: bind mounts cannot have any filesystem-specific options applied

Unfortunately this doesn't help with old versions of runc (though the same is true for #1219 because we need to update runtimes to say whether they support idmapped mounts), and I suspect other runtimes have the same behaviour because it's easy to miss this.

@giuseppe @utam0k How do crun and youki handle data flags being passed to bind-mounts? Do you return errors or are the flags silently ignored? (Try adding idmap to a bind-mount options list.)

AkihiroSuda commented 1 year ago

Adding mount options which do NOP by themselves looks clumsy.

Why not just use the mountExtensions struct?

Maybe this could be an object like:

"mountExtensions": {
  "idmap": {
    "modes": ["sameAsUserns", "arbitrary"],
    "filesystemTypes": ["ext4", "tmpfs", "foobarfs"],
    "supportsBind": true
  }
}

wdyt?

Also we need to figure out how to deal with recursive mounts -- see #1216. If we add a separate flag for this, we will need to add an entry to this struct.

_Originally posted by @cyphar in https://github.com/opencontainers/runtime-spec/pull/1219#discussion_r1301251621_

cyphar commented 1 year ago

@AkihiroSuda The features proposal doesn't help with:

Also, the flags aren't NOP flags -- they indicate you need to do a certain mount operation (just like "rro" and the other recursive mount settings).

That being said, I think we need the features addition as well.

rata commented 1 year ago

@cyphar

  • As far as I can tell, we don't have any other runtime-spec features that require you to check the features list in order to avoid a setting being silently ignored

I guess there is some other. By design, runtimes MUST ignore unknown fields. Probably in practice most of the stuff is supported for years, so not a real issue, but some example probably exists (probably also not with such awful consequences as this one).

cyphar commented 1 year ago

Hmm, now that I think about it some more, that's a good point @rata...

FWIW, I'm starting to come to the view that the way we planned for extensibility in the spec was a bit misguided -- runtimes do not reject configurations from specification versions they do not support, which is behaviour that just begging to cause a security issue in the future. The purpose of the "runtimes MUST ignore unknown fields" text was to allow for vendor extensions (I think this was also motivated by the fact that the Go stdlib JSON parser didn't have a way to block unknown fields when we first started working on the spec -- DisallowUnknownFields was added in Go 1.10 in 2018 -- which was after the runtime-spec 1.0 release in 2017). While there is an argument for allowing vendor extensions to the image-spec that tools should ignore if unsupported, I don't think the argument makes as much sense for the runtime-spec (and the image-spec has had their own issues with this when dealing with registries...). Obviously it's far too late to change any of this now...

For this particular feature, we should've caught this issue when reviewing the idmapped mounts additions to the spec.

All of that being said, we need ridmap to resolve #1216 and if we have ridmap I don't see any downside to having idmap as well.

rata commented 1 year ago

@cyphar I fully agree with that. I did a patch to runc, to fail on unknown fields, but never opened a PR as I found that note about extensibility. I completely share that that note creates more issues than it solves.

There might be some "escapes" (besides going for v2.0.0, that I don't think is a good idea), like adding a failOnUnknownFields: true somewhere in the config.json, that high-level runtimes could set (maybe also even query support for that using the features thing) and some years in the future, when all used OCI runtimes support that flag, then that would be universal. At that point, maybe high-level runtimes can just fail if that feature is not implemented, and not even worry about providing a fall-back to be safe in the face of OCI runtime unknown features.

That is a longer conversation and I'm not 100% sure what my thoughts are about it now.

giuseppe commented 1 year ago

@giuseppe @utam0k How do crun and youki handle data flags being passed to bind-mounts? Do you return errors or are the flags silently ignored? (Try adding idmap to a bind-mount options list.)

crun for bind mounts would just ignore a flag it doesn't know about.

crun already supports idmap with a slightly different meaning though. If it is specified then it uses the same mappings used for the container user namespace. I was playing with it before uidMappings/gidMappings were added to the OCI specs: https://github.com/containers/crun/pull/780

utam0k commented 1 year ago

@giuseppe @utam0k How do crun and youki handle data flags being passed to bind-mounts? Do you return errors or are the flags silently ignored? (Try adding idmap to a bind-mount options list.)

Youki ignores a flag that we don't know about. https://github.com/containers/youki/blob/f2d95ca41c1d2c83af2f87e28b31c77c835dfad6/crates/libcontainer/src/rootfs/utils.rs#L80-L121

AkihiroSuda commented 1 year ago

crun already supports idmap with a slightly different meaning though.

This PR had to be updated to avoid conflict with crun?

utam0k commented 1 year ago

Sorry for missing it. I thought it was fine, especially since there was no objection, but I should have checked.

@giuseppe The name of the mount options in crun seem to overlap, is that OK?

If the idmap option is specified then the mount is ID mapped using the container target user namespace. This is an experimental feature and can change at any time without notice. https://github.com/giuseppe/crun/blob/827b8731899d4febea3f27907bcde5e6065bb65b/crun.1.md#idmap-mount-options

cyphar commented 1 year ago

We will need to change the MUST language to allow crun's behaviour. In fact, maybe we should make crun's behaviour the standard behaviour if the mount doesn't have a mapping specified (it seems reasonable to me to default to the container mapping). I'm on vacation for the next two weeks, if someone else wants to pick this up in the meantime I'd appreciate it 😅

AkihiroSuda commented 1 year ago

↑ @giuseppe WDYT? 🙏

giuseppe commented 1 year ago

what would be the best wording to allow the crun behaviour? Just dropping the following sentence?

If supported, the runtime MUST return an error if this option is provided and either of `uidMappings` or `gidMappings` are empty or not present.
giuseppe commented 1 year ago

opened a PR, we can discuss there how it should be: https://github.com/opencontainers/runtime-spec/pull/1224