Closed: rhatdan closed this issue 3 years ago
I think there should be a way to split the 32-bit UID/GID space, so that a UID gets the additional UIDs/GIDs whose high 16 bits are equal to the UID itself.
i.e. user 1000 -> 65536000-65601535, user 1001 -> 65601536-65667071,
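The arithmetic behind that proposal can be sketched in a few lines of C. This is only an illustration of the suggested split, not anything shadow implements; the helper name `subid_range_for_uid` and the constant `SUBID_COUNT` are made up for the example.

```c
#include <stdint.h>

#define SUBID_COUNT 65536u  /* 2^16 subordinate IDs per user */

/*
 * Hypothetical helper: compute the subordinate ID range whose high
 * 16 bits equal the given UID. Only UIDs 1..65535 fit the scheme;
 * UID 0 would land on the ordinary 16-bit ID space itself.
 * Returns 0 on success, -1 if the UID cannot be represented.
 */
static int subid_range_for_uid(uint32_t uid, uint32_t *first, uint32_t *last)
{
    if (uid == 0 || uid >= SUBID_COUNT)
        return -1;
    *first = uid << 16;                 /* same as uid * 65536 */
    *last = *first + (SUBID_COUNT - 1); /* inclusive end of the range */
    return 0;
}
```

For UID 1000 this yields 65536000 to 65601535, and for UID 1001 it yields 65601536 to 65667071, matching the numbers above. Any user with a UID of 65536 or higher has no slot under this scheme.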
shadow shouldn't enforce any such policy decision like that when it comes to id space segmentation. it can make best practice recommendations, but that's it.
i'm not sure we can utilize nsswitch w/out coordinating with glibc. but if glibc did add support for new keywords (like "subuid" and "subgid"), then that seems like a design that would work.
The broader question is one about APIs. The existing glibc APIs that manage UIDs/GIDs trigger the NSS infrastructure to load and parse /etc/nsswitch.conf, this in turn loads plugins to respond to service requests in an authoritative fashion e.g. LDAP NSS module. There are no APIs that deal with subuid, subgid, or the concept of newuidmap and newgidmap setup for the guest namespace.
The existing infrastructure would require changes like this:
Is that what everyone is thinking about?
Yes this is exactly what I was thinking.
I should note that when I said "new API for managing" I meant only that we provide functions that allow you to query the existing data, but not modify that data. We don't need dynamic assignment to be baked into glibc; that has deeper policy implications, and shadow-utils and admins can do that themselves. Likewise the LDAP admin can set up the ranges as they wish without any need to have an API that does the assignment.
this is very interesting to academic environments as well, which use RH-based distros (CERN Scientific Linux 6 / CERN Centos 7) and have shared clusters to which users can login
I have been talking about something like this with @poettering just a short while ago. On the @lxc side we allow for isolated idmaps, i.e. we have a way to give each container an idmap that is isolated from all other containers. For LXD it's easy to keep track what maps it has given away and how many it has left but it obviously becomes a problem when another container manager is using the same map. Having a central registry where we can - in a race free manner - record something along the lines of:
<container-identifier> <starting-host-uid> <range>
would be quite helpful.
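As a rough sketch of what a consumer of such a registry might do, the C below parses one record in the format shown above and checks two records for overlap. The struct and function names are hypothetical, and the race-free locking that the registry would need is deliberately not shown.

```c
#include <stdint.h>
#include <stdio.h>

/* One registry line: <container-identifier> <starting-host-uid> <range> */
struct idmap_record {
    char container[64];
    uint32_t host_start;
    uint32_t count;
};

/* Parse a record; returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_record(const char *line, struct idmap_record *rec)
{
    unsigned long long start, count;

    if (sscanf(line, "%63s %llu %llu", rec->container, &start, &count) != 3)
        return -1;
    /* The range must be non-empty and stay inside the 32-bit ID space. */
    if (count == 0 || start > UINT32_MAX || count > 4294967296ULL - start)
        return -1;
    rec->host_start = (uint32_t)start;
    rec->count = (uint32_t)count;
    return 0;
}

/* Two half-open ranges overlap if each one starts before the other ends. */
static int records_overlap(const struct idmap_record *a,
                           const struct idmap_record *b)
{
    uint64_t a_end = (uint64_t)a->host_start + a->count;
    uint64_t b_end = (uint64_t)b->host_start + b->count;

    return a->host_start < b_end && b->host_start < a_end;
}
```

A container manager could run `records_overlap` against every existing record before appending its own, provided the whole read-check-append sequence happens under a lock shared by all managers.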
I actually think we'd want something even better such that we can query:
This should see input from all active people who co-maintain @shadow-maint with @hallyn and myself. Would also be good to hear what @stgraber thinks.
Let's not confuse two different things though.
We have the UIDs allocated to users for their user namespaces. Then we have UID ranges allocated for root-running services that want to use user namespaces for separation.
This issue is more about the UIDs allocated for users.
The two are conceptually identical; only their permissions differ. If you use new{g,u}idmap for both user and root services, then sub{g,u}id decides what you are allowed to map, independent of whether you're root or not.
Well I am not suggesting we use newuidmap for both. No reason to use this for a root running container engine.
We should have a central way for all container engines to register their id allocations. If it works for unpriv users it works for root as well so there's no additional work associated with this.
I must say I am not particularly convinced that /etc/subuid and /etc/subgid is such a great idea in the first place. Storing these registrations in /etc as if they were system configuration sounds wrong to me, we shouldn't do that anymore. Unless something is being reconfigured /etc should really be considered read-only, and range registration in /etc is something diametrically opposed to that, as it stores runtime information among the configuration in /etc.
In systemd we allocate user IDs dynamically at various places, including in nspawn's userns support and for DynamicUser=1 support for system services. But we never ever write this to /etc, as that's really not suitable for dynamically changing registrations. However, we do supply glibc NSS modules that make sure that whatever we register (regardless if individual uids or ranges of uids) shows up in the system's user database. And I think that's a general approach to follow when allocating user ranges: use NSS as registration database: make sure that the user and all other apps see that you own a range by making sure your users show up in NSS. This reuses existing concepts for maintaining registered ranges (as libc getpwuid() and getpwnam() will just report our entries), and is also friendly towards users, as for example "ps" will show all processes of a userns-using container as owned by your package. It also makes sure that classic user mgmt tools such as "adduser" automatically respect your uid range registrations, since they already check NSS before picking a UID anyway.
hence, from the systemd PoV: I am very sure we'll never begin using /etc/subuid and /etc/subgid, I think at this time we really shouldn't add any more static databases in /etc that need to be dynamically managed. Instead, we just pick a UID we consider suitable, check against NSS, and only use it if it's not listed there yet (if it is, we pick a different UID). At the same time we make the UID we now took possession of show up in NSS so that everybody else knows.
Or in other words: instead of trying to get everybody on board with sharing a new set of database files in /etc/, and then extending it for the network, just make everybody use the same (already existing) API instead (i.e. glibc NSS), and leave it up to the packages to register their ranges with it. Standardize on existing APIs rather than new files. The packages can then decide on their own how they manage their assignments and replicate them across the network.
(In case you wonder: yes, it's very easy to write an NSS module that returns for a UID x from some range a fixed user name "foobar-x", and vice versa)
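To illustrate the mapping such a module would perform, here is a minimal sketch of the two conversions in C. It is not a complete NSS module (the glibc entry points and `struct passwd` filling are left out), and the range `RANGE_START`/`RANGE_COUNT` and the prefix are assumptions chosen for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RANGE_START 100000u   /* hypothetical range owned by the package */
#define RANGE_COUNT 65536u
#define NAME_PREFIX "foobar-"

/* UID -> "foobar-<uid>"; returns 0 on success, -1 if the UID is not ours. */
static int uid_to_name(uint32_t uid, char *buf, size_t len)
{
    if (uid < RANGE_START || uid - RANGE_START >= RANGE_COUNT)
        return -1;
    int n = snprintf(buf, len, NAME_PREFIX "%lu", (unsigned long)uid);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* "foobar-<uid>" -> UID; only the canonical spelling is accepted. */
static int name_to_uid(const char *name, uint32_t *uid)
{
    size_t plen = strlen(NAME_PREFIX);
    char canon[32];
    char *end;

    if (strncmp(name, NAME_PREFIX, plen) != 0)
        return -1;
    errno = 0;
    unsigned long v = strtoul(name + plen, &end, 10);
    if (errno != 0 || end == name + plen || *end != '\0' || v > UINT32_MAX)
        return -1;
    /* Round-trip check rejects out-of-range IDs, signs and leading zeros. */
    if (uid_to_name((uint32_t)v, canon, sizeof canon) != 0 ||
        strcmp(canon, name) != 0)
        return -1;
    *uid = (uint32_t)v;
    return 0;
}
```

With these two functions, a lookup by UID and a lookup by name always agree with each other, which is the property tools like "ps" and "adduser" would rely on.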
Well that is exactly what this issue is about. Adding NSS support to newuidmap and newgidmap.
> I must say I am not particularly convinced that /etc/subuid and /etc/subgid is such a great idea in the first place. Storing these registrations in /etc as if they were system configuration sounds wrong to me, we shouldn't do that anymore. Unless something is being reconfigured /etc should really be considered read-only, and range registration in /etc is something diametrically opposed to that, as it stores runtime information among the configuration in /etc.
I think you misunderstand these files. No one intends to use them as databases and they aren't used as such now. They are config files that statically tell you what ids a user can use.
No. I say: don't bother with uidmap/gidmap. Just use the regular NSS user/group db, and fill it through your own NSS module. I would advise podman to simply not bother with uidmap/gidmap, but just provide an NSS module that exposes the ranges it took possession of.
> I think you misunderstand these files. No one intends to use them as databases and they aren't used as such now. They are config files that statically tell you what ids a user can use.
so they are an extension of the usual user database. And I argue that the usual user database should not be considered configuration. I mean, there's a good reason why all those new OS approaches (such as Atomic and stuff) try hard to find alternatives to having to write every user into /etc.
That's not orthogonal though, as you suggest. The idea is that you would want a way to allow a specific set of ids to be delegated to an unpriv user and these delegatable ranges are recorded in a central place: subid files. That's not opposing the db.
> don't bother with uidmap/gidmap
That's not possible without regressing the ability of unprivileged users to create complex id mappings that have been delegated to them by the system administrator. This has also worked independently of systemd and on other systems so I wouldn't want to make this systemd's job too.
Well, i mean, you can always keep the db if you really really like to, but what I am saying is: the db that everybody should check is the existing NSS user/group database, and not subuid/subgid.
I mean, if you want to use newuidmap/newgidmap as your SUID binary of choice to configure your /proc/$PID/uid_map then by all means, go ahead, but also: everything else is fine too, and I'd not bother with telling people that they have to register their ranges there. Instead, just let people use any tool they want, as long as they register the ranges in the NSS user/group databases.
or in other words, I'd suggest buildah/podman to just ship their own tool to acquire a uid range (possibly with a suid binary of their own, or through ipc-based privileged separation), and make sure to register what they acquire in NSS, instead of pushing everything down to /etc/subuid + /etc/subgid, which means you can never use buildah/podman in an environment with read-only /etc...
> That's not possible without regressing the ability of unprivileged users to create complex id mappings that have been delegated to them by the system administrator. This has also worked independently of systemd and on other systems so I wouldn't want to make this systemd's job too.
I think I am repeating myself here: I am proposing to use the glibc NSS user/group db as place to make registrations show up, and as place to guarantee that every package uses its own range. Nothing systemd specific in that at all. glibc is not a systemd project, and by doing that you create a solution working on all general purpose Linux systems that support NSS, and there's nothing systemd-specific about that.
> or in other words, I'd suggest buildah/podman to just ship their own tool to acquire a uid range (possibly with a suid binary of their own, or through ipc-based privileged separation
You're not really suggesting that we start shipping custom suid idmap binaries alongside every runtime when we have newidmap to avoid just that?
> Well, i mean, you can always keep the db if you really really like to, but what I am saying is: the db that everybody should check is the existing NSS user/group database, and not subuid/subgid.
Now you're dancing around the problem: We currently have a mechanism to delegate id ranges to unpriv users. The db registration is about registering ranges and that proposal is fine. But we still need a way to delegate ranges.
> You're not really suggesting that we start shipping custom suid idmap binaries alongside every runtime when we have newidmap to avoid just that?
Well, if you use a suid binary that's up to you. Major distros have the goal to minimize the number of suid binaries, and in that context it might be a much better idea to use something that uses some ipc priv separation instead. But the point I am making is this: secondary databases that no one but the tool owning it checks are exercises in making UID collisions happen. The problem of dynamic UID registration is not specific to userns, and the tool newuidmap with another db in /etc might not be the ultimate solution to even the userns case.
> Now you're dancing around the problem: We currently have a mechanism to delegate id ranges to unpriv users. The db registration is about registering ranges and that proposal is fine. But we still need a way to delegate ranges.
do we though? why do you want static delegation of ranges at all? i mean, podman could have a tiny ipc service (or suid binary if you want) that has one operation: "pick a free uid range that is currently not defined in the NSS user database, register it there, then chown these files with them and initialize uid_map of that process with them". and there you go: everything is properly registered, fully dynamic, without collisions, without maintaining a static database, without writing to /etc...
Why maintain a static database (and propagate them through the network) when you don't have to?
I'm not opposing switching to something better. If we have a decent proposal I'm not nacking it but we need to make sure to not regress current use cases. So the subid stuff in /etc will need to stick around anyway.
On February 22, 2019 4:39:52 PM GMT+01:00, Lennart Poettering wrote:
> > You're not really suggesting that we start shipping custom suid idmap binaries alongside every runtime when we have newidmap to avoid just that?
>
> Well, if you use a suid binary that's up to you. Major distros have the goal to minimize the number of suid binaries, and in that context it might be a much better idea to use something that uses some ipc priv separation instead. But the point I am making is this: secondary databases that no one but the tool owning it checks are exercises in making UID collisions happen. The problem of dynamic UID registration is not specific to userns, and the tool newuidmap with another db in /etc might not be the ultimate solution to even the userns case.
-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/shadow-maint/shadow/issues/154#issuecomment-466439970
Yes we do. It's a use case right now. There are tons of examples. Random one: imagine scenarios when you only want to use a user namespace and want to make sure that an unprivileged user can only request a specific mapping, so that they can't get access to files that would be exposed as writable if they were free to request an arbitrary mapping. You want IDs to be arbitrary and I sympathize with that, but they are baked into Linux and into filesystems, and so long as they are we have use cases where what mapping is used matters.
On February 22, 2019 4:44:01 PM GMT+01:00, Lennart Poettering wrote:
> > Now you're dancing around the problem: We currently have a mechanism to delegate id ranges to unpriv users. The db registration is about registering ranges and that proposal is fine. But we still need a way to delegate ranges.
>
> do we though? why do you want static delegation of ranges at all? i mean, podman could have a tiny ipc service (or suid binary if you want) that has one operation: "pick a free uid range that is currently not defined in the NSS user database, register it there, then chown these files with them and initialize uid_map of that process with them". and there you go: everything is properly registered, fully dynamic, without collisions, without maintaining a static database, without writing to /etc...
>
> Why maintain a static database (and propagate them through the network) when you don't have to?
So what we could do is
I'll refrain from commenting on the argument between adding functionality to existing baroque privileged programs versus adding small focused standalone privileged helpers.
The problem with dynamic mapping is that you could end up with unowned files on disk if the mapping does not persist, but the on-disk files do persist. Is there a distinction to be made between a "dynamic" mapping and an "ephemeral" mapping?
Don't we have this issue now? There is no tool like ls -l that figures out that UID=100000 is owned by dwalsh since there is an entry in /etc/subuid
dwalsh:100000:65536
This should be no different if this file is distributed from LDAP or ActiveDirectory or ...
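The lookup that a tool like ls -l would have to perform is small. The sketch below, in C, checks one /etc/subuid-style line of the form name:start:count against a numeric ID. The function name `subid_owner` is hypothetical and nothing in coreutils or shadow is claimed to work this way.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * If `line` (format "name:start:count") delegates `id`, copy the owner
 * name into `owner` and return 1. Return 0 for a non-matching or
 * malformed line, or if the name does not fit into the buffer.
 */
static int subid_owner(const char *line, uint32_t id, char *owner, size_t len)
{
    char name[64];
    unsigned long long start, count;

    if (sscanf(line, "%63[^:]:%llu:%llu", name, &start, &count) != 3)
        return 0;
    /* id belongs to the range when start <= id < start + count */
    if (id < start || id - start >= count)
        return 0;
    if (strlen(name) >= len)
        return 0;
    memcpy(owner, name, strlen(name) + 1);
    return 1;
}
```

With the entry above, IDs 100000 through 165535 resolve to dwalsh and 165536 does not. The same check would apply unchanged if the line came from LDAP or another network source instead of the local file.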
BTW, has there been any movement on this?
> Don't we have this issue now? There is no tool like ls -l that figures out that UID=100000 is owned by dwalsh since there is an entry in /etc/subuid
Indeed. I as an administrator can look in /etc/sub{u,g}id to see which user the "unowned" file belongs to, though it would be convenient if ls -l also knew to look there.
> dwalsh:100000:65536
> This should be no different if this file is distributed from LDAP or ActiveDirectory or ...
Agreed. It would be different if the sub{u,g}id ranges were generated on the fly in an ephemeral non-deterministic manner, as happens in my understanding of how systemd DynamicUsers work. SystemD (Systemd?) gets around this by a combo of
http://0pointer.net/blog/dynamic-users-with-systemd.html :
- Prohibit the service from creating any files/directories or IPC objects
- Automatically removing the files/directories or IPC objects the service created when it shuts down.
That approach wouldn't necessarily work for container use cases that require persistent storage, or for containers expected to survive a host reboot.
> shadow shouldn't enforce any such policy decision like that when it comes to id space segmentation. it can make best practice recommendations, but that's it.
> i'm not sure we can utilize nsswitch w/out coordinating with glibc. but if glibc did add support for new keywords (like "subuid" and "subgid"), then that seems like a design that would work.
@brauner so with your glibc hat on, what do you think of this? :-)
Yeah, that would make sense to me. But for that we should get @fweimer's opinion. @fweimer, if you have a few minutes over the next few days would be great to hear your thoughts. :)
@brauner ok maybe he'll be more interested in looking at code :) I'll think about doing that (though wouldn't be until after next weekend)
It's not really clear to me why this needs to live in glibc. I still think the entire feature is misdesigned and will not work in environments which traditionally use network-based NSS modules for user management. The main problems I see are the limited size (in bits) of the UID/GID space, a perceived need to tightly control UID allocations for compliance reasons, and a lack of isolation between containers that shared user IDs bring with them.
We already have sudo and autofs, which have their own service loaders configured by /etc/nsswitch.conf.
I wrote this as a follow up to Dan Walsh's request for some kind of movement here. I haven't moved anything, all I'm doing is trying to summarize a position from the comments and use cases. I do provide some of my own thoughts after a year of going back and forth with @fweimer about this design.
Summary:
I still don't see a strong rationale for adding a complex API and ABI to glibc that would only benefit a very narrow container-specific use case. There is a lot of value in using glibc's existing NSS infrastructure and iterating on a functioning design until we have something working that everyone agrees is meeting user requirements. At that point we could discuss standardizing it within a core library for further maintenance and better overall integration. I sketch out a design below but it's basically a library/daemon and an NSS plugin in shadow which does what everyone needs.
Details:
(1) Network-based NSS environments.
I see no reason why this wouldn't work in a network-based NSS environment. The request is that the initial subuid/subgid data query would be resolved from say LDAP or FreeIPA, and so the values used in the mapping, the derived UID/GID, would be coming from the same network-based configuration that would have otherwise been used in the lookup within the container. One hopes that these values would be self-consistent.
(2) Static vs. Dynamic allocation or Tight vs. loose control of UID allocation.
I do think there is a use case for the tight control of UID/GID allocations, and that dynamically allocating those attributes is going to cause enterprise policy compliance issues. That doesn't mean we can't support both static and dynamic assignment. We should not make decisions that exclude one of these modes of operation.
(3) Limited size of UID/GID space.
The limited size (in bits) of the UID/GID space is a limit that we cannot increase easily, and it has been there from the beginning. It is a policy issue for administrators to decide how many UIDs or GIDs a user needs within the container. Also, as a policy issue, it may be possible for an administrator to allow overlap, but only they know this in advance, e.g. for groups that don't share any physical infrastructure.
(4) Lack of isolation.
I agree that shared IDs bring a lack of isolation, but so do all shared mounts. I don't see this as a limiting factor. We should not artificially limit what users of our systems can do with their infrastructure.
(5) High maintenance cost and slow iteration.
The initially suggested solution, that of adding a new NSS database for subuid and subgid, would entail the addition of a generic API/ABI for subuid/subgid which would subsequently be used only for a very narrow container-specific use case (specific mapping for UID/GID in the case of CLONE_USERNS). Florian states "It's not really clear to me why this needs to live in glibc" and I echo that general sentiment. Lennart points out that all of this can be accomplished by putting an NSS service module in the container that provides everything you need.
Placing these APIs and ABIs in glibc will impose the requirements of a core library (strong backwards compatibility) and that will complicate subsequent design changes. Iterating on the design should be our first priority when designing something as new as this.
Let me expand on Florian's idea a bit with a few bullet points (@fweimer correct me if I expanded your points along the wrong lines):
... then you'd have one set of suid binaries used by any container runtimes that want to use them, one daemon/library to query for this information, one location the information comes from, still use /etc/nsswitch.conf for centralized configuration of similar service provider information.
In summary:
So, the original post by @rhatdan said:
> But we are now being contacted by "enterprise" customers who use large databases of users and they want these files and UIDMap information to be handled via ldap or FreeIPA
How do they want to use this information in ldap? Given that there are at best 65k (minus one) allocations of 65k IDs each, would these enterprise customers need more than that? I.e. are we better off looking at using shared uid ranges which are separated using MCS (especially after selinux namespaces are completed), or something like that?
Or will the number of subuids be enough?
I'd really like to hear more about what a service across the network would need this information for. I.e., if we want to launch a container on host X from some OCI image, the cluster scheduler shouldn't care about the subuids. It should be able to simply ask host X to fire off a container using the OCI image. Host X's runtime can then choose a subuid range to run the container in, using shiftfs to map in any shared files, and hand unshifted files to cluster services which are gathering results.
In any case, is anyone interested in starting a PoC patch for nss?
Basically Podman/Buildah is using newuidmap and newgidmap to set up a User Namespace. (A container is secondary.) Other tools in the future can do the same thing; that is what newuidmap and newgidmap were designed for.
Currently newuidmap and newgidmap ONLY read the local /etc/subuid and /etc/subgid files for the UIDs allowed to be used by a particular user to setup in a User Namespace.
Bottom line, I would like to extend newuidmap and newgidmap to be able to retrieve this mapping from a network datastore. Users of Podman/Buildah want to be able to distribute this information on their networks. Think universities with distributed use of containers in their environment.
Since there is a limited number of UIDs/GIDs, at 65536 per container you could only allocate 65536 ranges (4294967296 / 65536 = 65536).
Of course this number is probably a lot larger than required; most user containers would work fine with a couple of thousand UIDs.
I don't want to tie this tool to any other tool like MAC, since obviously users disable SELinux and other MAC tools.
I am not tied to nsswitch to do this. If newuidmap/newgidmap would talk to sssd or systemd to get this information and sssd or systemd talked to the centralized datastores like LDAP/FreeIPA, it would be fine.
Bottom line we have tools and features of the OS that take advantage of User Namespaces, that uses UIDs/GIDs. Almost all other UID/GID databases built into the base OS and more specifically shadow-utils are available via the network using sssd, LDAP, and FreeIPA, except the content of /etc/subuid and /etc/subgid.
@ebeiderman
If newuidmap/newgidmap would make a pluggable interface that can be used to load a specified dynamic module, we (FreeIPA/SSSD) can do delivery of that information. SSSD also has support for a local user database and is capable of storing overrides/additional attributes for each of those users, so storing subuid/subgid maps is not a problem even for the local configuration. On the FreeIPA side, adding a new type of map and associating it with a user/group is definitely doable.
From the practical side, SSSD has a library, libsss_nss_idmap, that provides a number of extended interfaces to query information about a user or group from the SSSD cached database. Among others, it has this function:
/**
* @brief Find original data by fully qualified name
*
* @param[in] fq_name Fully qualified name of a user or a group
* @param[out] kv_list A NULL terminate list of key-value pairs where the key
* is the attribute name in the cache of SSSD,
* must be freed by the caller with sss_nss_free_kv()
* @param[out] type Type of the object related to the given name
*
* @return
* - 0 (EOK): success, sid contains the requested SID
* - ENOENT: requested object was not found in the domain extracted from the given name
* - ENETUNREACH: SSSD does not know how to handle the domain extracted from the given name
* - ENOSYS: this call is not supported by the configured provider
* - EINVAL: input cannot be parsed
* - EIO: remote servers cannot be reached
* - EFAULT: any other error
*/
int sss_nss_getorigbyname(const char *fq_name, struct sss_nss_kv **kv_list,
enum sss_id_type *type);
Calling it would return you a set of predefined attributes known by SSSD (name, uid, gid, gecos, home dir, shell, expiration, certificate, ssh public key, email, DN and so on) if they were cached. We can add a variant of this call that would allow to explicitly ask for a specific subset of attributes -- the plumbing is already there and is used by SSSD's infopipe interface exposed over D-BUS.
It needs a bit of work to make this info available in a default configuration but nothing too fancy.
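$ cat sss-idmap-test.c
#include <stdio.h>
#include <sss_nss_idmap.h>

/* Print every attribute SSSD has cached for the name given in argv[1]. */
int main(int argc, const char **argv) {
    struct sss_nss_kv *kv = NULL;
    enum sss_id_type type;

    /* A fully qualified user or group name is required. */
    if (argc < 2) {
        fprintf(stderr, "usage: %s <fully-qualified-name>\n", argv[0]);
        return 1;
    }

    int result = sss_nss_getorigbyname(argv[1], &kv, &type);
    switch (result) {
    case 0:
        /* kv is a NULL-terminated list of key-value pairs */
        for (size_t i = 0; kv[i].key != NULL; i++) {
            printf("%s: %s\n", kv[i].key, kv[i].value);
        }
        sss_nss_free_kv(kv);
        break;
    default:
        printf("error: %d\n", result);
    }
    return result;
}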
This would be fine with me, if it was ok with the Shadow-utils guys?
Does 'a pluggable interface' just mean a .so implementing a particular function?
If so then I think that sounds good to me.
@hallyn yes, a .so to implement the interface would be enough.
Thanks, unless someone else wants to do so, I'll look at implementing that in the next few days.
For SSSD users, could the subuid and subgid ranges be calculated similar to the UID and GID? That is, they are globally unique as a function calculated from the user's AD SID?
No, I don't think so, since there is only a total of 4 billion UIDs.
@hallyn Or any sssd people, has anyone moved on this?
For sssd, ranges could be calculated for 4M users, if only 1024 uids per user were allocated.
These could be used for 0-1004,1998-2002,4998-5002,65526-65534 or similar, to cover most existing containers, or could be allocated as-needed per container. Likely no container would actually use more than 256 UIDs.
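The numbers in that suggestion can be checked with a short C sketch: the four spans listed add up to exactly 1024 IDs, and 2^32 / 1024 gives 4194304 (about 4M) per-user blocks. The span table and the helper `compact_index`, which maps a container ID to its offset inside a 1024-ID host block, are illustrative assumptions, not an existing SSSD or shadow interface.

```c
#include <stddef.h>
#include <stdint.h>

struct id_span { uint32_t first, last; };   /* inclusive bounds */

/* The container-side spans proposed above. */
static const struct id_span spans[] = {
    { 0, 1004 }, { 1998, 2002 }, { 4998, 5002 }, { 65526, 65534 },
};
static const size_t nspans = sizeof spans / sizeof spans[0];

/* Total number of container IDs covered by all spans. */
static uint32_t total_ids(void)
{
    uint32_t total = 0;
    for (size_t i = 0; i < nspans; i++)
        total += spans[i].last - spans[i].first + 1;
    return total;
}

/*
 * Map a container ID to its offset within the packed block.
 * Returns 0 and stores the offset in *idx, or -1 if the ID falls
 * into a gap that the block does not cover.
 */
static int compact_index(uint32_t id, uint32_t *idx)
{
    uint32_t base = 0;
    for (size_t i = 0; i < nspans; i++) {
        if (id >= spans[i].first && id <= spans[i].last) {
            *idx = base + (id - spans[i].first);
            return 0;
        }
        base += spans[i].last - spans[i].first + 1;
    }
    return -1;
}
```

The host UID for a given user would then be that user's block start plus the offset, so container UID 65534 lands on the last slot (offset 1023) of the block.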
While the actual numbers of UIDs in use are very low, they end up being quite scattered; we absolutely need a number greater than 1000, so we can support root, system users, and a few user-created users above 1000. It would be interesting to go through commonly-used container images and identify the maximum UID in use by each; I suspect that around 2048 would allow most container images, but I do see 655xx UIDs occasionally (anecdotally, they seem more common than UIDs >65536 that we already do not support with the default amount of UIDs we allocate).
We are seeing a lot of excitement on the podman front running containers as non-root.
We are taking advantage of User Namespace and specifically shadow-utils with /etc/subuid, /etc/subgid and newuidmap and newgidmap.
But we are now being contacted by "enterprise" customers who use large databases of users and they want these files and UIDMap information to be handled via ldap or FreeIPA.
Has there been any thought into making this info available via nsswitch?