subid ranges sourced from the network store

rhatdan commented 5 years ago

We are seeing a lot of excitement on the podman front running containers as not root.

We are taking advantage of User Namespace and specifically shadow-utils with /etc/subuid, /etc/subgid nad newuidmap and newgidmap.

But we are now being contacted by "enterprise" customers who use large databases of users and they want these files and UIDMap information to be handled via ldap or FreeIPA.

Has there been any thought into making this info available via nsswitch?

sumit-bose commented 4 years ago

Hi,

I would prefer to store the data the the LDAP user objects of the related users.

While in theory it would be possible to let SSSD assign the ranges and there would be even room in the UID space because currently SSSD tries to avoid to assign ID between 2^31 - 2^32 (2 billion - 4 billion), I'm not sure if this would be practical.

We would need to new configuration options one which defines the allowed login UIDs, one for the range the subordinate IDs should come from and the number of subordinate IDs for each login UID and the related options for GIDs. Then each login UID can be assigned a range of subordinate UIDs and GIDs (if the range for subordinate IDs is large enough). With this config option SSSD would be able to reproducible assign the subordinate IDs even after all temporary data (e.g. SSSD's cache) are lost and it would works on multiple hosts as well as long as the same config options are used.

The pain point I can see here is how to specify the allowed login UIDs since they can be quite scattered as said before. So in the worst case this would be a list of a couple of hundreds or even thousands of UIDs (or names to make admins life easier, but if the users are coming from AD you need fully-qualified names, i.e. including the domain component, to make sure this works properly in a forest). Of course allowed login UIDs can be specified by ranges as well, but if this range gets too large there might be not sufficient space left for the subordinate ID range and the required number of subordinate IDs per allowed UID.

Given that it should more flexible and more straight forward to me to manage the subordinate ID in the LDAP user objects.

HTH

bye, Sumit

jamescassell commented 4 years ago

The pain point I can see here is how to specify the allowed login UIDs since they can be quite scattered as said before. So in the worst case this would be a list of a couple of hundreds or even thousands of UIDs (or names to make admins life easier, but if the users are coming from AD you need fully-qualified names, i.e. including the domain component, to make sure this works properly in a forest). Of course allowed login UIDs can be specified by ranges as well, but if this range gets too large there might be not sufficient space left for the subordinate ID range and the required number of subordinate IDs per allowed UID

The subuid range should be contiguous, and a fixed-size block per user. I was only referring to how that per-user block could be efficiently used by a container engine, by splitting it within a container, which is out of scope for both shadow and sssd.

AkihiroSuda commented 4 years ago

Will this relate to systemd-homed varlink API? https://systemd.io/USER_GROUP_API/

rhatdan commented 4 years ago

We have been discussing this more, and the more I think about it, I believe we need an API for these files, preferably supplied by glibc. We have been talking about the problem of getting a range of UIDs for the user from a remote site, but as we use this more and more, I realize that their is a hole in the backwards lookup, that might even be considered a security hole.

We now have files in a users homedir that the standard tools of find and ls can not identify where they came from.

If I have a file owned by UID 100002 in my homedir and run ls on it I have no idea this came from a container. Another big use case would be the audit subsystem. If a process in my user namespace goes and triggers and audit event on the system and the administrator looks to see who owned the process that created the audit event, their is no standard way to back trace that audit record back to a process owned by dwalsh.

We could end up with lots of tools building their own read to read these files. Podman, Buildah, Systemd and newuidmap, newgidmap. already do this. We could end up with findutils, coreutils, util_linux, audit, procps and any other tool that looks at process or files and wants to reverse map where who owns these objects.

At least if we had a standard library that could look up these data, I could start to bother the low level tools to start revealing this information.

fweimer commented 4 years ago

You also need to know which container created the file owned by the UID 100002 since interpretation of these UIDs is container-specific. For some tools, mapping them to a user-specific range is not enough.

I think what we eventually need is support for stacking user IDs directly in the file system. So from the host perspective, these files will have the primary user ID of the user, but once the user enters the appropriate namespace, that user ID becomes invisible and the user ID beneath is revealed. There are reservations about implementing in the kernel (due to performance and complexity), but it will happen eventually. At that point, the range mapping becomes a legacy interface, so I don't think it will see long-term use (and it takes three years or more until new glibc interfaces land in distributions, so this is really not the right venue for this).

brauner commented 4 years ago

On March 26, 2020 10:59:10 PM GMT+01:00, Florian Weimer notifications@github.com wrote:

You also need to know which container created the file owned by the UID 100002 since interpretation of these UIDs is container-specific. For some tools, mapping them to a user-specific range is not enough.

I think what we eventually need is support for stacking user IDs directly in the file system. So from the host perspective, these files will have the primary user ID of the user, but once the user enters the appropriate namespace, that user ID becomes invisible and the user ID beneath is revealed. There are reservations about implementing in the kernel (due to performance and complexity), but it will happen eventually. At that point, the range mapping

Did I miss something?

fweimer commented 4 years ago

No, you did not miss anything. But the currently proposed alternatives look so hackish to me that I just can't see that they will last for years to come.

jamescassell commented 4 years ago

Here's the part your email notification might have missed.

..." becomes a legacy interface, so I don't think it will see long-term use (and it takes three years or more until new glibc interfaces land in distributions, so this is really not the right venue for this)."

rhatdan commented 4 years ago

Well they have been working on the auditing problem for years for tracing containers, but I see this is a much more tangible problem.

I can simply to

$ podman unshare sh -c "mkdir baddir; touch baddir/badfile; chown 1:1 -R baddir"; ls -l baddir/badfile; rm baddir/badfile
-rw-r--r--. 1 100000 100000 0 Mar 27 06:02 baddir/badfile
rm: remove write-protected regular empty file 'baddir/badfile'? y
rm: cannot remove 'baddir/badfile': Permission denied

Now i have a directory and file in my homedir, that is not easily identifiable who/what created it.

rhatdan commented 4 years ago

Just running podman for a while, I find these many files in my homedir that are not owned by me.

find ~/.local/share/containers ! -uid 3267 2> /dev/null | wc -l 
25468

Had to pipe errors to /dev/null because of all of the errors in directories I could not even examine.

find: ‘/home/dwalsh/.local/share/containers/storage/overlay/cc4590d6a7187ce8879dd8ea931ffaa18bc52a1c1df702c9d538b2f0c927709d/diff/var/cache/apt/archives/partial’: Permission denied

brauner commented 4 years ago

On Thu, Mar 26, 2020 at 02:59:10PM -0700, Florian Weimer wrote:

You also need to know which container created the file owned by the UID 100002 since interpretation of these UIDs is container-specific. For some tools, mapping them to a user-specific range is not enough.

What tools do you have in mind? The question what container owns a given file doesn't make sense if you're sharing layers, directories, or mounts between containers.

I think what we eventually need is support for stacking user IDs directly in the file system.

This is not a feasible solution. You can have ten-thousands of containers with non-overlapping id mappings sharing a given directory or even a rootfs. If you're thinking about new xattrs that's not going to work because of the potential growth in size. For example, the original design for namespaced filesystem capabilities considered something like this but this was quickly hard-nacked because no-one wants really huge attributes on files. And really making this a per-filesystem attribute feels even more hacky.

brauner commented 4 years ago

On Fri, Mar 27, 2020 at 03:04:15AM -0700, Daniel J Walsh wrote:

Well they have been working on the auditing problem for years for tracing containers, but I see this is a much more tangible problem.

I can simply to
$ podman unshare sh -c "mkdir baddir; touch baddir/badfile; chown 1:1 -R baddir"; ls -l baddir/badfile; rm baddir/badfile
-rw-r--r--. 1 100000 100000 0 Mar 27 06:02 baddir/badfile
rm: remove write-protected regular empty file 'baddir/badfile'? y
rm: cannot remove 'baddir/badfile': Permission denied
Now i have a directory and file in my homedir, that is not easily identifiable who/what created it.

Parts of this are an artificial problem imho. Container runtimes usually have a well-known place where they put their files, including the rootfs and so on. So they don't just appear randomly in your home directory unless you run a container and explicitly create them somewhere as in your example or you share your home-directory with an id-mapped container. In all cases you can remove them (with a mapping sometimes). You can also run into similar problems with other tools or nfs mounts with id mappings.

But your auditing problem is about to be solved, no? Richard Guy Briggs is working on adding autid ids which would get you an audit trail for this.

rhatdan commented 4 years ago

I agree that the containers tend to be stored in the same directory. But volumes are not, they can be mounted from anywhere including /tmp. And content can be created in these directories by the non root (User UID) user.

The Auditing patch has been worked on for YEARS. I am not confident that it will get merged soon. And I am not sure this is easily mapped back to the user who launched the namespace.

Being able to examine a file or process on the system and know that it is owned by dwalsh or dwalsh(UIDS) is very important...

hallyn commented 4 years ago

@rhatdan while 'volumes' (I assume this is docker parlance, and you mean a bind mount?) can come from anywhere, i'd consider it unsafe to do that without any sort of structure. After all, if two containers are sharing the same root user subuid, then one can make a setuid exploit for another.

Furthermore, I suspect most people bind their host uid into the container. So the container can create files which appear to have been created by the user.

So, earlier you said

We could end up with lots of tools building their own read to read these files. Podman, Buildah, Systemd and newuidmap, newgidmap. already do this. We could end up with findutils, coreutils, util_linux, audit, procps and any other tool that looks at process or files and wants to reverse map where who owns these objects.

At least if we had a standard library that could look up these data, I could start to bother the low level tools to start revealing this information.

Agreed. I'm happy create a tiny libshadow or libsubuid to do the local version of this, then (once we all agree on api) we can re-visit extending to the network? I'll write a straw-man this weekend or monday.

hallyn commented 4 years ago

I've pushed a strawman api with a single function to start with at https://github.com/hallyn/shadow/commits/libsubid .

#ifndef SUBID_RANGE_DEFINED
#define SUBID_RANGE_DEFINED 1
struct subordinate_range {
    const char *owner;
    unsigned long start;
    unsigned long count;
};

enum subid_type {
    ID_TYPE_UID = 1,
    ID_TYPE_GID = 2
};

#define SUBID_NFIELDS 3
#endif

int subid_get_ranges(char *owner, struct subordinate_range ***ranges, enum subid_type which);
void subid_free_ranges(struct subordinate_range ***ranges, int num_ranges);

Let the bikeshedding begin :)

rhatdan commented 4 years ago

Well most Volumes in Podman/Docker world are bind mounts, yes. So basically doing something like

$mkdir /tmp/db; $podman run -v /tmp/db:/var/lib/mariadb mariadb

Could create content of the mariadb UID inside of the container controlled by dwalsh.

rhatdan commented 4 years ago

For your API I would prefer a function that did something lik

subid_getown_byuid(UID uid_t, char owner) subid_getown_bygid(GID gid_t, char owner)

hallyn commented 4 years ago

For your API I would prefer a function that did something lik

subid_getown_byuid(UID uid_t, char owner) subid_getown_bygid(GID gid_t, char owner)

Thanks, Dan - to be clear, you mainly mean drop the extra id_type argument?

hallyn commented 4 years ago

No, you don't ...

what's the return type of those?

giuseppe commented 4 years ago

/etc/sub*id accept both the user name and the user UID (foo:100000:65536 and 1000:100000:65536)

I think it is better if we use the UID instead of the user name. In this way the caller doesn't have to worry whether the output is a user name or a UID that must be directly parsed.

The API could be something like:

int subid_get_subuid_owner(uid_t uid, uid_t *owner);
int subid_get_subgid_owner(gid_t gid, uid_t *owner);

rhatdan commented 4 years ago

Yes I agree with Giuseppe. I need to get back which UID owns a particular file, then I can call in an translate it to a real user

ls -l containerfile.txt To be able to show that it is owned by dwalsh

hallyn commented 4 years ago

@giuseppe - why do you think it's better to accept only UID, instead of either username or UID?

I'll change the function to return an array of subordinate_id structs, ending with a NULL entry.

hallyn commented 4 years ago

Oh. You're asking for a different function. Yes, that can be added, and I expect to also add "allocate the next unused subid range of size sz to uid N".

rhatdan commented 4 years ago

Yes that would be nice also. With containers we are attempting to preallocate a huge range and then just playing in that range. 2B-4B range. But being able to do this programatically would be nice.

abbra commented 4 years ago

This will not work well with centralized identity systems. We cannot request (or grant) that from unprivileged client to give a slice of UID/GID space centrally. On the other hand, if that space is ephemeral, it can be allocated locally by SSSD and maintained there.

hallyn commented 4 years ago

@abbra I'm not quite following. Why would it not work well with centralized identity system, be it ephemeral or longer term? Of course the system should be privileged, but that's implied by its being centralized.

Perhaps if you could elaborate on how you imagine using this, it would help me understand.

hallyn commented 4 years ago

In the next few days I'll update my branch with a more complete API, and I'll post a "[WIP]" PR.

abbra commented 4 years ago

@hallyn the way I read @rhatdan's comment

With containers we are attempting to preallocate a huge range and then just playing in that range. 2B-4B range. But being able to do this programatically would be nice.

is that this happens on a runner where the container runtime is executing a container instance. That system is unprivileged from the perspective of centrally managed IdM deployment. In order to allocate the UID/GID space in the IdM, one would need to have enough privileges and it is unlikely that such privilege information could be passed from the application that issues this request through the library you are implementing.

rhatdan commented 4 years ago

There are potentially two competing use cases for the UID Mappings. The one you are concerned about is users logging in with a shared IdM for any user that logs into the system. In this case Podman/Buildah are just running in the users user namespace and need a range in order to support multiple UIDs in his homedir.

The other use case we are looking at is a root running process that is creating lots of containers (Kubernetes&CRI-O or root running # podman run --userns=auto ...) where we just want every container to run in a different user namespace for security reasons.

In the first use case, you might also want to use usernamespace for separation of containers launched by the user, but you usually have a much smaller base of UIDs to work with.

hallyn commented 4 years ago

For the first case, the login manager could create a user namespace (with privilege) on login, right?

For the second case user is already privileged.

If what you want is a shared pool from which unprivileged users simply 'borrow' uids on an ephemeral basis for each run, that would need to be something built on top of this (and would of course need to entail some way of handling clearing of all files which were created ephemerally assigned subuids on logout or end of container run).

A shared pool from which subids can be checked out without any privileged help does not belong in shadow, at least not this simply. What I'm doing right now is writing the basic library which all tools could use to query and manipulate subid allocations without stepping on each others' toes. Maybe we'll even end up creating a subuid borrowing tool in shadow. And it might work hand in hand with fsuid shifting like @brauner is working on.

Anyway I'll add the rest of the needed functions to the library, open the PR, and then we can discuss more there.

abbra commented 4 years ago

For the first case, the login manager could create a user namespace (with privilege) on login, right? For the second case user is already privileged.

So these both cases still address a single machine and that machine has no actual administrative rights for the central IdM system to add ID ranges. In centralized setup administration happens in advance and ID ranges get looked up and used on the machine.

For global, centralized storage we would expect:

have a central place to define cluster-wide ranges,
allow assigning them to users that can utilize them on any machine they allowed to use those ranges on,
have a central place to define cluster-wide ephemeral ranges,
allow local use of ephemeral ranges that do not conflict with any cluster-wide ID ranges regardless to which user they locally assigned

The step of defining and allocating those ranges would be separate from the consumption. I guess, to cover the two use cases mentioned by @rhatdan we would basically need to be able to look up the ranges for user at login for one and have a shared 'borrowed' ephemeral range allowed for use on the machine.

The first part can be added in FreeIPA in a way similar to how we added SELinux policies: there are rules that associate SELinux policy with user for a host/hostgroup and they get applied on login. The difference with SELinux policies is that SELinux contexts were pre-defined by the distribution and here we want to have semi-dynamic allocation but that's a technical difference, conceptually we deal with a similar beast.

User-specific ranges defined in advance, may be on a user/privileged account request, based on the total known state of ID ranges in the centralized system. A space is carved out and associated with the user in FreeIPA once -- this might be kind of self-service 'allocate-once' style for the user, then consumed by the login process everywhere.

Ephemeral ranges can be defined by admins for the whole centralized environment. They would be pulled by SSSD and thus available for local use -- if we guarantee they only used on the same machine, that should be fine. For cross-machine (NFS/SMB/etc) consumption we need user-specific ranges, I guess.

So, if a user doesn't have a range allocated, at login we can ask for one if user has been authenticated in a way that could be presented to FreeIPA, then ask for own range. This could happen behind the scenes but we need to have a context or a token that could be used by the backend (SSSD?) to request this allocation on behalf of the original user...

hallyn commented 4 years ago

That all sounds good and can all be done on top of libsubid. But again, if subid ranges are assigned for just one login session at a time, then the files created during a login session will need to be taken care of at logout. This could get interesting.

rhatdan commented 4 years ago

I would think the range/per user/machine would be permanent, Since content can be added to the users home-directory. Only to be freed when the user account is removed. Similar to standard handling of UIDs.

Advanced feature in future would be if a privileged process could register a range of UIDs with sssd for a particular container. Or to have a way to register a call back from the greater range

ls -l foobar Indicates that this file is owned by the CTR3 container.

hallyn commented 4 years ago

Then it sounds like we are agreed.

giuseppe commented 4 years ago

The API could be something like:

int subid_get_subuid_owner(uid_t uid, uid_t *owner);
int subid_get_subgid_owner(gid_t gid, uid_t *owner);

@pixelb would such API work for coreutils? Do you think it will be possible to extend coreutils tools to also include sub IDs ownership?

hallyn commented 4 years ago

currently it is:

struct subordinate_range **get_subuid_ranges(char *owner);
struct subordinate_range **get_subgid_ranges(char *owner);
void subid_free_ranges(struct subordinate_range **ranges);

int get_subuid_owners(uid_t uid, uid_t **owner);
int get_subgid_owners(uid_t uid, uid_t **owner);

And tonight I'm going to add a "reserve a new range for uid" call and then open a PR

lukasheinrich commented 4 years ago

just creating a link to this PR https://github.com/shadow-maint/shadow/pull/250 that needs review.

hallyn commented 4 years ago

Hm, I'm going to re-open this briefly. What do we want to do in terms of forwarding these over the network? Just leave it up to all individual callers/users? Provide a dbus service in shadow to wrap it? Something else?

hallyn commented 4 years ago

(Looking back at the opening comment)

So let's say we create a new nss 'idmap' database to do owner <-> subid translations. The first could use libsubid to return local results, then a second could query over the network, i guess? How do we want that to look? Query ldap in some new agreed upon way?

rhatdan commented 4 years ago

Well the original request was to make this data distributable across the network. Second request was to get a library so that we could translate these UIDs/GIDs on disk/logs back to the OWNERs of the objects.

abbra commented 4 years ago

Right, @hallyn I think it is the other way around -- if you'd provide a way to plug into libsubid via a plugin, then we can supply a plugin that would use SSSD to deliver the data pullable from a centralized place.

hallyn commented 4 years ago

@abbra what are you looking for to help plug libsubid ? golang bindings? I'm not quite following...

abbra commented 4 years ago

@hallyn as I said, it is the other way around -- I'm looking at ability to have a separate plugin inside libsuid that can provide ID mapping information, like we discussed in this issue. Right now your code only looks up the data from files in the file system. We discussed that and you agreed in https://github.com/shadow-maint/shadow/issues/154#issuecomment-574978728.

hallyn commented 4 years ago

(I'm not trying to argue, just not getting what you are saying). You had said:

If newuidmap/newgidmap would make a plugable interface that can be used to load a specified dynamic module, we (FreeIPA/SSSD) can do delivery of that information

o which I replied

Does 'a pluggable interface' just mean a .so implementing a particular function?

If so then I think that sounds good to me.

That's what libsubid provides.

Are you asking for an nss module? A new nss database?

hallyn commented 4 years ago

Well the original request was to make this data distributable across the network. Second request was to get a library so that we could translate these UIDs/GIDs on disk/logs back to the OWNERs of the objects.

libsubid helps with both. It will tell you the owner(s) of a particular subid, and will tell you the ranges owned by a particular user.

This way, others can more easily expose the information over the network. I hadn't yet decided whether shadow should do that itself. But I think I'm ready to give in and do it over NSS. Would that suffice for your needs?

abbra commented 4 years ago

My understanding is that an application would link against libsubid to get information about sub IDs and to ask for allocating them. libsubid wouldn't know how to provide this information from a remote source and I don't think you should be implementing that yourself. This is what I want to plug to change so that in a centrally managed environment a plugin dynamically loaded by libsubid would redirect the operations to a centrally managed source instead of files.

More specifically, the API you have is currently only backed by lib/subordinateio.c which only implements /etc/subuid and /etc/subgid processing. The API is very much file oriented (lock/unlock, *_file_present(), etc) and cannot be amended to redirect a request, say, to SSSD for lookup/modification.

I think this discussion already went through NSS module option and an agreement was to have one. So may be let's start with that, do an API implementation that loads modules specified in /etc/nsswtich.conf for subid: ... and have /etc/subuid / /etc/subgid handled via a files provider in an NSS module provided by the shadow. Then SSSD and systemd could provide their NSS modules to complement the information and we'll work between SSSD and FreeIPA on how to store and retrieve the information from FreeIPA LDAP store.

hallyn commented 4 years ago

Ok, I see. I thought that SSSD, or someone, was going to use libsubid to implement the network visibility.

I'll think through how best to write the module and ship it in shadow.

alexey-tikhonov commented 4 years ago

Ok, I see. I thought that SSSD, or someone, was going to use libsubid to implement the network visibility.

I'll think through how best to write the module and ship it in shadow.

Please take a note, besides extension of libsubid with ability to support pluggable backends, it is also required to rework newgidmap / newuidmap (and other user facing tools if any) to make use of libsubid instead of files-backed lib/subordinateio (which should rather be used as a basis for "files" plugin).

alexey-tikhonov commented 4 years ago

If I have a file owned by UID 100002 in my homedir and run ls on it I have no idea this came from a container. We could end up with lots of tools building their own read to read these files. Podman, Buildah, Systemd and newuidmap, newgidmap. already do this. We could end up with findutils, coreutils, util_linux, audit, procps and any other tool that looks at process or files and wants to reverse map where who owns these objects.

At least if we had a standard library that could look up these data, I could start to bother the low level tools to start revealing this information.

Is it ever realistic to make all those tools ("findutils, coreutils, util_linux, audit, procps and any other tool that looks at process or files") using new shadow-utils API (in addition to glibc nss) to resolve issue of "objects owned by unknown UID/GID"?

(I'm not sure if this is a sane idea but) could shadow-utils just (additionally) provide new libnss_shadow.so (backed by libsubid) that would be added to "/etc/nsswitch.conf:passwd,group" as a last resort and would serve getpwuid() calls for sub-ids returning owner id? In your exapmle "UID=100000 is owned by dwalsh" it would be: getpwuid(100000)->pw_name == "dwalsh" This, of course, would be a little bit weird since getpwuid(uid)->pw_uid != uid but IIUC this way no change is required to ls/etc to show proper owner names of objects owned by ids from sub-ranges...

rhatdan commented 4 years ago

I would want the UID to show something like dwalsh-userns for the user, or some other indicator to say which user namespace is the owner of this file.

shadow-maint / shadow

subid ranges sourced from the network store #154