Open michaelwittig opened 7 years ago
Brain dump. I think that Proposal 4 has the most 'legs' maybe. But I would keep pondering for a little bit:
Well, let's look at what we do have in IAM that's constant. Is it just the ARN? E.g. something like:
arn:aws:iam::A_BUNCH_OF_INTEGERS:user/my_username
If so, that's a good place to start to try and 'generate' our userid?
So I'm thinking something, like, we take the CRC32 of that thing, modulo 'x' it (for some value of 'x' TBD), add an offset to make it not conflict with the built-in /etc/passwd users (probably an offset of 1000 would do?).
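A minimal sketch of the CRC32 idea above, assuming Python is available on the instance. The modulus (the 'x' that is still TBD), the helper name arn_to_uid, and the account number in the usage line are placeholders, not values anyone in this thread has settled on.

```python
import zlib

UID_OFFSET = 1000     # skip the built-in /etc/passwd users (offset suggested above)
UID_MODULUS = 60000   # placeholder for 'x'; the thread leaves this value TBD


def arn_to_uid(arn, modulus=UID_MODULUS, offset=UID_OFFSET):
    """Map an IAM user ARN to a uid: CRC32, then modulo, then offset."""
    checksum = zlib.crc32(arn.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return checksum % modulus + offset


if __name__ == "__main__":
    # Hypothetical ARN with a placeholder account number.
    print(arn_to_uid("arn:aws:iam::123456789012:user/my_username"))
```

The mapping is stable for a given ARN, but two different ARNs can still land on the same uid, and the smaller the modulus, the more likely that becomes.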
Problems:
Well, if we look just at the username - my_username - and we assume that it's only a-z0-9, so that's base36... shit, do we still have to fit it in 2^16? I think you can only end up squeezing one more letter.
If there were a consistent way to list IAM users - like, they always show up in the same order - and if they somehow showed a gap for deleted users - we could assign uids in order? I think the biggest issue here is deleted users; old servers would be fine because they would 'know' about the missing user, but new servers, which never knew about the gap, would misnumber everyone after it.
This one is well and truly horrible, but it might be the best bet.
We attach a magical, no-permissions, valueless role or permission to each IAM user. Like, maybe an inline policy. Maybe the policy is completely blank. But the name of the policy is something like:
aws-ec2-ssh-uid-1003
And from there, all servers now know that that IAM user is always to be uid-1003? (And similar with gid?)
One note - we'd have to keep track of "last_used_uid" somewhere. Hell, maybe another IAM user, who exists just to be a counter (also has no perms?) - aws-ec2-ssh-last-uid
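A rough sketch of how a server could read the uid back out of the policy names, assuming the aws-ec2-ssh-uid-N naming from above and assuming the caller already has the list of inline policy names for a user (for example the PolicyNames array returned by aws iam list-user-policies). The helper name is hypothetical.

```python
import re

# Naming convention proposed above: aws-ec2-ssh-uid-<number>
UID_POLICY_PATTERN = re.compile(r"^aws-ec2-ssh-uid-(\d+)$")


def uid_from_policy_names(policy_names):
    """Return the uid encoded in a user's inline policy names, or None.

    Raises ValueError if more than one marker policy is attached, since
    the user would then have an ambiguous uid.
    """
    uids = []
    for name in policy_names:
        match = UID_POLICY_PATTERN.match(name)
        if match:
            uids.append(int(match.group(1)))
    if len(uids) > 1:
        raise ValueError("multiple uid marker policies: %r" % uids)
    return uids[0] if uids else None


if __name__ == "__main__":
    print(uid_from_policy_names(["some-other-policy", "aws-ec2-ssh-uid-1003"]))
```

Users without a marker policy come back as None, which is the case where a server would have to decide whether to assign a new uid.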
I also thought about hashing the username (or ARN, doesn't matter in this case), but I'm not aware of a mechanism that does not come with collisions, as you mentioned.
So I think we need to keep some state somewhere, like Proposal 4 suggests. But instead of the IAM user I was thinking of a DynamoDB table or something. There we could easily keep track of users and their assigned uid / gid. But it makes the whole thing more complicated to set up...
I have also been thinking about this. There is nothing really usable in IAM. Hashing the names will be a problem, as it's too easy to have collisions.
The best thing would be tags, but AWS did not implement tags for IAM users :(
So I could only think of:
All of this sounds like a lot of extra complexity, and if we do implement this I would strongly opt for making it optional, disabled by default. Especially since this has, in my opinion, a limited use case (EFS, or running a service as a user on more than one instance where the service depends on uids).
I only had to handle this once: our Jenkins machines use EFS for storage. I ended up picking a uid for the jenkins user, putting that uid in the userdata, and only fixing the uid for the jenkins user.
Maybe we can open a feature request with AWS to allow tags on IAM users? ;-)
I'm a fan of keeping things simple. This project started as a few lines of bash :) Unstable uids and gids are documented as a limitation. My opinion: it's okay to keep it like this.
I agree :)
That's where I'm leaning currently too. But it's still something we should think about; consistent uid/gid's would enable things like NFS/EFS, and there are some people out there who might think that's a big deal.
I did an experiment with a simple 'inline policy' -
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1494299234000",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAvailabilityZones"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
(that just allows you to list ec2 availability zones, but any simple, near-empty or meaningless policy would work).
I then attached this inline to my user using the console.
Finally, I was able to do this:
aws --profile=production iam list-user-policies --user-name brady
{
    "PolicyNames": [
        "fake-uid-1001"
    ]
}
I honestly don't think it's too awful? Hrm?
The only concern I have is that the AWS API is eventually consistent, which means that you cannot use the API to assign unique values. It could also be that two EC2 instances are started at the same time (e.g. by an Auto Scaling Group). They will not know about each other and will hit the AWS API at the same time. This could also lead to messed up uids.
That would only be a problem right after adding this inline policy, right?
My main concern is: how do we handle a mix? Some users have this policy, others don't. We would have to get the users, then sort on the policies, etc. Or is there another solution?
It also adds a lot of API calls, which means fewer users/instances can be managed this way.
Tricky feature.
Yeah, and to the earliest point, it might not be worth it to do at all. But let's at least keep stepping through it.
(oh, and for people who don't have the policy, when the update-users script fires, it would start assigning them in order).
The eventual consistency issue is definitely still an issue, but only at the moment where you are trying to assign a new uid. Once it's assigned, it's pretty painless on currently-existing servers as well as brand new ones - you just use it to set the uid, that's it. But if you have 'n' servers, and their cronjobs all fire at the same time - chances of some kind of conflict are definitely there. I think some of this is mitigated iff the user-listing we get back from IAM keeps the same order; so long as we keep appending new UID's in order to that, then even if two people try to do it at the same time, it might be OK? Would we still need some kind of 'marker' (fake IAM user?) so that you don't inadvertently re-use the last issued uid, or is that not an issue? E.g. if you have uid 1001, 1002, 1003, and delete 1002, that's easy, next one is 1004. But if, instead, you deleted 1003, wouldn't you end up trying to re-use 1003? Is that bad?
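To make the reuse question concrete, here is a small sketch comparing two ways of picking the next uid: "highest currently assigned plus one" versus a separately stored last-issued counter (the 'marker' idea). Both function names are hypothetical, and the sets stand in for whatever the IAM listing would return.

```python
def next_uid_from_existing(assigned_uids, first_uid=1001):
    """Pick the next uid by looking only at uids that are still assigned."""
    return max(assigned_uids, default=first_uid - 1) + 1


def next_uid_from_counter(last_issued_uid):
    """Pick the next uid from a persisted 'last issued' marker."""
    return last_issued_uid + 1


if __name__ == "__main__":
    # Deleting a uid in the middle is harmless: the next one is still 1004.
    print(next_uid_from_existing({1001, 1003}))   # 1002 was deleted
    # Deleting the highest uid means it gets handed out again.
    print(next_uid_from_existing({1001, 1002}))   # 1003 was deleted
    # A marker that remembers 1003 was issued avoids the reuse.
    print(next_uid_from_counter(1003))
```

Whether the reuse is bad depends on what the deleted user left behind: files on a shared filesystem are owned by the numeric uid, so a new user who inherits 1003 would also inherit ownership of anything the old 1003 created.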
If eventual consistency is an issue, we might be able to use the CreateDate attribute of the IAM user and only try to assign the uid if the CreateDate is more than 5 minutes ago; that way most of the eventual consistency has shaken out. (And adjust '5' appropriately, blah blah).
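A sketch of that CreateDate check, assuming the timestamp arrives in the 2017-01-11T21:38:23Z form that the CLI output elsewhere in this thread shows. The function name and the injectable now parameter are illustrative only.

```python
from datetime import datetime, timedelta, timezone

SETTLE_PERIOD = timedelta(minutes=5)  # the '5' from above; adjust appropriately


def is_settled(create_date, now=None, settle_period=SETTLE_PERIOD):
    """True if the IAM user was created more than settle_period ago."""
    created = datetime.strptime(create_date, "%Y-%m-%dT%H:%M:%SZ")
    created = created.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created > settle_period


if __name__ == "__main__":
    print(is_settled("2017-01-11T21:38:23Z"))
```

This only narrows the window; it does not stop two servers from assigning a uid to the same user at the same moment once the five minutes have passed.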
So I still think it's doable, and I think it'd even probably work, but I worry that if we try and go this way, we'll end up having to build a full identity-provider à la LDAP or something, and deal with nsswitch and/or a whole bunch of other craziness. Though, to be fair, we already do interact with PAM, so maybe that's not the worst thing ever....oh, I dunno.
Anyways, just wanted to at least try and think the idea through to its logical conclusion.
Just a question (you may have already thought about this), but why not use the UserId field instead of the ARN?
$ aws iam get-user --user-name test-user
{
    "User": {
        "Path": "/",
        "UserName": "test-user",
        "UserId": "AIDAJWKKRHMAR7MNSONFB",
        "Arn": "arn:aws:iam::<AWS ACCOUNT NUMBER>:user/test-user",
        "CreateDate": "2017-01-11T21:38:23Z",
        "PasswordLastUsed": "2017-01-20T17:15:10Z"
    }
}
Amazon's IAM documentation also says that it would make sense to use that UserId field to uniquely identify IAM users. See the reference identifiers section linked here
It would seem that all that would be needed is to change the get_iam_users() function to pull that field, come up with a transform to UID space with some digest (what's the expected maximum number of users and expected collision rate?), and use that result as UID, since it should be stable for each created IAM user.
EDIT: I think that going to some type of "Create a persistent mapping object for lookups to be done" is a bit stretching the scope of this project. If that's something that is needed for a Linux cluster, you are likely better served by implementing an LDAP solution and tying PAM into that to do username lookups. That would guarantee that you have the advantage of stable UIDs and GIDs, and would be far more robust than trying to tack on a lookup solution here. It would also scale way beyond (in terms of number of users) what would likely be the expected use case for doing a direct IAM sync using the AWS CLI tools.
I would think this project is geared towards smaller use cases (about a few hundred users at the most, and please correct me if I am mistaken), so it's more a question of picking a hash function which has a collision probability curve that remains low enough through about 250 or so users.
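The collision question can be estimated with the standard birthday-problem calculation. This sketch assumes the hash spreads users uniformly over the uid space, which a real digest only approximates; the function name is made up for illustration.

```python
import math


def collision_probability(num_users, uid_space):
    """Probability that at least two of num_users share a uid.

    Assumes each user independently gets a uniformly random uid out of
    uid_space possible values (the birthday problem).
    """
    if num_users > uid_space:
        return 1.0
    # Sum of log(1 - i/N) is the log of the "no collision" probability.
    log_no_collision = sum(
        math.log1p(-i / uid_space) for i in range(num_users)
    )
    return -math.expm1(log_no_collision)


if __name__ == "__main__":
    for bits in (16, 32):
        print(bits, collision_probability(250, 2 ** bits))
```

For 250 users this comes out at roughly 7 in a million over a 32-bit space, versus well over a third if the uids had to fit in 16 bits, which is why the size of the usable uid range matters more than the choice of hash.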
Maybe we should find out what the largest uid for Amazon Linux is. According to https://en.wikipedia.org/wiki/User_identifier#Type this could be a bit tricky because there seem to be reserved numbers and stuff. I could check The Linux Programming Interface book to see if there is something in there. Or maybe we find someone who knows about uids in Amazon Linux?
My recollection is that UID space from kernel 2.4 onwards is unsigned 32 bit integers. If we shift the result up by about 2000, that should cover system accounts and packaged application accounts. The only gotcha is the UID for nobody, which is either 32767 or 65535, but that can be a standalone check.
I'll check the max UID and report back here.
Looks like the maximum is 2^32 - 2 (4294967294). My suspicion is that 2^32 - 1 is reserved for 'Invalid UID', but again, that's something that we can test for.
[root@<SYSTEM> ec2-user]# useradd -u 4294967295 test2
useradd: invalid user ID '4294967295'
[root@<SYSTEM> ec2-user]# useradd -u 4294967294 test2
[root@<SYSTEM> ec2-user]# sudo -u test2 -i
[test2@<SYSTEM> ~]$ id
uid=4294967294(test2) gid=508(test2) groups=508(test2)
[test2@<SYSTEM> ~]$
User nobody is actually uid 99 on my Amazon Linux instance. (Haven't looked at others, but I suspect that they will be similar, since this is determined by the kernel's ability to handle UIDs of a given length.)
[ec2-user@<SYSTEM> ~]$ sudo -u nobody id
uid=99(nobody) gid=99(nobody) groups=99(nobody)
At least on Amazon Linux, it looks like we start at 500 and then go upwards from there. That's in /etc/login.defs. The max being 65534 (as 65535, or 0xFFFF, is intended to mean 'invalid UID') is mainly a holdover from when UIDs were 16-bit. They haven't been that way in some time (17 years).
So as long as the digest of the IAM UserId field puts us between about 2000 (I think that's enough safety margin) and 2^32-2, I would think that would be a large enough range for small user groups that are natively on IAM to fit in without a high risk of collision.
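One possible shape for such a digest, as a sketch only: SHA-256 is an arbitrary choice here, the reserved set simply collects the nobody/invalid values mentioned in this thread, and user_id_to_uid is a hypothetical name rather than anything in the project.

```python
import hashlib

UID_MIN = 2000            # safety margin above system and package accounts
UID_MAX = 2 ** 32 - 2     # largest uid that useradd accepted in the test above
# Values this thread mentions for 'nobody' / 'invalid UID' (assumption: skip them all).
RESERVED_UIDS = {32767, 65534, 65535}


def user_id_to_uid(user_id):
    """Map an IAM UserId string to a stable uid in [UID_MIN, UID_MAX]."""
    digest = hashlib.sha256(user_id.encode("ascii")).digest()
    span = UID_MAX - UID_MIN + 1
    # Walk the digest in 4-byte windows until a non-reserved uid turns up.
    for start in range(0, len(digest) - 3, 4):
        window = int.from_bytes(digest[start:start + 4], "big")
        candidate = UID_MIN + window % span
        if candidate not in RESERVED_UIDS:
            return candidate
    raise ValueError("no usable uid derived from %r" % user_id)


if __name__ == "__main__":
    print(user_id_to_uid("AIDAJWKKRHMAR7MNSONFB"))
```

Because the input is the UserId rather than the name, renaming the IAM user would keep the uid, while deleting and recreating the user would change it.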
Ooh - here's a kinda nutty idea -
What if we use the creation time to generate the uid?
We could set the 0-date as 1/1/2000 for simplicity's sake.
(pardon the fuzziness of the numbers, I'm doing this with a shitty calculator and not using proper date math) -
The minimum possible user ID you could ever get would be from around the creation of AWS (probably later; the creation of IAM) - call that 1/1/2006 for the sake of argument - I think that translates to something like 189,000,000 which is well within 32 bits, and well above the minimal user ID that might cause collisions.
Then we burn around 31 million uid's per year until we exhaust the 32-bit address space.
Another interesting side-effect - newer users get higher user-id's. That's kinda neat.
The only drawback? You can't create two IAM users at the same exact second, or else they'll get the same UID.
We could probably pretty easily see that inside the IAM user listing and issue an error if we caught that. The fix would be "Delete one of those IAM users and re-create it". Of course, if you're using a hardcore enough number of IAM users then you might start to find this a problem - but if you are you're probably going to want to use LDAP instead, anyways.
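A sketch of the creation-time idea with the 1/1/2000 zero date, plus the same-second check described above. It assumes user records shaped like the Users entries from aws iam list-users (UserName and CreateDate keys) and the Z-suffixed timestamp format shown earlier; the function names are invented for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

EPOCH_2000 = datetime(2000, 1, 1, tzinfo=timezone.utc)  # the proposed 0-date


def create_date_to_uid(create_date):
    """Seconds between 2000-01-01 UTC and the user's CreateDate."""
    created = datetime.strptime(create_date, "%Y-%m-%dT%H:%M:%SZ")
    created = created.replace(tzinfo=timezone.utc)
    return int((created - EPOCH_2000).total_seconds())


def find_uid_collisions(users):
    """Return {uid: [usernames]} for users created in the same second."""
    names_by_uid = defaultdict(list)
    for user in users:
        uid = create_date_to_uid(user["CreateDate"])
        names_by_uid[uid].append(user["UserName"])
    return {uid: names for uid, names in names_by_uid.items() if len(names) > 1}


if __name__ == "__main__":
    users = [
        {"UserName": "alice", "CreateDate": "2017-01-11T21:38:23Z"},
        {"UserName": "bob", "CreateDate": "2017-01-11T21:38:23Z"},
    ]
    print(find_uid_collisions(users))
```

Anything returned by find_uid_collisions is a pair (or more) of users that would need the "delete one and re-create it" fix.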
The crazy thing is this actually could work, and solves a lot of problems.
Eventual consistency? Doesn't matter! Whenever your user eventually shows up, so long as they were created at a unique timestamp, then that's their uid.
This seems completely psychotic to me, but I can't seem to figure out what might break it, unless I'm severely botching my math. But, I guess....I mean, the Unix Epoch problem isn't expected to reach criticality until 2038 - and if we start at 2000 instead of 1970, we get an extra 30 years anyways!
(And if we really wanted to, we could use straight-Unix-epoch timestamps - those won't exhaust the 32-bit address space until 2038 and that's plenty of time :p )
Honestly this seems like it actually could work. Wow, that's really nuts. What do you folks think?
Technically you could use a UNIX timestamp where time_t is a signed 64-bit value. You're going to truncate it to an unsigned 32-bit int anyway by dropping the upper 32 bits. (32-bit time_t was a signed 32-bit int, which is the origin of the 2038 problem; an unsigned 32-bit time_t would be good until 2106.) You adjust by the UNIX timestamp of the creation of IAM (if you want), adjust for system accounts and nobody (salt and pepper to taste), and you should be good out to about 4.29 billion seconds from the epoch (136 years and change).
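The date arithmetic behind those limits can be checked quickly. This is just a worked calculation; the helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_utc(seconds):
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


def truncate_to_uint32(timestamp):
    """Drop the upper 32 bits of a 64-bit timestamp."""
    return timestamp & 0xFFFFFFFF


signed_limit = timestamp_to_utc(2 ** 31 - 1)    # last second of a signed 32-bit time_t
unsigned_limit = timestamp_to_utc(2 ** 32 - 1)  # last second of an unsigned 32-bit value
years_of_range = 2 ** 32 / (365.25 * 86400)     # width of the unsigned range in years

if __name__ == "__main__":
    print(signed_limit.isoformat())
    print(unsigned_limit.isoformat())
    print(round(years_of_range, 1))
```

The signed limit falls in January 2038 and the unsigned one in February 2106, so timestamps used directly as uids stay inside 32 bits for a little over 136 years from 1970.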
So yeah, that could theoretically work.
Also just thinking out loud... In the case of using Elastic File System for mounting user home directories, this would also be dependent upon Amazon's NFSv4 Server (and the NFS Utilities on the EC2 instances) dealing with unsigned 32-bit UIDs and GIDs correctly. I don't have direct knowledge that those things have been thought of in the implementation of EFS.
Are there any other pieces of software that are in common use that depend that heavily on UIDs that would have issues with 32 bit UIDs? I don't know of any offhand...
For what it's worth...I'm managing a set of basically 20 users, half of which are various service accounts for apps that don't handle EC2 Instance IAM Profiles correctly. If I got a 32-bit hash collision on the IAM UserId fields on the accounts for the 10 actual people, I'd probably go out and buy a lottery ticket...
(If it was 100 users, I'd go run around a golf course with a 4 iron in my hand during a lightning storm. More than that? I think I'd be smart enough to be using LDAP...)
Why didn't I think of that! Pretty smart idea there.
We should try this. It can be implemented in a branch and tested with several of the services (especially EFS, of course). Also, maybe it's a good idea to add a switch/config option to disable this functionality.
I like the idea, I really do.
Hey it took me 7 months to come up with it myself!!!
My only thoughts: I would prefer hashing the UserId/ARN (a unique value) to an unsigned integer instead of using the creation date, where we don't have any guarantees on uniqueness. In case of a uid collision we should log it, as we do at the moment with usernames > 32 characters.
I have a preference for the hash into unsigned 32 bit integer space as well, mainly because I'm unsure how IAM deals with users federated into IAM and their create dates, and I don't have a way to test that functionality. (Is it the date when they were actually created or when they first federated?)
But that may be worrying about things beyond the scope here.
As per the linked PR above, I suggest speeding up the login of users who are outside the scope of the sync. To establish which users are from IAM and which are not, the sync script already uses a group, namely iam-synced-users.
My suggestion, as per the PR ( https://github.com/widdix/aws-ec2-ssh/pull/114 ), is to check the group membership and, if the user is not a member of this group, just fail fast.
Is this going to cause any trouble with the plan of having consistent UIDs, and if so, what additional indication can I use, like the existence of the sync cron job or something similar?
The only place I see the fast fail PR, well, failing, is where the users are being created by something other than the bundled import_users.sh script, but where the authorized_keys_command.sh script is being used to fetch the public key from IAM. The fast fail logic seems to have as an assumption that the marker group is guaranteed to be attached to users, which may not necessarily be true in every case (e.g. user accounts being served by nss_ldap from a user provided LDAP box somewhere, where the marker group isn't configured). This may be a bit much considering
At present, this codebase doesn't guarantee a stable UID for IAM users, simply because there isn't a GUID that can be quickly hashed with low collision probability into Linux UID space. The OS itself uses unsigned 32 bit integers nowadays, but there's always the question of legacy software that expects UIDs to be shorter.
So once this question here is decided, then that would likely guide how we handle users that are not in IAM.
I don't feel too strongly about sticking with just creation date, but I think I can lay out my arguments at least -
1) We can have a nice error when we see a collision - with a simple solution "Delete one of these two users and recreate them." I guess you could have a similar one for collisions for hashed ARNs - but there the message would be "come up with a different name for one of these two users" - and I don't like forcing their hand like that.
2) I'm not too worried about federation. If you're federating stuff into IAM, there's probably a good chance you have an LDAP server handy - and if you have that, you'd probably want to just use that. However, if there were a good Google Apps federation available, I would certainly want to use that, and that might not make an LDAP server available for you.
Let me do some test crc32 runs on my current IAM list to see what the actual numbers look like, to start.
Here's a little PHP script I whipped up to help play with some of these numbers - feel free to give it a whirl.
<?php
// Fetch all IAM users as JSON via the AWS CLI (backticks run a shell command).
$results=`AWS_PROFILE=production aws iam list-users`;
$ans=json_decode($results);
//print_r($ans);
foreach($ans->Users AS $user) {
    //print "User is: ";
    //print_r($user);
    $username=$user->UserName;
    $userid=$user->UserId;
    $arn=$user->Arn;
    $createdate=$user->CreateDate;
    // Candidate uid sources: CRC32 of username, UserId and ARN, plus the creation timestamp.
    print "User: $username hashes: ".crc32($username).", ".crc32($userid).", ".crc32($arn).", ".strtotime($createdate)."\n";
}
The keymaker project attempts to solve this issue by grabbing the UserId from IAM and converting it to an int: https://github.com/kislyuk/keymaker/blob/master/keymaker/__init__.py#L159 This seems to work reasonably well on a small set of users.
Since the latest link is outdated: https://github.com/kislyuk/keymaker/blob/721d3c0894a8c216e8b681afc26d043e9a0975a8/keymaker/__init__.py#L166-L172
I'm not sure if something similar is possible in plain bash. But we could also use some python for that. Because aws cli depends on python. So we have python as a dependency already :)
It looks like you can 'tag' users now in IAM. That'd probably be a pretty good way to give them unique uids? The only thing you have to, kinda, worry about would be making sure you can carefully number them and not have conflicting uids. Everything being pretty eventually-consistent.
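A sketch of what reading a uid from tags could look like, assuming each user's tags arrive as a list of Key/Value pairs (the shape aws iam list-user-tags returns under Tags). The tag key "uid" and both function names are hypothetical; nothing in this thread has fixed a naming scheme.

```python
from collections import defaultdict

UID_TAG_KEY = "uid"  # hypothetical tag name, not an agreed convention


def uid_from_tags(tags):
    """Return the uid stored in a user's tags, or None if the tag is absent."""
    for tag in tags:
        if tag["Key"] == UID_TAG_KEY:
            return int(tag["Value"])
    return None


def find_conflicting_uids(tags_by_user):
    """Return {uid: [usernames]} for uids claimed by more than one user."""
    users_by_uid = defaultdict(list)
    for username, tags in sorted(tags_by_user.items()):
        uid = uid_from_tags(tags)
        if uid is not None:
            users_by_uid[uid].append(username)
    return {uid: names for uid, names in users_by_uid.items() if len(names) > 1}


if __name__ == "__main__":
    tags_by_user = {
        "alice": [{"Key": "uid", "Value": "2001"}],
        "bob": [{"Key": "team", "Value": "ops"}, {"Key": "uid", "Value": "2001"}],
    }
    print(find_conflicting_uids(tags_by_user))
```

A conflict check like this can only report duplicate numbering after the fact; it does not prevent two writers from tagging different users with the same uid at the same time.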
We do not guarantee constant uid and gid for created local users at the moment. How could we handle this?