kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0
107.59k stars 38.64k forks

Container identifiers should be globally unique #199

Closed lavalamp closed 9 years ago

lavalamp commented 9 years ago

A temporary workaround is to seed our random number generators. But really, apiserver should assign a guaranteed-unique identifier upon resource creation.

thockin commented 9 years ago

Internally we use RFC4122 UUIDs for identifying pods. Any objections to making this part of the pod setup? I guess it would really be a string (like "id") but with the strong suggestion that it be an encoded UUID.

Or we could use docker-style 256 bit randoms, but that might get confusing.
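Both forms Tim mentions are easy to generate. A minimal sketch in Go (the function names are mine, not from any Kubernetes code):

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUID returns a random (version 4) RFC 4122 UUID string,
// e.g. "7c9fc7d1-5aac-46f5-b76f-b0d8d0effe2a".
func newUUID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// newDockerStyleID returns a docker-style 256-bit random ID as 64 hex chars.
func newDockerStyleID() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", b), nil
}

func main() {
	u, _ := newUUID()
	d, _ := newDockerStyleID()
	fmt.Println(u)
	fmt.Println(d)
}
```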

If we further lock down container names to RFC1035 labels, we can use "."-separated names as the docker container name, which seems much nicer than the current dashes and underscores :)

What think?

jjhuff commented 9 years ago

Both points sound great to me. These encoded names are just plain ugly:)

Id is also a required field (network containers break otherwise), so it seems weird to have them marked as 'omitempty'. This is probably more an issue for config files than anything else.

smarterclayton commented 9 years ago

Was going to get familiar and try to fix this - sounds like the suggestion is to add "ID string" to Container, and fail PodRegistryStorage.Create() if Container.ID is empty? Or should PodRegistryStorage.Create() populate unset DesiredState.Manifest.Containers[].ID that are empty? Latter seems more flexible (server controls default UUID generation for clients)
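The "latter" option Clayton describes, with the server filling in defaults, might look like this. Hypothetical, pared-down types for illustration; not the actual pkg/api structs or PodRegistryStorage code:

```go
package main

import "fmt"

// Container and Pod stand in for the real API types.
type Container struct {
	Name string
	ID   string
}

type Pod struct {
	ID         string
	Containers []Container
}

// defaultIDs fills in any empty IDs server-side, so clients never have to
// generate them. Existing IDs are left untouched.
func defaultIDs(p *Pod, newUUID func() string) {
	if p.ID == "" {
		p.ID = newUUID()
	}
	for i := range p.Containers {
		if p.Containers[i].ID == "" {
			p.Containers[i].ID = newUUID()
		}
	}
}

func main() {
	p := Pod{Containers: []Container{{Name: "web"}}}
	n := 0
	defaultIDs(&p, func() string { n++; return fmt.Sprintf("uuid-%d", n) })
	fmt.Println(p.ID, p.Containers[0].ID) // uuid-1 uuid-2
}
```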

thockin commented 9 years ago

I started in on some validation logic this morning.

It's not clear whether unique ID is something users should have to spec or not. I lean towards not. That means the master (apiserver?) has to generate a uuid upon acceptance. That uuid would have to flow down to Kubelet.

Is that in the same vein as you were thinking?


smarterclayton commented 9 years ago

As an API consumer I like not having to specify things that the server can do for me - having to generate a UUID on the command line to curl a new pod into existence feels wrong. I started here: https://github.com/smarterclayton/kubernetes/commit/984334f42dfcb470a6add005613330477108a146#diff-89731e89e31105da32e30576a5939fb6R149 but didn't pass down to kubelet yet.

vmarmol commented 9 years ago

One thing to note with globally unique identifiers is static containers. Today we set up cAdvisor as a static container and have no way to assign it a unique ID globally.


smarterclayton commented 9 years ago

Static as in "defined on each host via a config file"? Would it make sense for the Kubelet to auto assign a UUID for containers pulled from files based on the host MAC and the position in the file (or a SHA1 of the contents of the manifest plus the host MAC)?
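The SHA-1 variant of Clayton's suggestion could be sketched as follows. Illustrative only, not actual kubelet code; the host string standing in for a MAC address is an assumption:

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// staticPodID derives a deterministic ID for a file-sourced pod from a
// host identifier plus the raw manifest bytes, so the same file on the
// same host always yields the same ID.
func staticPodID(host string, manifest []byte) string {
	h := sha1.New()
	h.Write([]byte(host))
	h.Write(manifest)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	id := staticPodID("xyz123.mydomain.com", []byte("containers:\n- name: cadvisor\n"))
	fmt.Println(id) // same inputs always yield the same ID
}
```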

thockin commented 9 years ago

Are you setting it up by config file?

Can we generate a new uuid when we write the config file, or do we want it to be identical across machines (google prod style)?

We could do something like: if Kubelet finds a config without a uuid, it will assign it a uuid and log it.

Internally we go one step further and define "master space", where each configuration originator can choose an ID and then manage ids within that space, meaning the master doesn't have to generate a UUID at all, just a unique masterspace + pod name.

Do we need to go that far? UUIDs are notoriously human-hostile.


vmarmol commented 9 years ago

Yes, it is a static file we leave on the machine. I don't think we have the ability to assign it a unique ID (outside of running some custom init script). Having the Kubelet assign it a UUID seems reasonable to me.


thockin commented 9 years ago

To be clearer on masterspace. Each new pod gets fields such as:

Masterspace: kubernetes.google.com
Name: "a name unique within this masterspace"

I am simultaneously trying to argue that names should be DNS compatible, which limits them to 64 lowercase alphanumerics and gives names semantic meaning, making them bad as unique IDs. Need to think more about this...

lavalamp commented 9 years ago

I want the container names from the different sources to have different namespaces, so there won't be any collisions. E.g., kubelet prepends/appends ".etcd" (or something) to etcd-sourced containers, ".cfg" to containers from the config file, ".http" to containers from the manifest url, etc. Then it is up to each source to stay unique.

Api server can stay unique via uuid or counting up. Config files & manifest url stay unique by humans not screwing up; kubelet rejects them otherwise.

This has the nice effect that the container name produced from a config file is predictable without having to do a lookup. This would be good for our own container vm image and anything like it.
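The per-source namespacing above can be sketched like this. The suffixes are the examples from the comment; the map keys and function name are illustrative, not real kubelet code:

```go
package main

import "fmt"

// Each container source gets its own suffix, so each source only has to
// keep its own names unique.
var sourceSuffix = map[string]string{
	"etcd": ".etcd",
	"file": ".cfg",
	"url":  ".http",
}

// namespacedName tags a container name with the suffix of the source it
// came from, rejecting unknown sources.
func namespacedName(containerName, source string) (string, error) {
	sfx, ok := sourceSuffix[source]
	if !ok {
		return "", fmt.Errorf("unknown source %q", source)
	}
	return containerName + sfx, nil
}

func main() {
	n, _ := namespacedName("etcd", "etcd")
	fmt.Println(n) // etcd.etcd
}
```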

thockin commented 9 years ago

This started structured and turned into a stream-of-consciousness, sorry.

I agree that unique names are desirable, but I'm not sure ".etcd" and ".http" are sufficient.

The reason we have masterspace internally is because we also need to attach metadata about the master that created a pod (e.g. which cluster it thought it was in, what magic number it had (generation number) and so on).

Here's what I think I have convinced myself of, so far.

1) All pods have a required name string, which is human-friendly and DNS friendly (probably rfc1035 subdomain rather than label).

2) All pods which are running have a uuid (suggest RFC4122 but could be something else). Think of this as a cluster-wide PID. It has no semantic meaning and changes whenever a pod is started on a new host.

3) The kubelet API includes this uuid. If the uuid field is not specified, the kubelet will assign a new UUID.

This does not have the property that Daniel wants - predictable container names. The problem is that you're putting semantic meaning into the unique ID. There's a reason that database best practices involve surrogate keys and that UNIX syscalls operate on PIDs rather than command names. Consider what happens if we get a phantom instance on a split network - both pods end up with the same name - not unique any more.

That said, I could maybe be convinced. If we put the rule that the pod name had to be unique within a masterspace, we could sort of punt the problem a bit, for now.

E.g.

Pod {
  masterspace = "k8s.mydomain.com"
  id = "id8675309.tims-pod"
  containers [ {
    name = "apache"

...would be created with container name apache.id8675309.tims-pod.k8s.mydomain.com

If the apiserver did not care about phantoms, it could leave off the id8675309 noise. for Google people, this should look very familiar.
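The joining rule in this example can be sketched as follows. Treating "file" as the fallback masterspace follows the cadvisor example below; the function itself is illustrative, not real kubelet code:

```go
package main

import (
	"fmt"
	"strings"
)

// dockerName joins container name, pod id, and masterspace the way the
// examples in this thread do: apache + id8675309.tims-pod +
// k8s.mydomain.com -> "apache.id8675309.tims-pod.k8s.mydomain.com".
func dockerName(containerName, podID, masterspace string) string {
	if masterspace == "" {
		masterspace = "file" // assumed fallback for unmanaged sources
	}
	return strings.Join([]string{containerName, podID, masterspace}, ".")
}

func main() {
	fmt.Println(dockerName("apache", "id8675309.tims-pod", "k8s.mydomain.com"))
	fmt.Println(dockerName("cadvisor", "cadvisor", ""))
}
```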

I still have a vague foreboding about making the uniqueness be the master's problem, but it would get rid of the need for opaque UUIDs. Or would it? We need to handle static pods (e.g. cAdvisor). If they are allowed to collide, then some hypothetical cluster-data aggregator cannot use ID as a primary key, and we will need to disambiguate queries by hostname. Blech.

What if the rule is that masterspace is optional? If not specified, kubelet will make something up derived from the source. So cAdvisor would say:

Pod {
  # no masterspace
  id = "cadvisor"
  containers [ {
    name = "cadvisor"

...would be created with container name cadvisor.cadvisor.file.

This still sort of sucks in that you can't aggregate by masterspace, e.g. "give me a list of all containers in the cadvisor masterspace". But maybe that's better served by labels anyway. Yes, I think so.

Thoughts? I could go either way (UUIDs or masterspace + unique name). Or does someone have other ideas? I feel like this got complicated pretty fast.

Tim


lavalamp commented 9 years ago

Just to be clear, I only think that the property of predictable names is desirable for containers that come from the manifest url and maybe the container, because without that property, docker ps is unhelpful. But you've got me somewhat convinced that maybe I shouldn't care so much about that.

I don't know if we want to wait to solve naming in general before we accept a better solution than what we do currently.

thockin commented 9 years ago

I could put my weight behind either approach. The problem with UUIDs (be they RFC4122 or SHA or whatever) is that they suck for humans. The problem with !UUIDs is that we have to count on the "master" to get uniqueness right.


jjhuff commented 9 years ago

"I feel like this got complicated pretty fast." Yup:)

[I think I've read & parsed all of the comments, but I could be mistaken]

I like the idea of DNS-styled hierarchical namespaces. I'm not super worried about multiple masters scheduling over the same cluster of minions, but it does make multi-tenant masters easier (i.e. a provider is running a master, and customers run their minions). It's also something that people are used to.

I think it's also reasonable to require that users produce unique names for managing in the system.

Machine-generated unique IDs are more useful for running containers (i.e. a global PID). Having this be globally unique is really handy for things like log aggregation, etc. Why not always have the kubelet generate that? The master can learn it when it lists the running containers.

It's worth noting that the only restriction of docker names is that they are unique to the host. We can still encode useful human data (manifest+container names) in addition to the unique ID into the name...even if we only parse out the unique ID!
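Justin's "encode human data plus the unique ID, parse out only the ID" idea could be sketched like this. Using "_" as the separator is an assumption (kubelet reserves underscores in docker names for infrastructure use); the function names are mine:

```go
package main

import (
	"fmt"
	"strings"
)

// encodeName packs human-readable context and an opaque unique ID into
// one docker container name.
func encodeName(human, id string) string {
	return human + "_" + id
}

// parseID extracts only the trailing unique ID; everything before the
// last "_" is treated as decoration for humans.
func parseID(dockerName string) string {
	if i := strings.LastIndex(dockerName, "_"); i >= 0 {
		return dockerName[i+1:]
	}
	return dockerName
}

func main() {
	n := encodeName("8675309.myjob.k8s.mydomain.com", "7c9fc7d1")
	fmt.Println(n, "->", parseID(n))
}
```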

FWIW, I'm just polishing my ID cleanups to remove dead (or dying) code. My goal was to normalize on docker IDs for many/most things inside the kubelet. I should have it out for review tomorrow.

jjhuff commented 9 years ago

Also, clarifying some terminology might be handy. Here's how I think of things (could easily be wrong!):

- ContainerManifest is a collection of Containers. Pods are an instance of a ContainerManifest. They can be running (or not). They are assigned to a host.

Things that confuse me:

- ContainerManifests have an Id. Containers have a name. Pods have an Id. Seems like the manifest should only have a name (but this gets to the discussion above about names vs ids).
- ReplicationController has a PodTemplate which really just wraps ContainerManifest. Seems like overkill.

thockin commented 9 years ago

On Wed, Jun 25, 2014 at 10:32 PM, Justin Huff notifications@github.com wrote:

> "I feel like this got complicated pretty fast." Yup:)
>
> [I think I've read & parsed all of the comments, but I could be mistaken]
>
> I like the idea of DNS-styled hierarchical namespaces. I'm not super worried about multiple masters scheduling over the same cluster of minions, but it does make multi-tenant masters easier (i.e. a provider is running a master, and customers run their minions). It's also something that people are used to.

I'm more worried about "masters" that are config files crafted by humans without coordination. Those should not accidentally collide :)

> I think it's also reasonable to require that users produce unique names for managing in the system.

Here "users" == apiserver?

> Machine-generated unique IDs are more useful for running containers (i.e. a global PID). Having this be globally unique is really handy for things like log aggregation, etc. Why not always have the kubelet generate that? The master can learn it when it lists the running containers.

Yeah, this is doable (and is in fact closer to what we do internally).

> It's worth noting that the only restriction of docker names is that they are unique to the host. We can still encode useful human data (manifest+container names) in addition to the unique ID into the name...even if we only parse out the unique ID!

I think it is useful, but not critical, that docker-reported names be human-friendly.

> FWIW, I'm just polishing my ID cleanups to remove dead (or dying) code. My goal was to normalize on docker IDs for many/most things inside the kubelet. I should have it out for review tomorrow.

I'm keenly interested in this, and I'd like to stabilize it all ASAP. Please do not merge this change until I have had time to give feedback on it. Caveat: I am part-time the rest of this week.

thockin commented 9 years ago

On Wed, Jun 25, 2014 at 11:07 PM, Justin Huff notifications@github.com wrote:

> Also, clarifying some terminology might be handy. Here's how I think of things (could easily be wrong!):
>
> ContainerManifest is a collection of Containers. Pods are an instance of a ContainerManifest. They can be running (or not). They are assigned to a host.

The names as defined in pkg/api are somewhat confusing. It all depends on perspective. We really have two APIs here (user -> master; master -> kubelet), and we're cramming the spec into one file. We should consider whether we really want to do that.

From Kubelet's POV, ContainerManifest == Pod.

> Things that confuse me:
>
> ContainerManifests have an Id. Containers have a name. Pods have an Id. Seems like the manifest should only have a name (but this gets to the discussion above about names vs ids).

Yes, let's settle the discussion about UUIDs vs names before we rename things. And then let's rename things.

> ReplicationController has a PodTemplate which really just wraps ContainerManifest. Seems like overkill.

I think this was to leave room for growth. Kubelet should do the same and define a Pod type that has an api.ContainerManifest and maybe the UUID (if we do that).

BTW: I have a change pending (not sent yet) to do validation of a ContainerManifest - so it will reject manifests that, for example, do not have an ID. The core of it is done, but it needs a test and has some open questions. Hope to have it out tomorrow.

bgrant0607 commented 9 years ago

Whew! This is long. This issue started with no context about what we're trying to do.

Starting with the end: Yes, we should clean up the identifiers. Currently, every object/resource includes JSONBase, which has an ID (which probably should be Id). ContainerManifest also has an Id. Container has Name. Port has Name. I'll point out that Container and Port are not standalone resources right now. ContainerManifest is the way it is due to compatibility with the container-vm release.

What are we trying to do? I saw several things mentioned in this issue:

Both unique identifiers and human-friendly names have value. Human-friendly names can only be unique in space, not in time. We should use "Id" for the former and "Name" for the latter. Names could be used to ensure idempotent (at most once) creation, though a non-indexed resource value could be used for that, also.

The replicationController would treat "Name" in the template as a prefix and would append relatively short random numbers for uniqueness. Static pods could be treated similarly.

Label selectors should be used for set formation / aggregation.

If users are providing Names, we could also use them for DNS (#146). I'd use "domain" rather than "masterspace".

Argh, my laptop needs to be rebooted...

bgrant0607 commented 9 years ago

Continuing...

I agree we want a mechanism that permits unique id allocation by Kubelets and that doesn't require centralized and/or persistent state. I like UUIDs. There are the issues of ensuring unique MAC addresses in VMs and/or namespaces, and determinism for testing, but I think we've found ways around these issues.

I've considered not having human-friendly names for pods before and just using labels instead. Labels are predictable and human-friendly, but don't require uniqueness, and don't require concatenating lots of identifying info together in order to ensure uniqueness, which users WILL do for names (and they'll want to parameterize them). They aren't short, though. Also, I guess part of the problem is that Docker doesn't support labels. We should push on that.

Idempotence could be ensured by a client-generated cookie, such as a fingerprint/hash or PR number. It wouldn't be required for static configs.

DNS names for instances aren't super-useful if they aren't predictable and DNS-like human-friendly names aren't that friendly if they are long. Short nicknames don't have to be predictable (so they could have a short uniquifying suffix) and don't even need to be semantically relevant -- just memorable (hence Docker's silly auto-generated names, I guess).

Services need predictable DNS names, OTOH, and ports need predictable names (for DNS SRV lookup or ENV vars or whatever), and pod-relative hostnames of containers should be predictable, so they can communicate with each other, though I don't know that they actually need to be FQDNs. Other types of services (e.g., master-elected services) and groups will need predictable DNS names, also.

Is the main motivation for names for pods to be consistent across all resource types? If so, I could buy into DNS-like hierarchical names for them. I'd use something like "domain" or "namespace" instead of "masterspace".

smarterclayton commented 9 years ago

Pod-relative hostnames and stable internal names are definitely valuable - and limiting "name" to rfc1035 subdomain has been extremely valuable in practice to us on OpenShift.

The replicationController would treat "Name" in the template as a prefix and would append relatively short random numbers for uniqueness. Static pods could be treated similarly.

Quasi-uniqueness I assume?

Other types of services (e.g., master-elected services) and groups will need predictable DNS names, also.

How predictable? As a concrete example, with something like Zookeeper you need the container to have an identifier/name that is stable across restarts / reschedules in a shared config (so not a pod ID). I'd assumed you'd model this with a set of replication controllers (vs a shared controller) so you had 3 replication controllers with 1 item each, and you'd be able to either set an ENV per pod, or use the name in order to apply that. If name changes over pod instances that rules out the reuse there.

lavalamp commented 9 years ago

One last thought from me; it occurs to me that docker already generates a long container id. Perhaps we can consider using that directly as our spatial/temporal unique identifier, and use this hypothetical dns-style solution as the friendly human name. We'd need to investigate just how unique docker's id's are, and we may not want to depend on docker for that, but it would reduce the number of IDs needed.

Also, as a footnote, if we end up with both pod.ID and Manifest.ID, IMO they should be the same identifier.

jjhuff commented 9 years ago

> One last thought from me; it occurs to me that docker already generates a long container id. Perhaps we can consider using that directly as our spatial/temporal unique identifier, and use this hypothetical dns-style solution as the friendly human name. We'd need to investigate just how unique docker's id's are, and we may not want to depend on docker for that, but it would reduce the number of IDs needed.

Yes, I was starting to think the same thing.

> Also, as a footnote, if we end up with both pod.ID and Manifest.ID, IMO they should be the same identifier.

My only concern here is that it starts to feel weird when a Manifest is used as part of a ReplicationController. I think. You end up with N pods and a Manifest each with ID fields.

lavalamp commented 9 years ago

For clarity, let me suggest the convention that "name" means a friendly mostly human readable string, possibly in the style of dns names, and that "id" means an opaque, machine generated identifier, guaranteed unique at some level of resolution. Maybe everyone except for me is already using this convention. :) But I want to talk about ids and names in general without referring to a particular implementation.

> My only concern here is that it starts to feel weird when a Manifest is used as part of a ReplicationController. I think. You end up with N pods and a Manifest each with ID fields.

I think, in that case, the rep. controller itself has a name, which it can use to make names for the pods it creates (prepend/append indices or something). IDs for the pods could still be generated by the apiserver or wherever we decide they need to be generated. I was trying to say that since there's a 1:1 relationship between pods and manifests, we shouldn't make different IDs for each.

thockin commented 9 years ago

Trying to collect thoughts and ideas into a proposal. It's tricky considering it from all angles (users, apiserver, replication controllers, kubelet). My background is node-centric, so I am counting on people to smack me if I say something dumb for higher-levels :)

I have to run out right now, but I'm hoping we can distill the discussion into a design O(soon).

NB: We spec names as RFC 1035 compatible, though we might extend that to allow purely numeric tokens (e.g. 123.something.com) but I forget which RFC that is. Docker names allow underscores, which kubelet reserves for use in infrastructure containers.

From kubelet's point of view:

1) All pods have a namespace string, which is human-friendly and DNS friendly (rfc1035 subdomain fragment). For example: "k8s.mydomain.com". This is used to indicate the provenance of the pod. If the namespace is not specified when creating a pod, kubelet will assign a namespace derived from the source. For example, a file "/etc/k8s/cadvisor.pod" on host "xyz123.mydomain.com" might get namespace "file-f5sxiyzpnm4hgl3dmfshm2ltn5zc44dpmqfa.xyz123.mydomain.com" (the big mess is a base32 encoding of the path).

2) All pods have a name string, which is human-friendly and DNS friendly (rfc1035 subdomain fragment). For example: "id8675309.myjob". These names are typically NOT user-provided, but are generated from infrastructure. For example, if the user asked for a job named "myjob" the apiserver must uniquify that into a pod name.

3) The namespace + name pair is assumed to be unique. This provides simple idempotency, for example when re-reading a config file.

Open: Do we need UUIDs at all? For what purpose? The only argument I can come up with is to protect against accidentally non-unique names at cluster scope (e.g. in a giant database of namespace, name, uuid, stats) you could disambiguate colliding namespace+names pairs with uuid. Other justifications?

From the apiserver's point of view:

1) The apiserver has a configured namespace.

2) All incoming pods have a name. I don't know what the formatting and uniqueness rules are here.

3) Upon acceptance, a pod is assigned a unique ID of some sort.

4) Upon assignment to a minion, the name and unique ID become the pod name.

Open: Should the unique ID persist if the pod is moved to a new minion? Maybe we need two IDs - one that sticks across moves and one that does not?


bgrant0607 commented 9 years ago

Numeric host names: http://tools.ietf.org/html/rfc1123#page-13

smarterclayton commented 9 years ago

Regarding the open question: is a pod on a new minion the same as the old pod? Is a move an action that is logically part of Kubernetes, or are only "remove" and "create new" available? If it's the former, it seems like the ID should be the same; if it's the latter, it seems like it should be different.

We have a use case for being able to move a pod from minion to minion (and any volumes that come with it) - however, since this requires the volume data on disk to be in a resting state, it's not an operation that seems to lend itself well to the replication controller (since the move is inherently stateful). So I would expect this to be managed as an operation above the replication controller, vs part of it.

One namespace per apiserver can be limiting if the namespace is automatically bound to DNS and you're dealing with very large numbers of containers, but it doesn't sound unreasonable if you are using wildcard DNS.

jjhuff commented 9 years ago

@lavalamp BTW, it looks like docker IDs are more or less just 32 random bytes: https://github.com/dotcloud/docker/blob/master/utils/utils.go#L412

They enforce them to be unique per-machine: https://github.com/dotcloud/docker/blob/master/daemon/daemon.go#L470
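That scheme, as Justin's links describe it, could be sketched like this. The map standing in for the daemon's table of known container IDs is an assumption for illustration:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// generateUniqueID draws 32 random bytes, hex-encodes them to 64 chars,
// and retries on the (vanishingly unlikely) per-machine collision.
func generateUniqueID(existing map[string]bool) string {
	for {
		b := make([]byte, 32)
		if _, err := rand.Read(b); err != nil {
			panic(err) // crypto/rand failure is not recoverable here
		}
		id := fmt.Sprintf("%x", b)
		if !existing[id] {
			return id
		}
	}
}

func main() {
	fmt.Println(generateUniqueID(map[string]bool{}))
}
```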

bgrant0607 commented 9 years ago

Unique ids disambiguate among multiple instances reusing the same name over time, such as when polling termination status.

Beware of moving unique ids. Once we do that, they are no longer unique, such as in the cases of live migration or even just preloading. If we do move ids, we should have 2 ids, the movable one at the cluster level and a non-movable one at the node level, and the movable ids would correspond to zero or more of the non-movable ids.

thockin commented 9 years ago

OK, collecting thoughts again. If this is acceptable, I (or Clayton) can write up a short .md file covering identifiers. There are still a few FIXME comments in here. Please tell me if I am capturing any of this incorrectly.

Global: We spec most names as RFC 1035/1123 compatible. Docker allows container names to include underscores, which we reserve for use by infrastructure.

From kubelet's point of view:

1) All pods have a namespace string, which is human-friendly and DNS friendly (an rfc1035/1123 subdomain). For example: "k8s.mydomain.com". This is used to indicate the provenance of the pod. If the namespace is not specified when creating a pod, kubelet will assign a namespace derived from the source of the pod. For example, a file "/etc/k8s/cadvisor.pod" on host "xyz123.mydomain.com" might get namespace "file-f5sxiyzpnm4hgl3dmfshm2ltn5zc44dpmqfa.xyz123.mydomain.com" (the big mess is a base32 encoding of the path). (FIXME: do we need this if we have a UUID as spec'ed below?)

2) All pods have a name string, which is human-friendly and DNS friendly (an rfc1035/1123 subdomain fragment). For example: "8675309.myjob". These names are typically NOT user-provided, but are generated by infrastructure. For example, if the user asked for a job named "myjob" the apiserver must uniquify that into a pod name.

3) The namespace + name pair is assumed to be unique. This provides simple idempotency, for example when re-reading a config file.

4) When starting an instance of a pod for the first time (i.e. not restarting an existing pod), kubelet will assign an rfc4122 compatible UUID to the pod. This provides an ID that is guaranteed unique across time and space. If the pod is stopped and an identical pod (same namespace + name) is started, a new UUID will be assigned.

5) Kubelet will use the aforementioned identifiers to produce unique container names, for example "8675309.myjob.k8s.mydomain.com_7c9fc7d1-5aac-46f5-b76f-b0d8d0effe2a".

NB: The UUID is per-pod, not to be confused with Docker's own container IDs which are per-container.
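A minimal Go sketch of steps 4 and 5 above: generating an RFC 4122 version-4 UUID and joining it with the pod name and namespace to form the container name. The helper names (`newUUID`, `containerName`) are my own for illustration, not actual kubelet code; the underscore separator is the one reserved for infrastructure above.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUID generates an RFC 4122 version-4 UUID from 16 random bytes.
func newUUID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set variant 10
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// containerName joins name, namespace, and pod UUID into the proposed
// Docker container name format, e.g.
// "8675309.myjob.k8s.mydomain.com_7c9fc7d1-...".
func containerName(name, namespace, uuid string) string {
	return fmt.Sprintf("%s.%s_%s", name, namespace, uuid)
}

func main() {
	fmt.Println(containerName("8675309.myjob", "k8s.mydomain.com", newUUID()))
}
```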

From the apiserver's point of view:

1) The apiserver has a configured namespace, for example "k8s.mydomain.com".

2) All incoming pods have a name assigned by the originator of the pod (be they human users or other infrastructure). Pod names must be unique in space, but not time.

3) Upon acceptance, a pod is assigned a unique ID (FIXME: just spec it as rfc4122 again? could maybe be simpler here?). This provides an easy way to disambiguate successive pods with the same name. This ID will persist for the lifetime of the pod, across restarts and moves. (FIXME: is this really needed or is name good enough?)

4) Upon assignment to a minion, the name and unique ID become the pod name.

5) The namespace + name together must be no more than 255 characters long.

If DNS service is to be configured automatically (not a feature yet, but has been discussed), the pod namespace + name will already be DNS compliant.
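The apiserver-side checks above could be sketched roughly as follows in Go. This is a hypothetical validation helper under the assumption that both parts are RFC 1035/1123 subdomains and that the combined length is capped at 255 characters (the DNS name limit); the regexp is a loose approximation, not the project's actual validation logic.

```go
package main

import (
	"fmt"
	"regexp"
)

// subdomainRE loosely matches RFC 1035/1123 subdomains: lowercase
// alphanumeric labels separated by dots, each label starting and
// ending with an alphanumeric character.
var subdomainRE = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validatePodIdentity applies the rules sketched above: both parts
// must be DNS-compatible, and namespace + name together must not
// exceed 255 characters.
func validatePodIdentity(namespace, name string) error {
	full := name + "." + namespace
	if len(full) > 255 {
		return fmt.Errorf("namespace + name too long: %d > 255", len(full))
	}
	if !subdomainRE.MatchString(namespace) || !subdomainRE.MatchString(name) {
		return fmt.Errorf("namespace and name must be RFC 1035/1123 subdomains")
	}
	return nil
}

func main() {
	fmt.Println(validatePodIdentity("k8s.mydomain.com", "8675309.myjob"))
}
```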

On Thu, Jun 26, 2014 at 9:46 AM, Daniel Smith notifications@github.com wrote:

For clarity, let me suggest the convention that "name" means a friendly mostly human readable string, possibly in the style of dns names, and that "id" means an opaque, machine generated identifier, guaranteed unique at some level of resolution. Maybe everyone except for me is already using this convention. :) But I want to talk about ids and names in general without referring to a particular implementation.

My only concern here is that this starts to feel weird when a Manifest is used as part of a ReplicationController. I think. You end up with N pods and a Manifest, each with ID fields.

I think, in that case, the rep. controller itself has a name, which it can use to make names for the pods it creates (prepend/append indices or something). IDs for the pods could still be generated by the apisever or wherever we decide they need to be generated. I was trying to say that since there's a 1:1 relationship between pods and manifests, we shouldn't make different IDs for each.

Reply to this email directly or view it on GitHub.

On Thu, Jun 26, 2014 at 3:35 PM, bgrant0607 notifications@github.com wrote:

Unique ids disambiguate among multiple instances reusing the same name over time, such as when polling termination status.

Beware of moving unique ids. Once we do that, they are no longer unique, such as in the cases of live migration or even just preloading. If we do move ids, we should have 2 ids, the movable one at the cluster level and a non-movable one at the node level, and the movable ids would correspond to zero or more of the non-movable ids.

Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/kubernetes/issues/199#issuecomment-47288913 .

jjhuff commented 9 years ago

I think this seems like a good outline! Two comments:

  1. For config files, maybe use a hash of the file vs a base64 encoding of the path as the component of the namespace. Just seems more natural.
  2. "These names are typically NOT user-provided, but are generated by infrastructure." I'd consider being more explicit about that behavior, even if it's as simple as a "unique prefix will be added to the name". This sorta touches on the whole id/name discussion....
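The content-hash idea in point 1 could look something like this in Go: hash the config file's bytes and truncate the digest to a short DNS-safe label. The function name `namespaceFromConfig` and the 5-byte truncation are my own assumptions for illustration, not anything implemented.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"strings"
)

// namespaceFromConfig derives a namespace component from the config
// file's contents rather than its path: hash the bytes, then base32-
// encode a short prefix of the digest so the label stays DNS-safe.
func namespaceFromConfig(contents []byte, host string) string {
	sum := sha256.Sum256(contents)
	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
	// 5 bytes of digest -> exactly 8 base32 characters.
	label := strings.ToLower(enc.EncodeToString(sum[:5]))
	return fmt.Sprintf("file-%s.%s", label, host)
}

func main() {
	fmt.Println(namespaceFromConfig([]byte("id: cadvisor"), "xyz123.mydomain.com"))
}
```

A nice property of hashing contents instead of encoding the path: moving or renaming the file does not change the namespace, but editing the pod definition does.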
bgrant0607 commented 9 years ago

Another positive attribute of unique ids: they are typically more concise than names, and are therefore more efficient to use in cross references in logs.

Based on experience in Omega, I think there's no question that we need unique ids.

Also, if you play around with Docker for a short while, you'll find a pile of old objects that only have unique ids and can no longer be referenced by name. How do you query them, resurrect them, or delete them once their names have been recycled?


Is there a compelling reason to use a different kind of unique identifier in the apiserver than in the kubelet? I don't think so.

We could probably shorten it (e.g., remove the locator) when we combine it with the name to form the name for kubelet.

smarterclayton commented 9 years ago

Definitely prefer hash of something in the config file (maybe hash of the JSON of the pod's manifest or something) for use in the namespace component.

smarterclayton commented 9 years ago

I'm happy to write this up as part of #253 (along with some more tests and the apiserver implications) once folks reach closure.

smarterclayton commented 9 years ago

One minor note regarding Docker ecosystem - "name" is currently being used in Docker for a lot of lightweight integrations (linking, skydock dns, hostname in container). A side effect of any generated name for a Docker container is that those lightweight integrations may become more difficult for end admins. Is there a practical way to make the name appropriately unique on the minion without breaking those potential integrations (making the name a subdomain fragment by omitting '.', for instance)?

brendanburns commented 9 years ago

I would also like to see us drive labels all the way down into Docker. That would enable us to abandon much of what we are encoding into the name today.

Brendan

On Fri, Jun 27, 2014, 8:39 AM, Clayton Coleman notifications@github.com wrote:

One minor note regarding Docker ecosystem - "name" is currently being used in Docker for a lot of lightweight integrations (linking, sky dock dns, hostname in container). A side effect of any generated name for a Docker container is that those lightweight integrations may become more difficult for end admins. Is there a practical way to make the name appropriately unique on the minion without breaking those potential integrations?

— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/kubernetes/issues/199#issuecomment-47362231 .

bgrant0607 commented 9 years ago

+1 to what brendanburns@ wrote. Labels provide much cleaner solutions for many use cases.

thockin commented 9 years ago

On Thu, Jun 26, 2014 at 10:51 PM, Justin Huff notifications@github.com wrote:

I think this seems like a good outline! Two comments:

  1. For config files, maybe use a hash of the file vs a base64 encoding of the path as the component of the namespace. Just seems more natural.

This is an implementation detail - human-friendliness of docker's names is a nice-to-have, not critical, I think.

  1. "These names are typically NOT user-provided, but are generated by infrastructure." I'd consider being more explicit about that behavior, even if it's as simple as a "unique prefix will be added to the name". This sorta touches on the whole id/name discussion....

Yeah, where is the line between "ID" and "name"? The apiserver COULD just send a UUID for the name, but I think we want the structure and content of a pod spec to stay largely the same throughout the entire system, at least for now. I don't think we've got any consensus on diverging it, anyway. For that reason, I think "name" makes sense. It is more human readable than not.

thockin commented 9 years ago

+1 to labels

On Fri, Jun 27, 2014 at 8:45 AM, brendanburns notifications@github.com wrote:

I would also like to see us drive labels all the way down into Docker. That would enable us to abandon much of what we are encoding into the name today.

Brendan

On Fri, Jun 27, 2014, 8:39 AM, Clayton Coleman notifications@github.com wrote:

One minor note regarding Docker ecosystem - "name" is currently being used in Docker for a lot of lightweight integrations (linking, sky dock dns, hostname in container). A side effect of any generated name for a Docker container is that those lightweight integrations may become more difficult for end admins. Is there a practical way to make the name appropriately unique on the minion without breaking those potential integrations?

Reply to this email directly or view it on GitHub < https://github.com/GoogleCloudPlatform/kubernetes/issues/199#issuecomment-47362231>

.

Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/kubernetes/issues/199#issuecomment-47363939 .

thockin commented 9 years ago

What would break on those? We're not proposing to do anything with name that is not totally valid as per standalone Docker?

On Fri, Jun 27, 2014 at 8:39 AM, Clayton Coleman notifications@github.com wrote:

One minor note regarding Docker ecosystem - "name" is currently being used in Docker for a lot of lightweight integrations (linking, sky dock dns, hostname in container). A side effect of any generated name for a Docker container is that those lightweight integrations may become more difficult for end admins. Is there a practical way to make the name appropriately unique on the minion without breaking those potential integrations?

Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/kubernetes/issues/199#issuecomment-47362231 .

smarterclayton commented 9 years ago

It's not that the names aren't valid for Docker, it's that something consuming those names (as a dns prefix a la skydock, or as a hostname) would break due to length or format on the Kube generated name. I don't think that is a blocker to the proposed naming patterns above, but it's a consideration when thinking about how other software that plays well with Docker might react.

Concrete ideas might be to reduce the generated names' length and avoid using '.' as a separator.

thockin commented 9 years ago

I would argue (for the sake of arguing) that anyone who was making assumptions about container names deserves to be broken. But yeah, let's keep an eye on it.

On Sat, Jun 28, 2014 at 2:55 PM, Clayton Coleman notifications@github.com wrote:

It's not that the names aren't valid for Docker, its that something consuming those names (as a dns prefix a la skydock, or as a hostname) would break due to length or format on the Kube generated name. I don't think that is a blocker to the proposed naming patterns above, but it's a consideration when thinking about how other software that plays well with Docker might react.

Concrete ideas might be to reduce the generated names' length and avoid using '.' as a separator.

Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/kubernetes/issues/199#issuecomment-47439586 .

smarterclayton commented 9 years ago

Agree, naming is hard

smarterclayton commented 9 years ago

Writing up a summary md - does this belong in api/doc, DESIGN.md, or another location?

thockin commented 9 years ago

I would go for docs/identifiers.md or something

On Mon, Jun 30, 2014 at 2:48 PM, Clayton Coleman notifications@github.com wrote:

Writing up a summary md - does this belong in api/doc, DESIGN.md, or another location?

Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/kubernetes/issues/199#issuecomment-47591844 .

smarterclayton commented 9 years ago

Referenced pull includes a summary of this discussion, open questions: