moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Feature request: Allow swarm join with temporary expiring token #29489

Open nathanleclaire opened 7 years ago

nathanleclaire commented 7 years ago

Proposal

docker swarm join-token should be able to issue temporary, expiring tokens in addition to the "root" tokens provided today.

These would be valid for joining a swarm only for a specified duration, and would be deactivated once the expected number of nodes had joined with that token or the TTL was reached, whichever came first.

Example:

$ docker swarm join-token manager -q \
    --ttl "30minutes" \
    --expect 2
SWMTMPTKN-1-1dd5wxkrb8x04yeiohem5j5m3y32ew4z9mivyjiz1h8sdrctk7-ff87lcz63x6cl3187m3r0tt2b

The token above will be valid to join manager nodes for 30 minutes, and will stop allowing nodes to join with it after 2 more nodes have joined.
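
The rule described above (valid until the TTL elapses or the expected number of nodes has joined) can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not how Swarm validates tokens; the `TempJoinToken` class, its fields, and its methods are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class TempJoinToken:
    """Hypothetical model of a temporary join token (not a real Swarm type)."""
    secret: str
    issued_at: float      # time of issue, in seconds
    ttl_seconds: float    # mirrors the proposed --ttl flag
    expect: int           # mirrors the proposed --expect flag
    joined: int = 0       # nodes that have joined with this token so far

    def is_valid(self, now: float) -> bool:
        # Valid only while the TTL has not elapsed AND fewer than
        # `expect` nodes have joined.
        within_ttl = (now - self.issued_at) < self.ttl_seconds
        return within_ttl and self.joined < self.expect

    def try_join(self, presented: str, now: float) -> bool:
        # Accept the join only for the right secret on a still-valid token.
        if presented != self.secret or not self.is_valid(now):
            return False
        self.joined += 1
        return True
```

With `ttl_seconds=1800` and `expect=2`, the first two `try_join` calls made within 30 minutes succeed, and every later call is rejected.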

Motivation

In many situations it is likely that you would want to join by passing the tokens around via channels where their transport is reasonably secure but which may, accidentally or by design, leave the information for the symmetric token hanging around somewhere not presumed to be secure forever (e.g., pass in via custom data, drop into an S3 bucket, pass around using subnet-internal HTTP, etc).

Having temporary tokens available allows for a better security story and could help automation of creating swarms by giving implementers leeway to not worry that the passed tokens must remain secure forever and/or the existing root tokens must be rotated after all anticipated nodes have joined.

In theory, this feature could also enable two other nice features:

  1. This could be used to construct an audit log for "join events", so that if you only allowed joining via temporary use tokens, a history of "who has joined when" is theoretically possible.
  2. Ability to "error out" failed scale events. For example, if you have a temporary token for an event intended to add X nodes to the swarm, and only X-1 nodes have reported in by the time the TTL is reached, a system or administrator managing Swarm could know that something went wrong (network partition, etc.) and debug / retry it, rather than ending up in an awkward half-joined situation where the error might not even be detected.
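
A minimal sketch of how these two ideas could fit together, assuming each temporary token is tied to a single scale event. The `ScaleEvent` class and its method names are hypothetical and do not correspond to any existing Swarm API.

```python
from dataclasses import dataclass, field


@dataclass
class ScaleEvent:
    """Hypothetical record tying one temporary token to one scale-up."""
    expected_nodes: int
    deadline: float                            # when the token's TTL ends
    joins: list = field(default_factory=list)  # audit log of (node_id, time)

    def record_join(self, node_id: str, when: float) -> None:
        # Each accepted join is appended, giving a "who joined when" history.
        self.joins.append((node_id, when))

    def status(self, now: float) -> str:
        if len(self.joins) >= self.expected_nodes:
            return "complete"
        if now >= self.deadline:
            return "failed"   # TTL reached with nodes still missing
        return "pending"
```

An orchestrator could poll `status` and retry or alert on `"failed"` instead of leaving the swarm silently half-joined.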

tl;dr: It would help automation and security-conscious administrators if Swarm could issue time-limited join tokens.

cc @NathanMcCauley @diogomonica @stevvooe @aaronlehmann @kencochrane @ddebroy @friism FYI

nathanleclaire commented 7 years ago

also @chungers @wfarner probably relevant to your interests

thaJeztah commented 7 years ago

Slightly related; a discussion I had with @diogomonica about one-time tokens; https://github.com/docker/docker/pull/24770#issuecomment-240880401 (which would invalidate the token once it's used)

Also related ticket; https://github.com/docker/docker/issues/26743 (Improve swarm mode for automated setu)

aaronlehmann commented 7 years ago

ping @diogomonica: What do you think about supporting temporary tokens? I think this gets into the discussion about how prescriptive our workflow should be.

Technically I don't personally see problems with adding this - it's mainly a question of how many ways we want to support joining nodes in a cluster.

nathanleclaire commented 7 years ago

@aluzzardi @stevvooe @aaronlehmann Any update on this? I think it would be really useful for the Docker for AWS and Docker for Azure projects, where we are currently running a metadata server on manager nodes that serves the token(s) over HTTP and blacklists hosts once they have obtained it. I'd be much more comfortable with temporary tokens for a variety of reasons, and they would probably be useful for https://github.com/docker/infrakit too.
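
For illustration, the serve-once bookkeeping described above might look roughly like the following. The actual Docker for AWS / Azure implementation is not shown in this thread, so the `OneShotTokenServer` class and its `fetch` method are hypothetical, and the HTTP layer is omitted.

```python
from typing import Optional


class OneShotTokenServer:
    """Hypothetical stand-in for the metadata server's bookkeeping."""

    def __init__(self, token: str) -> None:
        self._token = token
        self._served = set()   # hosts that have already fetched the token

    def fetch(self, host: str) -> Optional[str]:
        # Refuse hosts that have already obtained the token once.
        if host in self._served:
            return None
        self._served.add(host)
        return self._token
```

Note that the token handed out this way is still the long-lived one: blacklisting limits who can ask for it, not how long a leaked copy stays usable.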

diogomonica commented 7 years ago

@nathanleclaire Bill said that "it simplifies some things, but swarm metadata will still be needed; used to store the configuration (declared state) as a whole".

Won't this particular problem go away when infrakit takes over all deployment?

nathanleclaire commented 7 years ago

> "it simplifies some things, but swarm metadata will still be needed; used to store the configuration (declared state) as a whole".

@wfarner Can you elaborate on this?

Storing/sharing desired configuration state is a lot different than protecting a token that can root your entire cluster, no?

> Won't this particular problem go away when infrakit takes over all deployment?

Not necessarily -- infrakit still needs to get tokens around somehow, and having a universal-cluster-root-token flying around strikes me as somewhat of a liability if we can protect it with time-based access control.

Besides:

  1. Some users of Docker swarm mode will not want to use infrakit
  2. Even if infrakit intends to rotate tokens after join/scale events, that step could fail and leave the user vulnerable

wfarner commented 7 years ago

> @wfarner Can you elaborate on this? Storing/sharing desired configuration state is a lot different than protecting a token that can root your entire cluster, no?

Some context - the question asked was whether this behavior would eliminate the need for infrakit to store metadata in Swarm. I think this is useful behavior, but it doesn't eliminate infrakit's reliance on Swarm's store for metadata.

cc @chungers