moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

Attach #251

Closed aluzzardi closed 7 years ago

aluzzardi commented 8 years ago

Attaching doesn't feel like a cluster management operation. Additionally, since agents connect to managers rather than the other way around, it wouldn't work seamlessly.

What if instead we provided an ssh service out of band from the cluster API to allow attach (debugging) operations?

It could either be:

aluzzardi commented 8 years ago

/cc @docker/fiesta-cucaracha-maintainers (especially @diogomonica) @ehazlett

diogomonica commented 8 years ago

Agreed with @ehazlett on the preference for having it on the manager. I'm concerned about how the SSH key gets to the client in the first place, though, even with a forwarding solution.

All the uncertainty around how authentication to the manager happens further plays into this. If the user authenticates to the manager using SSH, we could simply put that public key on all the agents and proxy a connection. If the user connects to the "engine" over TLS or a local unix socket, then I'm not exactly sure how we would implement this.
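A minimal sketch of the "put that public key on all the agents" idea, just to make the moving parts concrete. The AgentKeyPusher interface, the agent list, and the TTL are assumptions for illustration, not an existing swarmkit API; the important property is that only the public half of the key ever leaves the client.

```go
// Hypothetical fan-out of a user's SSH public key to every agent, so that a
// later proxied connection from the manager can authenticate the user.
package keypush

import (
	"context"
	"fmt"
)

// AgentKeyPusher is a placeholder for whatever RPC the manager would use to
// hand a user's SSH public key to an agent for a limited time (assumed API).
type AgentKeyPusher interface {
	AuthorizeKey(ctx context.Context, agent, user string, publicKey []byte, ttlSeconds int64) error
}

// fanOutKey distributes the public key to every agent; note that only public
// key material is propagated.
func fanOutKey(ctx context.Context, p AgentKeyPusher, agents []string, user string, publicKey []byte) error {
	for _, agent := range agents {
		if err := p.AuthorizeKey(ctx, agent, user, publicKey, 600); err != nil {
			return fmt.Errorf("authorize %s on %s: %w", user, agent, err)
		}
	}
	return nil
}
```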

stevvooe commented 8 years ago

We may want to look into a middle ground here. The solution will be much more robust and scalable if the managers coordinate authentication and port location and the ssh connection then goes directly to the box. There may be applications that require fast and scalable ssh access to containers.

If the box is not reachable on the network, then so be it; that is the design of the network. We can design a companion plugin that proxies/forwards ssh connections, but, in general, this work should be offloaded from the managers. They will already be handling a large number of connections and a lot of cluster coordination, so responsiveness would be poor, especially when trying to debug problematic tasks on a heavily loaded cluster.

Let's take a peek at a proposed workflow for an invocation of swarmctl ssh <task id/name>:

  1. The client issues a gRPC request to the manager declaring that it wants to make a connection to a task, including a public key obtained from ssh-agent. This may also include networking parameters or other relevant information.
  2. The manager pushes instructions down to the task that tell the engine which port to open, for how long, and which ssh keys to allow.
  3. When the port is established, the response returns the network address and port for the task's ssh connection, the public key of the task side (which could be injected into known_hosts), and any other relevant information.
  4. swarmctl ssh then executes the client's local ssh command, if found, or falls back to the Go ssh library, using the parameters returned in step 3.
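A rough client-side sketch of steps 1–4, assuming a hypothetical AttachSSH RPC on the manager: attachReply stands in for its response (step 3), connect covers step 4, and signer is the key loaded from ssh-agent (step 1). Only the golang.org/x/crypto/ssh and os/exec calls are real, existing APIs; every name and field here is illustrative, not part of swarmkit.

```go
package attach

import (
	"fmt"
	"net"
	"os"
	"os/exec"

	"golang.org/x/crypto/ssh"
)

// attachReply mirrors what step 3 would return: where to connect, which host
// key to trust, and which user to log in as (field names are assumptions).
type attachReply struct {
	Addr    string // "node-ip:port" opened on the engine for this task
	HostKey string // task-side public key in authorized_keys format
	User    string // login to use inside the container
}

// connect implements step 4: prefer the local ssh binary, fall back to the
// Go ssh library with the returned host key pinned.
func connect(reply attachReply, signer ssh.Signer) error {
	host, port, err := net.SplitHostPort(reply.Addr)
	if err != nil {
		return fmt.Errorf("bad address %q: %w", reply.Addr, err)
	}

	// Use the client's local ssh command if it is on PATH.
	if path, lookErr := exec.LookPath("ssh"); lookErr == nil {
		cmd := exec.Command(path, "-p", port, "-l", reply.User, host)
		cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
		return cmd.Run()
	}

	// Fallback: golang.org/x/crypto/ssh, trusting only the key from the reply.
	hostKey, _, _, _, err := ssh.ParseAuthorizedKey([]byte(reply.HostKey))
	if err != nil {
		return fmt.Errorf("parse host key: %w", err)
	}
	cfg := &ssh.ClientConfig{
		User:            reply.User,
		Auth:            []ssh.AuthMethod{ssh.PublicKeys(signer)}, // key from ssh-agent (step 1)
		HostKeyCallback: ssh.FixedHostKey(hostKey),
	}
	client, err := ssh.Dial("tcp", reply.Addr, cfg)
	if err != nil {
		return err
	}
	defer client.Close()

	session, err := client.NewSession()
	if err != nil {
		return err
	}
	defer session.Close()
	session.Stdin, session.Stdout, session.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := session.Shell(); err != nil { // interactive shell; pty setup omitted for brevity
		return err
	}
	return session.Wait()
}
```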

The primary functionality here is the introduction of an SSH redirect, which cannot be done with plain ssh (as far as I can tell, with good reason). Building this hook lets us be quite flexible in the future. We get a number of follow-on benefits:

  1. The model is flexible. We can either proxy through the manager or make a direct connection, depending on the need.
  2. We control the propagation of key material, and we only propagate public keys.
  3. The manager still has a hook to integrate with an external authentication system. We require valid certificates to make the connection to the manager and can do further validation of the ssh key used for the connection. This is valuable for centralizing checks for compromised keys.
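A minimal sketch of the centralized check described in point 3. The revoked-fingerprint set and approveAttach are hypothetical; only the golang.org/x/crypto/ssh parsing and fingerprint helpers are real APIs.

```go
package authcheck

import (
	"fmt"

	"golang.org/x/crypto/ssh"
)

// approveAttach decides whether the manager should set up the SSH attach for
// the key presented in the request, consulting a set of revoked fingerprints.
// TLS client certificate checks are assumed to have already run; this is the
// additional per-key validation hook.
func approveAttach(authorizedKey []byte, revoked map[string]bool) error {
	pub, _, _, _, err := ssh.ParseAuthorizedKey(authorizedKey)
	if err != nil {
		return fmt.Errorf("parse requested key: %w", err)
	}
	fp := ssh.FingerprintSHA256(pub) // e.g. "SHA256:..."
	if revoked[fp] {
		return fmt.Errorf("key %s has been revoked", fp)
	}
	return nil
}
```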
aaronlehmann commented 7 years ago

Closing in favor of duplicate #1896.