gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

Unable to add second Auth Server (update HA documentation?) #1135

Closed by rrueth 7 years ago

rrueth commented 7 years ago

I'm trying to bring up a highly available (HA) deployment of Teleport with U2F. I've read through all of the HA documentation, but I'm having trouble bringing up a second auth server. I'm opening this issue for help understanding what I'm missing and to see whether something could be added to the HA documentation to make this clearer.

So far, I am able to bring up two proxy hosts, a host for the AppID (for U2F), and a single auth server (auth001). When all of those are running, everything seems to work. But as soon as I start the second auth server (auth002), I begin seeing the following error in my auth001 logs:

level=warning msg="conn(:40920->:3025, user=c18a6e86-bb27-4711-8b9f-2a9d7e1ef6d7.c18a6e86-bb27-4711-8b9f-2a9d7e1ef6d7) ERROR: failed auth user c18a6e86-bb27-4711-8b9f-2a9d7e1ef6d7.c18a6e86-bb27-4711-8b9f-2a9d7e1ef6d7, err: ssh: certificate signed by unrecognized authority" file="auth/tun.go:421" func="auth.(*AuthTunnel).keyAuth"

Also, after launching auth002, when I try to run sudo tctl nodes ls on auth001, I get the following error:

ERRO[0000] access denied to 'f91f764c-57c1-4db3-9607-3a4fe47ef39e': bad username or credentials file=common/tctl.go:331 func=common.Run access denied to 'f91f764c-57c1-4db3-9607-3a4fe47ef39e': bad username or credentials

I'm using the following config for both of the auth servers:

teleport:
  data_dir: /var/lib/teleport
  pid_file: /var/run/teleport.pid
  auth_servers:
    - teleport-auth.<internal.example.com>:3025
  connection_limits:
    max_connections: 1000
    max_users: 250

  log:
    output: stderr
    # Possible output values are 'stdout', 'stderr' and 'syslog'. Possible severity values are
    # DEBUG, INFO, WARN and ERROR (default).
    severity: WARN

  storage:
    type: dynamodb
    region: us-west-2
    table_name: teleport.state

auth_service:
  enabled: "yes"
  authentication:
    type: local

    # Enable YubiKey as the second factor for authentication
    second_factor: u2f
    u2f:
      app_id: https://teleport.<example.com>/appID

      # facets should list all proxy servers.
      facets:
        - https://myproxy001.<example.com>:3080
        - https://myproxy002.<example.com>:3080
  listen_addr: 0.0.0.0:3025

  # Only instances that are part of the "teleport-node" Security Group can contact the Auth Server
  # on the appropriate port to join the Teleport cluster.
  tokens:
    - "proxy,node:<token>"

ssh_service:
  enabled: "yes"
  listen_addr: 0.0.0.0:3022

proxy_service:
  enabled: "no"

Note: <token> and <example.com> have been redacted but are real tokens and URLs =).

Is there something that I'm missing? I noticed in the example configuration file this comment:

Optional "cluster name" is needed when configuring trust between multiple auth servers. A cluster name is used as part of a signature in certificates generated by this CA.

Do I need to configure a cluster_name when launching multiple auth servers in the same cluster?

Do I need to start the auth servers with a static token used for the auth servers?

rrueth commented 7 years ago

It looks like this was due to the missing cluster_name. From reading the docs, I did not realize that the cluster_name was required for running two auth servers in the same cluster. It would be good to update the HA documentation to call this out.
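
A minimal sketch of the change described above, assuming `cluster_name` is set under `auth_service` as in the example configuration file, and presumably with the same value on both auth001 and auth002. The name `example-cluster` is a placeholder for illustration and does not come from this deployment; the rest of the config shown earlier stays unchanged.

```yaml
auth_service:
  enabled: "yes"
  # Assumption: both auth servers sharing the DynamoDB backend use the
  # same cluster name. "example-cluster" is a hypothetical value.
  cluster_name: "example-cluster"
  listen_addr: 0.0.0.0:3025
```

Because the cluster name is used as part of the signature in certificates generated by the CA, it is probably best to set it before the first auth server initializes its state, rather than changing it on a running cluster.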