Open inve1 opened 3 years ago
cc @fspmarshall
Out of interest, what happens if you do this? I'm presuming it will still only connect to the first in the list.
On us-east-2:
tunnel_public_addr: ['proxy-nlb.elb.us-east-2.amazonaws.com:3024', 'proxy-nlb.elb.us-west-2.amazonaws.com:3024']
On us-west-2:
tunnel_public_addr: ['proxy-nlb.elb.us-west-2.amazonaws.com:3024', 'proxy-nlb.elb.us-east-2.amazonaws.com:3024']
Yup I tried that and as you're guessing it only used the first value. Guessing this is why: https://github.com/gravitational/teleport/blob/v5.1.0/lib/service/service.go#L2402
Summary
I'm trying to set up a cluster in 2 separate AWS regions to provide HA against an AWS regional outage. The machines I'm trying to access are located in various separate locations, not in the cloud, so theoretically if 1 AWS region was having issues they'd be able to connect to the healthy one.
I need to use the reverse tunnel functionality due to the network environment I'm in, and that's causing my issue.
Relevant information
I've deployed something very similar to the HA cluster terraform example. Basically I deploy every resource there in 2 regions, and set up DynamoDB replication to share state.
This works fine, and when I connect machines to the cluster they show up in both places. I can add a Route 53 latency-based record to send users of the web UI to the nearest proxy ALB, with a health check in case of issues. 👍
The problem I'm having is that I need to use the reverse tunnel feature for all the machines I'm dealing with, since they're deployed in locations where I have no control over the network (no exposing ports to outside traffic etc).
Unfortunately, with the current version of teleport (5.1.0) the reverse tunnel discovery will only connect to one tunnel_public_addr, even if there are proxies with different ones in the cluster (different network load balancers in different zones). The one returned by whatever proxy can be reached first through the "global" DNS is used.
I've tried configuring the 2 sets of proxies to have different tunnel_public_addr values, since they're behind separate NLBs (just set tunnel_public_addr to the local NLB DNS, as the aforementioned example does). The problem is the way the discovery protocol works now, it always uses the address it finds here: [1] [2]
The proxy gossip messages do work and I am seeing 4 proxies listed (2 in each region) but the discovery code always connects to the tunnel address that was found in that first API call, which will be 1 load balancer that will only ever connect to 2 out of 4 proxies.
I can get around this issue by creating a single DNS record that has the IPs of both load balancers (they have 2 each, so 4 answers in this record) and setting that as tunnel_public_addr on every proxy. I tried this and it works, but it relies on resolving the record multiple times, getting "lucky" to see IPs from both load balancers at some point (therefore I cannot use a Route 53 geographical or latency record, it'd always return the same LB to the same client) and, when hitting the load balancer, finding all the proxies in that region. Also you can't return the IPs of 2 aliases at the same time in Route 53, so I'd have to set up something to keep this record up to date 🙃
It seems like a way to handle this would be to add tunnel_public_addr to each proxy on the Proxies list in the discoveryRequest and use that when seeking a missing proxy. I think. I went through that code trying to see if there was a way to configure this to do what I want but it doesn't seem like it (please correct me if I'm wrong!)
I see @awly you worked on this last and I took a look at this PR https://github.com/gravitational/teleport/pull/4290 .. there are comments about multiple proxies, maybe you can chime in and let me know if this is a bad idea, perhaps give me some pointers on creating a fix for this and I would be interested in contributing that.
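To make the idea of carrying tunnel_public_addr per proxy a bit more concrete, here is a rough sketch in Go. It is not teleport's actual code: the Proxy and DiscoveryRequest types, the TunnelPublicAddr field and the MissingProxyAddrs helper are all hypothetical, simplified stand-ins. It only illustrates picking the address to dial per missing proxy instead of reusing the single address from the first API call.

```go
package main

import "fmt"

// Proxy is a hypothetical, simplified entry in a discovery request.
type Proxy struct {
	Name string
	// TunnelPublicAddr is the hypothetical new field: the reverse tunnel
	// address this particular proxy advertises (e.g. its regional NLB).
	TunnelPublicAddr string
}

// DiscoveryRequest is a hypothetical, simplified discovery message that
// lists the proxies known to the cluster.
type DiscoveryRequest struct {
	Proxies []Proxy
}

// MissingProxyAddrs returns the distinct tunnel addresses an agent would
// dial to reach the proxies it is not connected to yet. Proxies without
// their own address fall back to fallbackAddr (the address the agent
// already uses today).
func MissingProxyAddrs(req DiscoveryRequest, connected map[string]bool, fallbackAddr string) []string {
	seen := make(map[string]bool)
	var addrs []string
	for _, p := range req.Proxies {
		if connected[p.Name] {
			continue // already have a tunnel to this proxy
		}
		addr := p.TunnelPublicAddr
		if addr == "" {
			addr = fallbackAddr
		}
		if addr == "" || seen[addr] {
			continue // nothing to dial, or address already queued
		}
		seen[addr] = true
		addrs = append(addrs, addr)
	}
	return addrs
}

func main() {
	east := "proxy-nlb.elb.us-east-2.amazonaws.com:3024"
	west := "proxy-nlb.elb.us-west-2.amazonaws.com:3024"
	req := DiscoveryRequest{Proxies: []Proxy{
		{Name: "east-1", TunnelPublicAddr: east},
		{Name: "east-2", TunnelPublicAddr: east},
		{Name: "west-1", TunnelPublicAddr: west},
		{Name: "west-2", TunnelPublicAddr: west},
	}}
	// The agent reached both east proxies through the first address it found.
	connected := map[string]bool{"east-1": true, "east-2": true}
	fmt.Println(MissingProxyAddrs(req, connected, east))
	// prints [proxy-nlb.elb.us-west-2.amazonaws.com:3024]
}
```

With something like this, an agent that found the us-east-2 NLB first would still learn that the us-west-2 proxies sit behind a different address and could dial that one for the missing proxies.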
Environment
If it helps, teleport yaml on the client machine/IOT-ish device looks like this:
and on the proxies:
Thanks in advance for any help