tailscale / terraform-provider-tailscale

Terraform provider for Tailscale
https://registry.terraform.io/providers/tailscale/tailscale
MIT License

Discussion: non-deterministic device name? #75

Open rsyring opened 2 years ago

rsyring commented 2 years ago

I'm not exactly sure where this belongs or what I'm asking for, but figured I should note it somewhere. I've noticed in my efforts to get Tailscale installed automatically on a host and then use that device for further work in Terraform, that the device name used by Tailscale is not guaranteed.

In Terraform, I might want to create a server "enterprise" in AWS, set its hostname to "enterprise", install & authorize Tailscale as part of the instance's first-boot configuration, and then wait_for (#72) the device so I can use its IP to set up a DNS record for that host.

But if there is already a device in Tailscale named "enterprise", then it looks like Tailscale will create the device as "enterprise1" and happily move on. That obviously breaks things if I then use tailscale_device with name set to enterprise.example.com.

I've not yet tried to see whether Tailscale's behavior is any different when using tailscale up --hostname. Also, at least in my case, the impact of this is lessened if deletions (#68) become supported.

rsyring commented 2 years ago

Could be related if the resolution to this is a stable/deterministic ID for a host that can be used in Terraform: https://github.com/tailscale/tailscale/issues/1532

davidsbond commented 2 years ago

Would you be better supported if the Terraform provider allowed you to query devices based on a field other than name? The provider could allow you to search based on hostname, so you get more deterministic behaviour when you use tailscale up --hostname.

rsyring commented 2 years ago

@davidsbond thanks again for the time you are taking to discuss all of this. Since we don't have a real provider that is creating the device and, under the hood, returning that device's information from the create API call, we have to look up the device somehow, some way in a GET API call. That, obviously, requires an identifier.

I think the only way a different lookup field, i.e. a field other than name, would help is if tailscale up gave an option to accept an identifier and used it to create the device. That identifier would then need to be exposed through the API.

As far as I can tell, giving some other field just doesn't matter b/c that value is not known ahead of time and can't be used to identify the device when using tailscale_device.

davidsbond commented 2 years ago

> @davidsbond thanks again for the time you are taking to discuss all of this.

No problem, thanks for being a user and helping improve the provider.

> Since we don't have a real provider that is creating the device and, under the hood, returning that device's information from the create API call, we have to look up the device somehow, some way in a GET API call. That, obviously, requires an identifier.

Can't the value passed to the --hostname flag serve as that identifier today? The documentation for the flag currently states:

> hostname to use instead of the one provided by the OS

So in theory, you can provide anything that could be a valid hostname in this field and it will accept it. You could then obtain the specific device using the hostname you provide to the tailscale up command, if the tailscale_device data source is updated to allow you to query on that field.

> As far as I can tell, giving some other field just doesn't matter b/c that value is not known ahead of time and can't be used to identify the device when using tailscale_device.

Why can't it be known ahead of time if you can provide it to the tailscale up command? IIRC you previously mentioned AWS instances for which you provide a bootstrapping script. I'm more familiar with GCP VMs, but they provide functionality that allows you to pass metadata into the VM, which you can access from scripts. Assuming the same is possible with AWS instances, couldn't you do the following:

This way, you have greater control and could avoid conflicting hostnames if all your devices are Terraform-managed. I'm not familiar with your infrastructure, but you could go as far as to use a random_string resource to generate a unique hostname per device?
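
A minimal sketch of the first-boot side of such a flow, assuming Terraform renders the chosen hostname and an auth key into the instance's bootstrap script as TS_HOSTNAME and TS_AUTHKEY (both variable names, and both function names, are hypothetical), and that the hostname should be a single DNS label:

```shell
#!/usr/bin/env bash

# Succeeds only for a single lowercase DNS label: letters, digits and
# hyphens, 1-63 characters, no leading or trailing hyphen.
valid_label() {
  [[ "$1" =~ ^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$ ]]
}

# Hypothetical bootstrap step. The hostname and auth key are assumed to be
# supplied by the Terraform-rendered user data, so Terraform already knows
# which hostname it asked for.
join_tailnet() {
  local hostname="$1" authkey="$2"
  if ! valid_label "$hostname"; then
    echo "refusing to join: '$hostname' is not a valid hostname label" >&2
    return 1
  fi
  tailscale up --hostname "$hostname" --auth-key "$authkey"
}

# Example call from the bootstrap script:
# join_tailnet "$TS_HOSTNAME" "$TS_AUTHKEY"
```

Because the hostname is chosen in Terraform rather than on the host, the same value can be reused later when looking the device up.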

rsyring commented 2 years ago

> So you could obtain the specific device using the hostname you provide to the tailscale up command if the tailscale_device data source is updated to allow you to query on that field.

Isn't that what we do now? The hostname becomes the host part of the device name, which is how tailscale_device works. That all works fine and the process you described above to bootstrap a server is very similar to what I'm doing.

> could avoid conflicting hostnames

This is really the key point. Yes, I can do that now. But it means the device's name in Tailscale now has some random value appended to it, which works but is ugly. And, on the off chance something weird happens and you still have a conflicting hostname, there is nothing to indicate this. The failure mode is silent, which I really don't like.

To be clear, this is the problem:

# Host 1
$ tailscale status
...snip...
100.115.118.27  zinc                 randy.syring@ linux   -

Now, let's say I don't realize that name is used and I run a cloud init script based on my Terraform scripts to run the equivalent of:

# Host 2
$ tailscale up --hostname zinc --auth-key ...
Success.
$ tailscale status
100.94.30.82    zinc-1               randy.syring@ linux   -
...snip...
100.115.118.27  zinc                 randy.syring@ linux   -

I have something like this in Terraform:

data "tailscale_device" "this" {
  name = "zinc.example.com"
  wait_for = 90
}

This will pull data from the first zinc host: the IP would be 100.115.118.27 (the previously existing device), not 100.94.30.82 (the device we just created with Terraform). There will be no error. The error will come later, when I try to use the IP address, in a DNS record for example, and then can't figure out why the server I'm trying to connect to through Tailscale doesn't seem to be the server I want.

This isn't really a problem with this provider, it's just a silent "footgun" and I didn't want to keep this edge case to myself.
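
One way to make this failure loud instead of silent is to check on the new host, right after tailscale up, that the name Tailscale assigned matches the one requested. A minimal sketch, assuming `tailscale status --json` reports the node's own name under `.Self.DNSName` and that jq is installed; the function names are hypothetical:

```shell
#!/usr/bin/env bash

# Return success if the first label of a DNS name equals the requested hostname.
#   name_matches zinc zinc.example.com.    succeeds
#   name_matches zinc zinc-1.example.com.  fails
name_matches() {
  local requested="$1" dns_name="$2"
  [ "${dns_name%%.*}" = "$requested" ]
}

# Hypothetical first-boot check. Assumes jq is installed and that
# `tailscale status --json` exposes this node's name as .Self.DNSName.
verify_tailscale_name() {
  local requested="$1" dns_name
  dns_name="$(tailscale status --json | jq -r '.Self.DNSName')"
  if ! name_matches "$requested" "$dns_name"; then
    echo "tailscale assigned '$dns_name', expected host part '$requested'" >&2
    return 1
  fi
}

# Example: fail the bootstrap if the device was silently renamed to zinc-1.
# verify_tailscale_name zinc || exit 1
```

This does not stop the rename from happening; it only makes the bootstrap fail visibly on the new host instead of letting Terraform pick up the older device's IP.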

davidsbond commented 2 years ago

Thanks for all the detail. Without having some random string appended to the hostname, I'm not sure any other workarounds are possible. Let's see how the team at Tailscale responds to the issue you've raised. I'm happy to add any additional support that's within my capacity.

Unfortunately, a Terraform provider can only expose the resources that the API it calls makes available. However, I can imagine that using Terraform/Pulumi will be important for other Tailscale users, so perhaps this use case will apply to others too. I only have the very basic usage statistics that the Terraform registry provides, but we have had 7k downloads of the provider at this point, which is perhaps a significant enough number to consider better support.

mindreader commented 5 days ago

Tailscale has been working well for me, but this one issue is a real bummer when it comes to trying to use it at scale.

When I delete a k8s cluster and recreate it without remembering to go into the Tailscale UI and manually delete each and every node, everything seems to work, but it gets the IPs of the old machines, and then there are hard-to-debug connectivity issues later in the process.

Surely, if a new device pops up whose hostname conflicts with an existing device that is ephemeral and in an unconnected state, it makes sense to delete the old device on Tailscale's side immediately? It was going to be deleted automatically at some point anyway.

If that makes people nervous, an option could be added to tailscale_tailnet_key, perhaps on_ephemeral_hostname_conflict_remake or some such, to make such behavior explicit.
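
Until something like that exists, the manual cleanup step could in principle be scripted against the Tailscale API before a cluster is recreated. A rough sketch, assuming an API key in TS_API_KEY and the tailnet name in TS_TAILNET (hypothetical variable names), that curl and jq are installed, and that the device list returns `id` and `hostname` fields for each device:

```shell
#!/usr/bin/env bash

API_BASE="https://api.tailscale.com/api/v2"

# URL for a single device; the device ID comes from the device list.
device_url() {
  printf '%s/device/%s\n' "$API_BASE" "$1"
}

# Delete every device whose hostname matches exactly.
# Assumes TS_API_KEY and TS_TAILNET are set and that curl and jq are installed.
delete_devices_named() {
  local hostname="$1" id
  curl -fsS -u "${TS_API_KEY}:" "${API_BASE}/tailnet/${TS_TAILNET}/devices" |
    jq -r --arg h "$hostname" '.devices[] | select(.hostname == $h) | .id' |
    while read -r id; do
      curl -fsS -X DELETE -u "${TS_API_KEY}:" "$(device_url "$id")"
    done
}

# Example: remove stale nodes before recreating the cluster.
# delete_devices_named my-old-node
```

This deletes by exact hostname with no check on whether the device is still connected, so it is only safe when the old nodes are known to be gone.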