jason-riddle / ansible-role-tailscale

Tailscale on Linux.
https://galaxy.ansible.com/ui/standalone/roles/jason_riddle/tailscale/
MIT License

Feature Request: check status before running tailscale up #101

Closed: tedski closed this issue 5 months ago

tedski commented 5 months ago

tailscale up is run each time regardless of the current status, so the task reports changed on every run. This gives misleading output, as nothing has actually changed.

tailscale status --json includes a boolean at .Self.Online that would determine if up needed to be run to create the connection. Unfortunately, this does not handle the case where the args are different and up needs to be run to change something.

jason-riddle commented 5 months ago

Thank you for creating this issue.

Yes, this is something that has bothered me too. Here are my current thoughts on fixing it.

I’m thinking about adding another variable to control this behavior.

tailscale_up_only_if_offline: false

If tailscale_up_only_if_offline is set to true, the up command will run only when the node is offline; if the node is already online, the up command will be skipped. By default, tailscale_up_only_if_offline will be set to false, meaning the up command always runs regardless of the node's online status.
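
As a rough sketch (untested, and assuming the status JSON has already been registered in a tailscale_status variable), the gating might look something like this:

  tasks:
  - name: get tailscale status
    ansible.builtin.command: tailscale status --json
    register: tailscale_status
    changed_when: false

  # Run up when the flag is off (today's behavior) or when the node is offline.
  - name: run tailscale up
    ansible.builtin.command: tailscale up
    when: >-
      not tailscale_up_only_if_offline
      or not (tailscale_status.stdout | from_json).Self.Online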

Question for you: does the above sound reasonable?

If it does sound reasonable, my next question for you is: what is the best way to determine if a node is offline? I would like to use the JSON output from tailscale status --json, so I've pasted that below.

When the node is disconnected, here is the output of tailscale status --json:

{
  "Version": "1.56.1-t906f85d10-g34ed54c8c",
  "TUN": true,
  "BackendState": "NeedsLogin",
  "AuthURL": "",
  "TailscaleIPs": null,
  "Self": {
    "ID": "",
    "PublicKey": "nodekey:0000000...<000-repeated>",
    "HostName": "...",
    "DNSName": "",
    "OS": "linux",
    "UserID": 0,
    "TailscaleIPs": null,
    "Addrs": [],
    "CurAddr": "",
    "Relay": "",
    "RxBytes": 0,
    "TxBytes": 0,
    "Created": "0001-01-01T00:00:00Z",
    "LastWrite": "0001-01-01T00:00:00Z",
    "LastSeen": "0001-01-01T00:00:00Z",
    "LastHandshake": "0001-01-01T00:00:00Z",
    "Online": false,
    "ExitNode": false,
    "ExitNodeOption": false,
    "Active": false,
    "PeerAPIURL": null,
    "InNetworkMap": false,
    "InMagicSock": false,
    "InEngine": false
  },
  "Health": [
    "state=NeedsLogin, wantRunning=false"
  ],
  "MagicDNSSuffix": "",
  "CurrentTailnet": null,
  "CertDomains": null,
  "Peer": null,
  "User": null,
  "ClientVersion": null
}

When the node is connected, this is the output of tailscale status --json:

{
  "Version": "1.56.1-t906f85d10-g34ed54c8c",
  "TUN": true,
  "BackendState": "Running",
  "AuthURL": "",
  "TailscaleIPs": [
    "100.100...",
    "fd7a:..."
  ],
  "Self": {
    "ID": "nAZzPn...",
    "PublicKey": "nodekey:a9195b...",
    "HostName": "...",
    "DNSName": "....ts.net.",
    "OS": "linux",
    "UserID": "<Removed>",
    "TailscaleIPs": [
    "100.100....",
    "fd7a:..."
    ],
    "AllowedIPs": [
    "100..../32",
    "fd7a:.../128"
    ],
    "Addrs": [
    "<Removed>"
    ],
    "CurAddr": "",
    "Relay": "sfo",
    "RxBytes": 0,
    "TxBytes": 0,
    "Created": "2024-02-27T19:49:08.489728854Z",
    "LastWrite": "0001-01-01T00:00:00Z",
    "LastSeen": "0001-01-01T00:00:00Z",
    "LastHandshake": "0001-01-01T00:00:00Z",
    "Online": true,
    "ExitNode": false,
    "ExitNodeOption": false,
    "Active": false,
    "PeerAPIURL": [
    "http://100.100.208.185:62681",
    "http://[fd7a:115c:a1e0::7fd0:5d6e]:33636"
    ],
    "CapMap": {
    "funnel": null,
    "https": null,
    "https://tailscale.com/cap/funnel-ports?ports=443,8443,10000": null,
    "https://tailscale.com/cap/is-admin": null,
    "https://tailscale.com/cap/ssh": null,
    "https://tailscale.com/cap/tailnet-lock": null,
    "mullvad": null,
    "probe-udp-lifetime": null
    },
    "InNetworkMap": true,
    "InMagicSock": false,
    "InEngine": false,
    "KeyExpiry": "2024-08-25T19:49:08Z"
  },
  "Health": [
    "Update available: 1.56.1 -\u003e 1.60.0, run `tailscale update` or `tailscale set --auto-update` to update"
  ],
  "MagicDNSSuffix": "....ts.net",
  "CurrentTailnet": {
    "Name": "...@gmail.com",
    "MagicDNSSuffix": "....ts.net",
    "MagicDNSEnabled": true
  },
  "CertDomains": [
    "....ts.net"
  ],
  "Peer": {
    "<Removed>"
  },
  "User": {
    "547213...": {
    "ID": "547213...",
    "LoginName": "...@gmail.com",
    "DisplayName": "...",
    "ProfilePicURL": "https://lh3.googleusercontent.com/...",
    "Roles": []
    },
    "67872...": {
    "ID": "67872...",
    "LoginName": "tagged-devices",
    "DisplayName": "Tagged Devices",
    "ProfilePicURL": "",
    "Roles": []
    }
  },
  "ClientVersion": {
    "LatestVersion": "1.60.0"
  }
}

So, to summarize my questions:

1) Does adding tailscale_up_only_if_offline sound reasonable?

2) If it sounds reasonable, given the (very long and verbose) JSON output above from when a node is disconnected and when a node is connected, how would you determine if this node is offline?

Also, an additional question:

3) How would you determine if a node is just temporarily disconnected and “up” should not run again? For example, it’s not able to reach the Tailscale control server, there is a networking/packet drop issue on the client side, etc.?
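
One possibly relevant signal: in the two dumps above, BackendState reads "NeedsLogin" when logged out and "Running" when connected. So a sketch like the one below might separate "never logged in" from "temporarily unreachable" (purely illustrative; I haven't verified how BackendState behaves during a transient outage, and it assumes the status JSON is registered as tailscale_status):

  # Sketch only: treat "NeedsLogin" as "up must run". Unverified assumption:
  # BackendState stays "Running" during a blip even if Online flips to false.
  - name: run tailscale up
    ansible.builtin.command: tailscale up
    when: (tailscale_status.stdout | from_json).BackendState == 'NeedsLogin'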

tedski commented 5 months ago

Maybe I'm too much of a perfectionist, but my desire would be to make it idempotent without a feature flag such as the var you mention. I think it's a good fallback if we can't find a better way, though. I think the solution is multi-part:

  1. Run tailscale up when the node is not already online.
  2. Run tailscale up when the options have changed and the connection needs to be updated.

The JSON output from tailscale status --json lets us solve the first case. Here's a quick task list I threw together to show how to do this:

  tasks:
  - name: get tailscale status
    ansible.builtin.command: tailscale status --json
    register: tailscale_status

  - name: run tailscale up
    ansible.builtin.command: tailscale up
    when: ((tailscale_status.stdout | from_json).Self.Online) is not true

Note: (tailscale_status.stdout | from_json | community.general.json_query('Self.Online')) would also work; however, the json_query() Jinja2 filter requires the jmespath package per the documentation, and since I'm very new to Ansible, I wasn't sure whether that's commonly installed on the control node. I don't have a pristine environment to test on right now.
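
For reference, that json_query() variant of the condition would look something like this (untested here, per the caveat above):

  # Same check via json_query(); requires the jmespath package on the control node.
  - name: run tailscale up
    ansible.builtin.command: tailscale up
    when: not (tailscale_status.stdout | from_json | community.general.json_query('Self.Online'))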

However, that doesn't solve the second case. I've been mulling that one over and trying to think it through. I'll update if I get any ideas.

tedski commented 5 months ago

As soon as I hit submit on that last comment, I had an aha! moment. If we think about this... we are solving for the Ansible task reporting changes only when changes were actually made. I think it can be a bit simpler than the route we were going down. Rather than building a logic tree of when to run or not run tailscale up, why not change how we detect that changes were made?

How do we define changes in this sense? We know tailscale up is itself idempotent -- i.e., we can run it over and over and it won't actually do anything if it's not needed. So, let's leave that logic to the Tailscale team.

An approach I'm working on now, and I'm not sure if it's possible in Ansible, is to compare Self in the JSON output before and after tailscale up and use that for the changed_when key on the tailscale up task. However, I'm currently struggling with how to get fresh tailscale status output in the middle of that task. This is likely due to me being new to Ansible. I'll keep prodding around and see if I can solve it.

tedski commented 5 months ago

This certainly isn't the prettiest solution, but it illustrates what I was referring to in my last comment. What do you think?

  tasks:
  - name: get tailscale status
    ansible.builtin.command: tailscale status --json
    register: tailscale_old
    changed_when: false

  - name: run tailscale up
    ansible.builtin.shell: |
      tailscale up >&2
      tailscale status --json
    register: tailscale_new
    changed_when: (tailscale_old.stdout | from_json).Self != (tailscale_new.stdout | from_json).Self

jason-riddle commented 5 months ago

That's an interesting approach. I'll need some time to think about that.

jason-riddle commented 5 months ago

Ok, so I've reread this thread and I just want to make sure I understand what you are asking before commenting on your proposed solution.

So in your first comment, you said the following.

tailscale up is run regardless of the status each time, causing changes to be true. This gives misleading output as nothing has actually changed.

tailscale status --json includes a boolean at .Self.Online that would determine if up needed to be run to create the connection. Unfortunately, this does not handle the case where the args are different and up needs to be run to change something.

1) So what I understood from this comment was that you wanted the "up" command to be more idempotent. To rephrase, by looking at the tailscale status --json output and inspecting the boolean at .Self.Online, you could determine whether the up command had to run again or could be skipped, right?

2) And then somewhere in your follow-up responses, it sounds like you couldn't figure out a way to determine when the args changed (in order to decide whether up should run), so you changed your approach to A) always run up and B) compare the tailscale status --json output from before and after to determine if a change actually happened, right?

So then you created this solution, which I've copied below,

This certainly isn't the prettiest solution, but it illustrates what I was referring to in my last comment. What do you think?

  tasks:
  - name: get tailscale status
    ansible.builtin.command: tailscale status --json
    register: tailscale_old
    changed_when: false

  - name: run tailscale up
    ansible.builtin.shell: |
      tailscale up >&2
      tailscale status --json
    register: tailscale_new
    changed_when: (tailscale_old.stdout | from_json).Self != (tailscale_new.stdout | from_json).Self

Which brings me to my last question

3) It looks like in the above code, the up command is always run, but you've added changed_when, so the Ansible playbook output summary makes it clear whether a change actually occurred, right?

Am I understanding points 1., 2., and 3. correctly?

tedski commented 5 months ago

Yes, you are understanding correctly.

jason-riddle commented 5 months ago

Cool, thanks for clarifying. Okay, so back to your solution.

tasks:
- name: get tailscale status
  ansible.builtin.command: tailscale status --json
  register: tailscale_old
  changed_when: false

- name: run tailscale up
  ansible.builtin.shell: |
    tailscale up >&2
    tailscale status --json
  register: tailscale_new
  changed_when: (tailscale_old.stdout | from_json).Self != (tailscale_new.stdout | from_json).Self

My only concern with this code is this line:

tailscale up >&2

So you are discarding stderr and replacing it with stdout. My concern is that discarding stderr could make debugging harder when up fails to finish successfully.

Here's an example to explain it better.

# Run tailscale up with a bad key and save the stdout and stderr to separate files.
sudo tailscale up --auth-key "tskey-auth-bad-key" >up-bad-key.out 2>up-bad-key.err

# Inspect the size using `ls -al`, notice that `up-bad-key.out` is empty while `up-bad-key.err` contains something.
$ ls -al
total 12
drwxr-xr-x  2 pi   pi   4096 Mar 10 23:57 .
drwxrwxrwt 11 root root 4096 Mar 10 23:53 ..
-rw-r--r--  1 pi   pi     55 Mar 10 23:56 up-bad-key.err
-rw-r--r--  1 pi   pi      0 Mar 10 23:56 up-bad-key.out

# Let's check what is in `up-bad-key.err`.
cat up-bad-key.err
backend error: invalid key: unable to validate API key

In this example, I run up with a bad key, save stderr and stdout to separate files, and check which file contains output. In this case, stderr contained useful information while stdout had nothing.

So let's go back to that previous line of code, but modify it slightly:

tailscale up --auth-key "tskey-auth-bad-key" >&2

So this would have discarded stderr along with the empty stdout, and you'd have no idea what the actual problem was. So I have some follow-up questions.

1) Does tailscale up ever output anything "useful" to stdout? I'm being vague with the word useful because it's hard for me to tell because I have not used that command that much.

2) Assuming from 1. that tailscale up does output something "useful" to stdout, then I would ideally like to preserve the contents of stderr. Any thoughts on how to do this?

My thoughts: I'm considering writing stderr to a file, which could work, but it could leave lots of small log files lying around, annoying the user. Also, if it's written to a file, I'd have to print the name of the file and tell the user "hey, if there was an issue, it's not going to be in the playbook output, but in this log file", which... doesn't seem great.

tedski commented 5 months ago

A quick clarification first: tailscale up >&2 does not discard anything. It redirects stdout to file descriptor 2, which is stderr. So, what was intended for stdout by tailscale's authors now goes to stderr, along with everything that was originally intended for stderr. Nothing is lost.

Why do this? Because anything written to stdout would get concatenated with the output of tailscale status --json, making the combined output unparsable as JSON.
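
To make that concrete, here is the same up task from my earlier snippet, with comments added for illustration:

  - name: run tailscale up
    ansible.builtin.shell: |
      tailscale up >&2        # up's stdout joins stderr, so it can't pollute the JSON below
      tailscale status --json # stdout now carries only the JSON that changed_when parses
    register: tailscale_new
    changed_when: (tailscale_old.stdout | from_json).Self != (tailscale_new.stdout | from_json).Self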

To answer your questions:

  1. Does tailscale up ever output anything "useful" to stdout? I'm being vague with the word useful because it's hard for me to tell because I have not used that command that much.

One thing I have observed it output is a message letting the user know that an upgrade is available, and to either upgrade or enable auto-upgrade. As you hinted, "useful" is subjective, so I'll just say that if the authors thought it was worth writing to stdout, it's worth keeping somewhere. So, I redirected it to stderr to ensure nothing of "value" is lost.

  2. Assuming from 1. that tailscale up does output something "useful" to stdout, then I would ideally like to preserve the contents of stderr. Any thoughts on how to do this?

Answered in my clarification as not applicable.

Hopefully that clears things up, but if not... happy to discuss further.

jason-riddle commented 5 months ago

Ah, yes, you are right. The outputs of stdout and stderr are combined, and everything ends up on stderr. Thanks for clarifying that.

Ignore my other questions, they are not applicable.

Okay, I like this solution and will work on implementing it. I will comment here when it's ready for use. I expect to have it implemented and released by Friday.

jason-riddle commented 5 months ago

I was able to finish this sooner than expected.

I've created release 1.1.21. Please try it out and let me know if this solves your issue.
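
If it helps, a minimal sketch for pinning this release in a requirements.yml (role name taken from the Galaxy link above; double-check it against your setup):

  # requirements.yml
  roles:
    - name: jason_riddle.tailscale
      version: 1.1.21

Then install it with ansible-galaxy install -r requirements.yml.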

tedski commented 5 months ago

I tested with 1.1.21 and it now shows no changes when nothing has changed, but shows changes when up actually does something. Thanks!