gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

Display nodes that missed heartbeat as offline #3657

Open klizhentas opened 4 years ago

klizhentas commented 4 years ago

Feature Request

Teleport has a grace period during which nodes have stopped sending heartbeats but have not yet expired from the database. In addition, DynamoDB, one of the most popular Teleport backends, enforces TTLs with some lag.

To improve the UX, display these nodes as "offline" in the UI and tsh. Consider also adding a flag such as tsh ls --all that shows all nodes, including offline ones, while plain tsh ls hides offline nodes.
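
A minimal Go sketch of the status check described above, assuming the cluster knows how long ago each node last sent a heartbeat. The names NodeStatus and statusFor and the 90-second threshold are hypothetical and are not taken from Teleport's code.

```go
package main

import (
	"fmt"
	"time"
)

// NodeStatus is a hypothetical display status for this sketch; it is not a Teleport type.
type NodeStatus string

const (
	StatusOnline  NodeStatus = "online"
	StatusOffline NodeStatus = "offline"
)

// statusFor classifies a node from the time elapsed since its last heartbeat.
// offlineAfter is an assumed, tunable threshold (for example, a couple of
// missed heartbeat intervals). A node that has not yet expired from the
// backend but exceeds the threshold is reported as offline.
func statusFor(sinceHeartbeat, offlineAfter time.Duration) NodeStatus {
	if sinceHeartbeat > offlineAfter {
		return StatusOffline
	}
	return StatusOnline
}

func main() {
	offlineAfter := 90 * time.Second // assumed value, not a Teleport default
	ages := []time.Duration{30 * time.Second, 5 * time.Minute}
	for _, age := range ages {
		fmt.Printf("last heartbeat %v ago: %s\n", age, statusFor(age, offlineAfter))
	}
}
```

Keeping the threshold as a parameter leaves room for operators to tune how quickly a silent node is shown as offline.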

Motivation

Describe why we should work on this feature request.

Who's it for?

OSS User, Pro, Enterprise

klizhentas commented 4 years ago

cc @fspmarshall

benarent commented 4 years ago

I spent some time this morning sketching out what a proposed CLI might look like. We may want to consider something similar to --ttl that would let customers fine-tune the offline window.

tsh ls

This is the current behaviour, but with an added Status column.

➜  ~ tsh ls
Node Name             Address         Status   Labels
--------------------- --------------- -------- ------
teleport-4-2-1-node-a 10.2.1.222:3022 online

tsh ls --all

A new flag that shows both online and offline nodes.

➜  ~ tsh ls --all
Node Name             Address         Status   Labels
--------------------- --------------- -------- ------
teleport-4-2-1-node-a 10.2.1.222:3022 online   os:osx
teleport-4-2-1-node-b 10.2.1.222:3022 online   os:osx
teleport-4-2-1-node-c 10.2.1.222:3022 offline  linux:4.14

tsh ls --all os=osx

Shows all nodes, online and offline, that match a specific label.

➜  ~ tsh ls --all os=osx
Node Name             Address         Status   Labels
--------------------- --------------- -------- ------
teleport-4-2-1-node-a 10.2.1.222:3022 online   os:osx
teleport-4-2-1-node-b 10.2.1.223:3022 offline  os:osx
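
The listing behaviour proposed above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions: the Node struct, filterNodes, matchesLabels, and the sample data are all hypothetical and do not reflect how tsh is implemented.

```go
package main

import "fmt"

// Node is a hypothetical record for this sketch, not Teleport's server type.
type Node struct {
	Name   string
	Addr   string
	Online bool
	Labels map[string]string
}

// matchesLabels reports whether n carries every key/value pair in want.
// An empty or nil want matches every node.
func matchesLabels(n Node, want map[string]string) bool {
	for k, v := range want {
		if got, ok := n.Labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// filterNodes hides offline nodes unless showAll is set (the proposed --all
// flag), then applies the label filter.
func filterNodes(nodes []Node, showAll bool, want map[string]string) []Node {
	var out []Node
	for _, n := range nodes {
		if !showAll && !n.Online {
			continue
		}
		if matchesLabels(n, want) {
			out = append(out, n)
		}
	}
	return out
}

// sampleNodes returns made-up data for the demonstration.
func sampleNodes() []Node {
	return []Node{
		{Name: "node-a", Addr: "10.0.0.1:3022", Online: true, Labels: map[string]string{"os": "osx"}},
		{Name: "node-b", Addr: "10.0.0.2:3022", Online: false, Labels: map[string]string{"os": "osx"}},
		{Name: "node-c", Addr: "10.0.0.3:3022", Online: false, Labels: map[string]string{"os": "linux"}},
	}
}

func main() {
	// Equivalent in spirit to the proposed "tsh ls --all os=osx".
	for _, n := range filterNodes(sampleNodes(), true, map[string]string{"os": "osx"}) {
		status := "offline"
		if n.Online {
			status = "online"
		}
		fmt.Printf("%-8s %-15s %s\n", n.Name, n.Addr, status)
	}
}
```

Applying the online check before the label check keeps the default listing unchanged for users who never pass the new flag.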

tsh clusters

Current implementation

➜  ~ tsh clusters
Cluster Name      Status
----------------- ------
teleport-421-auth online
graviton-auth     online

tsh clusters --all

Possible proposal to keep tsh clusters and tsh ls consistent.

➜  ~ tsh clusters --all
Cluster Name      Status
----------------- ------
teleport-421-auth online
graviton-auth     online
graviton-auth     offline

webvictim commented 4 years ago

Yep, --all or -a is pretty consistent nomenclature here.

klizhentas commented 2 years ago

stereobutter commented 2 years ago

Will it be possible to set an infinite TTL for nodes, so that a cluster never forgets nodes it has seen (unless an administrator removes them)? Another nice feature would be for offline nodes to display "offline since" or "last seen" in both the UI and tsh.
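
As a rough illustration of the "last seen" idea, a status cell could be rendered along these lines in Go. The function name describeStatus and the output wording are hypothetical; nothing here is taken from Teleport.

```go
package main

import (
	"fmt"
	"time"
)

// describeStatus renders a status cell for a node listing.
// sinceHeartbeat is ignored for online nodes; for offline nodes it is
// rounded to whole seconds to keep the output short.
func describeStatus(online bool, sinceHeartbeat time.Duration) string {
	if online {
		return "online"
	}
	return fmt.Sprintf("offline (last seen %s ago)", sinceHeartbeat.Round(time.Second))
}

func main() {
	fmt.Println(describeStatus(true, 0))
	fmt.Println(describeStatus(false, 90*time.Second))
}
```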

slavitch commented 2 years ago

This. TTL should be infinite with a "delete node" feature.

NathanielMichael commented 2 years ago

Would like to chime in here and +1 while sharing our use case:

We use Teleport to gain access to our IoT nodes which are running in remote/rural environments on satellite/LTE hybrid connections with restricted network configurations. Because we cannot initiate connections directly, the tunneling feature is very handy for us.

Due to the nature of these connections, bandwidth is limited, latency is high, and the connection may just disappear entirely for short periods of time.

Given this use case, it would be really nice if we had the ability to view nodes which have "registered" with Teleport but have missed one (or many) heartbeats.

stereobutter commented 1 year ago

I'd like to chime in with what @NathanielMichael said. We have exactly the same use case in that we use Teleport to access our remote IoT devices. We do not use Teleport SSH access, though, as our nodes run Talos Linux, which doesn't use SSH. Instead, Talos has a gRPC API for managing the OS, which we expose via application access, and it runs Kubernetes, which we also expose via Teleport.

We'd really love to use Teleport as a sort of inventory management system where we see all our infrastructure, whether Teleport agents are connected, and, if not, when they were last seen. Would it be feasible to also display heartbeat status for Teleport access other than SSH, i.e. Kubernetes, applications, databases, etc.?

tmtechnologie commented 1 year ago

Is there any chance of introducing such functionality?

We are also using IoT devices where the Internet connection isn't permanent.

simondegheselle commented 1 year ago

Same here, definitely a crucial feature for IoT use cases.

kmichaliq commented 12 months ago

I look forward to this functionality! I think it will help a lot of people.

arianvp commented 11 months ago

Definitely interested in this. We use auto scaling groups so nodes are cycling all the time. This leads to situations where at least half of the nodes in the Teleport UI are actually not available due to the high heartbeat. Filtering these out is basically a must for usability.

KamilTancula commented 11 months ago

I think this functionality will be useful.

double-em commented 9 months ago

A way to view offline hosts would be much appreciated. This gives the ability to keep an inventory of servers even when they go down.

mhenderson-heroic commented 1 month ago

Just wanted to add that I think this feature will add a ton of value to the product, looking forward to its implementation.

kklis commented 1 week ago

Anything new on the topic? Is the feature still planned? Due to the nature of the project I'm currently working on, there are nodes going online and offline quite frequently, and not being able to see all of them is quite problematic.

webvictim commented 1 week ago

This feature is not currently planned. We do appreciate the +1s and additional context.