hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Parity between `/ui/servers` and `nomad server members` #24309

Open ChefAustin opened 2 weeks ago

ChefAustin commented 2 weeks ago

While recently upgrading our Nomad servers (in a federated environment), I realized that the /ui/servers view marks only a single server as the 'leader' (the leader for the currently selected region), even though the view shows all servers across all regions.

Conversely, running `nomad server members` from the CLI also displays all servers across all regions, but it designates the leader for each returned region.

[image] Note how I've sorted the table by 'Leader' status to show that only one leader is surfaced in this view. It's also important to understand that the two listed datacenters exist in different regions.

Proposal

IMHO (but others might disagree), since the /ui/servers pane has a table column displaying the region for the listed servers, I think that it would be nice to continue showing all servers across all regions but also to display the leader for each region.

Use-cases

It's nice to be able to see (in the UI) all servers in all regions without switching between regions via drop-down and it would be doubly nice to be able to see all regions with a designation of which servers are leaders (without switching between regions via drop-down menu).

Attempted Solutions

This is possible currently; it just adds an extra click.

gulducat commented 2 weeks ago

Heya, thanks for the report!

Simplest way I've replicated this is by running 2 dev agents binding to different IPs:

# terminal 1
nomad agent -dev -region=localhost -node=nodeA -bind=127.0.0.1 # the default

# terminal 2
nomad agent -dev -region=wifi -node=nodeB -bind=192.168.1.211 # whatever other IP for your network

# terminal 3
nomad server join 192.168.1.211
nomad server members
nomad ui # http://127.0.0.1:4646/ui/servers

CLI shows both leaders:

$ nomad server members
Name             Address        Port  Status  Leader  Raft Version  Build      Datacenter  Region
nodeA.localhost  127.0.0.1      4648  alive   true    3             1.9.1-dev  dc1         localhost
nodeB.wifi       192.168.1.211  4648  alive   true    3             1.9.1-dev  dc1         wifi

by making these 3 API calls (from agent debug logs):

request complete: method=GET path=/v1/agent/members duration="645.78µs"
request complete: method=GET path=/v1/status/leader?region=localhost duration=2.791637ms
request complete: method=GET path=/v1/status/leader?region=wifi duration="821.174µs"

UI only shows current region's leader:

Screenshot from 2024-10-28 13-09-21

because it only hits /v1/status/leader without including a ?region parameter unless one is selected from the drop-down; it doesn't loop through all regions.

Looks like the extra API calls in the CLI are there because the list of members is a Serf concern, but leadership is a Raft concern, and each region only knows about its own Raft leadership. The extra calls get forwarded to those regions to ask them directly about their Raft state.

So for the UI to show the info, it'll need to make those extra API calls, too. I'll move this along our queue to consider further and prioritize.
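
For illustration, here is a rough sketch of that per-region loop, written in Python rather than the UI's own code. It is not how the CLI or UI actually implement this. It assumes each member record carries `Addr` plus `Tags["region"]` and `Tags["port"]`, and that the leader lookup returns a "host:port" string that can be compared against the member's address and port; those response shapes are assumptions here. `mark_region_leaders` and `fetch_leader` are hypothetical names, with `fetch_leader` standing in for one GET to /v1/status/leader?region=<region>.

```python
from typing import Callable, Dict, List


def mark_region_leaders(
    members: List[dict],
    fetch_leader: Callable[[str], str],
) -> List[dict]:
    """Return copies of the member records with a boolean "Leader" key added.

    fetch_leader is a hypothetical callback standing in for one request to
    /v1/status/leader?region=<region>; it is assumed to return "host:port".
    """
    # One leader lookup per distinct region, not one per member.
    regions = sorted({m["Tags"]["region"] for m in members})
    leaders: Dict[str, str] = {region: fetch_leader(region) for region in regions}

    annotated = []
    for m in members:
        # Assumption: the leader address matches the member's Addr plus the
        # port advertised in its tags.
        member_addr = f'{m["Addr"]}:{m["Tags"]["port"]}'
        is_leader = leaders.get(m["Tags"]["region"]) == member_addr
        annotated.append({**m, "Leader": is_leader})
    return annotated


if __name__ == "__main__":
    # Made-up data mirroring the two dev agents above; port 4647 is assumed.
    demo_members = [
        {"Name": "nodeA.localhost", "Addr": "127.0.0.1",
         "Tags": {"region": "localhost", "port": "4647"}},
        {"Name": "nodeB.wifi", "Addr": "192.168.1.211",
         "Tags": {"region": "wifi", "port": "4647"}},
    ]
    demo_leaders = {"localhost": "127.0.0.1:4647", "wifi": "192.168.1.211:4647"}
    for row in mark_region_leaders(demo_members, demo_leaders.__getitem__):
        print(row["Name"], row["Leader"])
```

The lookup is passed in as a callback so the sketch runs without a live cluster; a real client would make the HTTP request inside it.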

Thanks for the suggestion!