hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Integrate Job Client By Status computation into API #11926

Open · ChaiWithJai opened this issue 2 years ago

ChaiWithJai commented 2 years ago

Proposal

The Nomad SysBatch UI introduced a client-side computation of Job Client By Status. We'd like this status to become an API endpoint that the UI can consume from the server.

The response schema of the Read Job Summary endpoint (`GET /v1/job/:job_id/summary`) is a great fit for the data visualization pattern we're following.

Background

Determining the status of a job on a particular client requires grouping all of the job's allocations by client ID, then aggregating their desired and client statuses to derive a single job status value.
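As a rough illustration of the grouping step, here is a minimal Python sketch. It assumes each allocation exposes a node ID, a desired status, and a client status (loosely mirroring the `NodeID`, `DesiredStatus`, and `ClientStatus` fields on Nomad allocations); the `Allocation` class, the `group_by_client` helper, and the rule that allocations not desired to run are skipped are all hypothetical choices, not the UI's or Nomad's actual code.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    # Hypothetical stand-in for an allocation stub; only the fields the
    # aggregation needs are modeled here.
    node_id: str
    desired_status: str  # e.g. "run", "stop", "evict"
    client_status: str   # e.g. "pending", "running", "complete", "failed", "lost"


def group_by_client(allocs):
    """Group a job's allocations by client (node) ID and count client statuses.

    Assumption: allocations the scheduler no longer wants running
    (desired status other than "run") are ignored entirely.
    """
    counts = defaultdict(Counter)
    for alloc in allocs:
        if alloc.desired_status != "run":
            continue  # skip stopped/evicted allocations
        counts[alloc.node_id][alloc.client_status] += 1
    return dict(counts)
```

For example, one running and one failed allocation on `node-a` yield `Counter({"running": 1, "failed": 1})` for that client, and a client whose only allocation has desired status `"stop"` does not appear in the result at all.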

A system or sysbatch job may not have allocations placed on some clients. This can happen because of constraints, or because a client was unavailable when the job was registered.
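Because of this, the set of clients to report on can't be derived from allocations alone. A minimal sketch, assuming the caller already knows which clients the job could have targeted (hypothetically, every client in the job's datacenters), is to take the clients that have no allocations for the job; the `not_scheduled_clients` name is illustrative only.

```python
from typing import Iterable, Mapping, Sequence


def not_scheduled_clients(
    candidate_node_ids: Iterable[str],
    allocs_by_node: Mapping[str, Sequence[object]],
) -> list:
    """Return candidate client IDs that have no allocations for the job.

    candidate_node_ids: clients the job could have run on (an assumption;
        how this set is chosen is not specified in the proposal).
    allocs_by_node: mapping of client ID -> allocations placed there.
    """
    return sorted(
        node_id
        for node_id in candidate_node_ids
        if not allocs_by_node.get(node_id)  # missing or empty means no placement
    )
```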

The possible job statuses on a client will be based on the existing allocation summary values, plus two new ones: degraded and not scheduled.
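One way to collapse a client's allocations into a single status is sketched below in Python. The mapping from allocation client status to summary bucket and the precedence rules (unanimous status wins, any failed or lost mixed with other statuses means degraded, otherwise starting beats running) are hypothetical choices made for illustration, as are the `"notScheduled"` and `"degraded"` identifiers. The "queued" summary value is left out because unplaced allocations have no client.

```python
from collections import Counter

# Hypothetical mapping from an allocation's client status to the
# job-summary bucket it would count toward on that client.
SUMMARY_BUCKET = {
    "pending": "starting",
    "running": "running",
    "complete": "complete",
    "failed": "failed",
    "lost": "lost",
}


def client_job_status(client_statuses):
    """Derive one job status for a single client from the client statuses
    of the job's allocations on that client."""
    if not client_statuses:
        return "notScheduled"  # no allocations were placed on this client
    buckets = Counter(SUMMARY_BUCKET.get(s, s) for s in client_statuses)
    if len(buckets) == 1:
        # Every allocation on this client agrees, so report that status.
        (only,) = buckets
        return only
    if buckets["failed"] or buckets["lost"]:
        # A mix of healthy and unhealthy allocations.
        return "degraded"
    if buckets["starting"]:
        return "starting"
    return "running"  # e.g. a mix of running and complete allocations
```

With these rules, `["running", "failed"]` yields `"degraded"` and `["running", "complete"]` yields `"running"`; a real implementation would need to settle these edge cases explicitly.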

ChaiWithJai commented 2 years ago

@tgross LMK if this needs more description for you to get up-to-speed on how we implemented this.

tgross commented 2 years ago

I've got a couple of questions on this, for my clarity:

As for adding this information to the Job Summary API, that seems like a very reasonable place for it to me as well. We don't have the data in the state store's `job_summary` table, but it's trivial to get from that spot in the code by querying the job's allocations, and it doesn't require any new ACLs.

One small item that comes to mind is that the data will be degraded during the cluster upgrade that introduces this feature. For example, if I'm hitting the web UI on a follower server that has the UI change but the leader doesn't, the data will temporarily disappear from the UI. I don't think we normally worry about this kind of thing, but lately I've been running a cluster where the UI is accessible only from one of the clients, so API forwarding has been on my mind 😀