hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad Service Catalog should have a filter or metadata for health status #15603

Open mr-karan opened 1 year ago

mr-karan commented 1 year ago

Proposal

Currently, I query the list of Nomad services using the consul-template Nomad functions. I couldn't find a way to restrict that list to healthy services only.

When I query the list of Nomad services for this namespace, the services are listed, but the response carries no metadata indicating whether a health check has failed:

[
    {
        "Address": "192.168.29.76",
        "AllocID": "4565fd09-ae36-06c7-229a-bd685ea5b8f4",
        "CreateIndex": 2510,
        "Datacenter": "dc1",
        "ID": "_nomad-task-4565fd09-ae36-06c7-229a-bd685ea5b8f4-web-doggo-web-http",
        "JobID": "doggo",
        "ModifyIndex": 2510,
        "Namespace": "default",
        "NodeID": "987413db-db5f-0267-caff-cc82086234f0",
        "Port": 27159,
        "ServiceName": "doggo-web",
        "Tags": [
            "doggo",
            "web"
        ]
    },
    {
        "Address": "192.168.29.76",
        "AllocID": "52efad7f-6b99-de8f-50b3-9e4358ea7c3c",
        "CreateIndex": 2477,
        "Datacenter": "dc1",
        "ID": "_nomad-task-52efad7f-6b99-de8f-50b3-9e4358ea7c3c-web-doggo-web-http",
        "JobID": "doggo",
        "ModifyIndex": 2477,
        "Namespace": "default",
        "NodeID": "987413db-db5f-0267-caff-cc82086234f0",
        "Port": 28421,
        "ServiceName": "doggo-web",
        "Tags": [
            "doggo",
            "web"
        ]
    },
    {
        "Address": "192.168.29.76",
        "AllocID": "fed569fb-3dac-8ed5-2c0c-7216c0a00ae9",
        "CreateIndex": 2524,
        "Datacenter": "dc1",
        "ID": "_nomad-task-fed569fb-3dac-8ed5-2c0c-7216c0a00ae9-web-doggo-web-http",
        "JobID": "doggo",
        "ModifyIndex": 2524,
        "Namespace": "default",
        "NodeID": "987413db-db5f-0267-caff-cc82086234f0",
        "Port": 24021,
        "ServiceName": "doggo-web",
        "Tags": [
            "doggo",
            "web"
        ]
    }
]

Use-cases

When using NGINX or a similar proxy, it's useful to include only the healthy upstream servers. That keeps the proxy from sending requests to upstream servers whose health checks are failing.

Attempted Solutions

I couldn't find a workaround.

jrasell commented 1 year ago

Hi @mr-karan, and thanks for raising this. I agree it would be a useful addition to the Nomad service discovery feature. For context, this is a known limitation that follows from the current design: the health check state is stored only on the Nomad client that performs the checks.
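
Since the service list carries no health state, one consumer-side approach is to probe each listed instance before using it as an upstream. The following is a minimal sketch under stated assumptions, not something Nomad or consul-template provides: the `/health` path, the `http_probe` helper, and the `healthy_upstreams` function are all hypothetical names chosen for illustration, and the entries are assumed to have the `Address` and `Port` fields shown in the JSON above.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, List, Mapping


def http_probe(address: str, port: int, path: str = "/health", timeout: float = 2.0) -> bool:
    """Return True if GET http://address:port/path answers with a 2xx status.

    The "/health" path is an assumption; use whatever endpoint the service exposes.
    """
    url = f"http://{address}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, non-2xx response, bad reply, etc.
        return False


def healthy_upstreams(
    services: Iterable[Mapping],
    probe: Callable[[str, int], bool] = http_probe,
) -> List[str]:
    """Keep only the entries whose probe succeeds, formatted as "address:port"."""
    return [
        f"{svc['Address']}:{svc['Port']}"
        for svc in services
        if probe(svc["Address"], svc["Port"])
    ]
```

The probe is passed in as a parameter so the filtering logic can be exercised without network access, and so a TCP or gRPC check could be swapped in. Note that this duplicates checks the Nomad client already runs, which is why a health filter in the catalog itself would be the cleaner fix.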

devminded commented 1 year ago

Related to https://github.com/prometheus/prometheus/issues/11775

konnextv commented 4 months ago

Nomad newbie here. This peculiarity should definitely be mentioned in the template docs. I assumed that only healthy service instances would be included in nomadService, and wasted hours searching for the cause of my problems.