Closed: suikast42 closed this issue 8 months ago.
Hi @suikast42!
Which version of Nomad are you running? I just tested on Nomad 1.7.3 and I do get the expected results on port collision:
This is strange:
nomad --version
Nomad v1.7.3
BuildDate 2024-01-15T16:55:40Z
Revision 60ee328f97d19d2d2d9761251b895b06d82eb1a1

OK, I tried it with a simple deployment:
job "whoami" {
group "whoami" {
count = 1
network {
mode = "bridge"
port "web" {
to=8080
static = 8080
}
}
service {
name = "${NOMAD_NAMESPACE}-${NOMAD_GROUP_NAME}"
port = "web"
tags = [
"traefik.enable=true",
"traefik.http.routers.${NOMAD_GROUP_NAME}-${NOMAD_ALLOC_ID}.rule=Host(`${NOMAD_NAMESPACE}.${NOMAD_GROUP_NAME}.cloud.private`)",
"traefik.http.routers.${NOMAD_GROUP_NAME}-${NOMAD_ALLOC_ID}.tls=true",
]
check {
type = "http"
path = "/health"
port = "web"
interval = "10s"
timeout = "2s"
}
}
task "whoami" {
driver = "docker"
# driver = "containerd-driver"
config {
image = "traefik/whoami"
ports = ["web"]
args = ["--port", "${NOMAD_PORT_web}"]
}
resources {
cpu = 100
memory = 128
}
}
}
}
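As an aside: the collision comes from static = 8080, which pins the same host port for every instance. If a fixed host port isn't strictly required, a dynamic mapping sidesteps the conflict entirely; a minimal sketch of the network block without the static line:

network {
  mode = "bridge"
  port "web" {
    to = 8080 # container port stays fixed; Nomad assigns a free host port per allocation
  }
}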
Then I deployed the same job a second time under the name whoami2, leaving the rest of the definition unchanged.
The result:
nomad job status whoami2
ID = whoami2
Name = whoami2
Submit Date = 2024-02-08T09:28:05Z
Type = service
Priority = 50
Datacenters = *
Namespace = default
Node Pool = default
Status = pending
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost Unknown
whoami 1 0 0 0 0 0 0
Placement Failure
Task Group "whoami":
Latest Deployment
ID = 70630165
Status = running
Description = Deployment is running
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
whoami 1 0 0 0 N/A
Allocations
No allocations placed
nomad eval list
ID Priority Triggered By Job ID Namespace Node ID Status Placement Failures
96fd4b1b 50 queued-allocs whoami2 default <none> blocked N/A - In Progress
ca193e9c 50 job-register whoami2 default <none> complete true
nomad eval status 96fd4b1b
ID = 96fd4b1b
Create Time = 4m45s ago
Modify Time = 4m45s ago
Status = blocked
Status Description = created to place remaining allocations
Type = service
TriggeredBy = queued-allocs
Job ID = whoami2
Namespace = default
Priority = 50
Placement Failures = N/A - In Progress
Failed Placements
Task Group "whoami" (failed to place 1 allocation):
nomad eval status ca193e9c
ID = ca193e9c
Create Time = 5m59s ago
Modify Time = 5m59s ago
Status = complete
Status Description = complete
Type = service
TriggeredBy = job-register
Job ID = whoami2
Namespace = default
Priority = 50
Placement Failures = true
Failed Placements
Task Group "whoami" (failed to place 1 allocation):
Evaluation "96fd4b1b" waiting for additional capacity to place remainder
Hmm... sorry, I still can't reproduce the problem.
How many clients do you have? Could you share the full output you get when you run nomad job run for the second time?

I have one worker and one master.
2024-02-09 12:11:08.264 [nomad.service master-01] nomad.job.service_sched.binpack: preemption not possible: eval_id=25b134d9-d664-2128-a0d9-6bf682a66539 job_id=whoami2 namespace=default network_resource="&{bridge 0 <nil> [{web 42000 42000 default}] []}"
2024-02-09 12:11:08.264 [nomad.service master-01] nomad.job.service_sched: failed to place all allocations, blocked eval created: eval_id=25b134d9-d664-2128-a0d9-6bf682a66539 job_id=whoami2 namespace=default blocked_eval_id=fb84782d-35c5-d3e6-2fb3-85404e7c1a98
2024-02-09 12:11:08.264 [nomad.service master-01] nomad.job.service_sched: reconciled current state with desired state: eval_id=25b134d9-d664-2128-a0d9-6bf682a66539 job_id=whoami2 namespace=default
2024-02-09 12:11:08.264 [nomad.service master-01] nomad.job.service_sched: setting eval status: eval_id=25b134d9-d664-2128-a0d9-6bf682a66539 job_id=whoami2 namespace=default status=complete
2024-02-09 12:11:08.264 [nomad.service master-01] | Desired Changes for "whoami2": (place 1) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 0) (canary 0)
2024-02-09 12:11:08.265 [nomad.service master-01] http: request complete: method=POST path=/v1/job/whoami2/plan duration=2.186051ms
2024-02-09 12:11:12.063 [nomad.service master-01] worker.service_sched.binpack: preemption not possible: eval_id=c4362777-c47e-a814-3dd3-9031a69144d8 job_id=whoami2 namespace=default worker_id=3ba79f80-9ba2-cdfe-ba09-9ce8fc0955e1 network_resource="&{bridge 0 <nil> [{web 42000 42000 default}] []}"
2024-02-09 12:11:12.063 [nomad.service master-01] worker.service_sched: reconciled current state with desired state: eval_id=c4362777-c47e-a814-3dd3-9031a69144d8 job_id=whoami2 namespace=default worker_id=3ba79f80-9ba2-cdfe-ba09-9ce8fc0955e1
2024-02-09 12:11:12.063 [nomad.service master-01] worker: dequeued evaluation: worker_id=3ba79f80-9ba2-cdfe-ba09-9ce8fc0955e1 eval_id=c4362777-c47e-a814-3dd3-9031a69144d8 type=service namespace=default job_id=whoami2 node_id="" triggered_by=job-register
2024-02-09 12:11:12.064 [nomad.service master-01] | Desired Changes for "whoami2": (place 1) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 0) (canary 0)
2024-02-09 12:11:12.068 [nomad.service master-01] worker.service_sched: failed to place all allocations, blocked eval created: eval_id=c4362777-c47e-a814-3dd3-9031a69144d8 job_id=whoami2 namespace=default worker_id=3ba79f80-9ba2-cdfe-ba09-9ce8fc0955e1 blocked_eval_id=84b2b984-606f-5e1b-7ac9-0cfc0a88debe
2024-02-09 12:11:12.068 [nomad.service master-01] worker: created evaluation: worker_id=3ba79f80-9ba2-cdfe-ba09-9ce8fc0955e1 eval="<Eval \"84b2b984-606f-5e1b-7ac9-0cfc0a88debe\" JobID: \"whoami2\" Namespace: \"default\">" waitUntil="\"0001-01-01 00:00:00 +0000 UTC\""
2024-02-09 12:11:12.073 [nomad.service master-01] worker.service_sched: setting eval status: eval_id=c4362777-c47e-a814-3dd3-9031a69144d8 job_id=whoami2 namespace=default worker_id=3ba79f80-9ba2-cdfe-ba09-9ce8fc0955e1 status=complete
2024-02-09 12:11:12.078 [nomad.service master-01] worker: ack evaluation: worker_id=3ba79f80-9ba2-cdfe-ba09-9ce8fc0955e1 eval_id=c4362777-c47e-a814-3dd3-9031a69144d8 type=service namespace=default job_id=whoami2 node_id="" triggered_by=job-register
2024-02-09 12:11:12.078 [nomad.service master-01] worker: updated evaluation: worker_id=3ba79f80-9ba2-cdfe-ba09-9ce8fc0955e1 eval="<Eval \"c4362777-c47e-a814-3dd3-9031a69144d8\" JobID: \"whoami2\" Namespace: \"default\">"
2024-02-09 12:11:12.090 [nomad.service master-01] http: request complete: method=GET path=/v1/job/whoami2 duration="475.382µs"
2024-02-09 12:11:12.106 [nomad.service master-01] http: request complete: method=GET path=/v1/job/whoami2/allocations duration="289.615µs"
2024-02-09 12:11:12.110 [nomad.service master-01] http: request complete: method=GET path=/v1/job/whoami2/evaluations duration="300.334µs"
2024-02-09 12:11:12.187 [nomad.service master-01] http: request complete: method=GET path=/v1/job/whoami2/deployment?index=1 duration="315.753µs"
2024-02-09 12:11:12.188 [nomad.service master-01] http: request complete: method=GET path=/v1/job/whoami2/summary?index=1 duration="337.219µs"
2024-02-09 12:11:12.190 [nomad.service master-01] http: request complete: method=GET path=/v1/job/whoami2/deployment duration="420.686µs"
2024-02-09 12:11:12.190 [nomad.service master-01] http: request complete: method=GET path="/v1/vars?prefix=nomad%2Fjobs%2Fwhoami2" duration="310.217µs"
2024-02-09 12:11:12.197 [nomad.service master-01] http: request complete: method=GET path=/v1/job/whoami2/deployment duration="332.207µs"
2024-02-09 12:11:12.207 [nomad.service master-01] http: request complete: method=GET path=/v1/job/whoami2 duration="364.175µs"
2024-02-09 12:11:14.125 [nomad.service master-01] http: request complete: method=GET path=/v1/job/whoami2/deployment?index=58495 duration="296.495µs"
I tried it with both bridge and host network mode; the result is the same either way.
Here are my Nomad and Consul configs. Maybe that helps?
Consul server
datacenter = "nomadder1"
data_dir = "/opt/services/core/consul/data"
log_level = "INFO"
node_name = "master-01"
server = true
bind_addr = "0.0.0.0"
advertise_addr = "172.42.1.10"
client_addr = "0.0.0.0"
encrypt = "G1CHAD7wwu0tU28BlKkirSahTJ/Tqpo9ClOAycQAUwE="
server_rejoin_age_max = "8640h"
# https://developer.hashicorp.com/consul/docs/connect/observability/ui-visualization
ui_config{
enabled = true
dashboard_url_templates {
service = "https://grafana.cloud.private/d/lDlaj-NGz/service-overview?orgId=1&var-service={{Service.Name}}&var-namespace={{Service.Namespace}}&var-partition={{Service.Partition}}&var-dc={{Datacenter}}"
}
metrics_provider = "prometheus"
metrics_proxy {
base_url = "http://mimir.service.consul:9009/prometheus"
add_headers = [
# {
# name = "Authorization"
# value = "Bearer <token>"
# }
{
name = "X-Scope-OrgID"
value = "1"
}
]
path_allowlist = ["/prometheus/api/v1/query_range", "/prometheus/api/v1/query"]
}
}
addresses {
# grpc = "127.0.0.1"
grpc_tls = "127.0.0.1"
}
ports {
http = -1
https = 8501
# grpc = 8502
grpc_tls = 8503
}
connect {
enabled = true
}
retry_join = ["172.42.1.10"]
bootstrap_expect = 1
auto_encrypt{
allow_tls = true
}
performance{
raft_multiplier = 1
}
node_meta{
node_type = "server"
}
tls{
defaults {
ca_file = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
cert_file = "/etc/opt/certs/consul/consul.pem"
key_file = "/etc/opt/certs/consul/consul-key.pem"
verify_incoming = true
verify_outgoing = true
}
internal_rpc {
verify_server_hostname = true
}
}
#watches = [
# {
# type = "checks"
# handler = "/usr/bin/health-check-handler.sh"
# }
#]
telemetry {
disable_hostname = true
prometheus_retention_time = "72h"
}
Nomad server
log_level  = "DEBUG"
name       = "master-01"
datacenter = "nomadder1"
data_dir   = "/opt/services/core/nomad/data"

# You should only set this value to true on server agents
# if the terminated server will never join the cluster again.
# leave_on_interrupt = false

# You should only set this value to true on server agents
# if the terminated server will never join the cluster again.
# leave_on_terminate = false

server {
  enabled              = true
  job_max_priority     = 100 # 100 is the default
  job_default_priority = 50  # 50 is the default
  bootstrap_expect     = 1
  encrypt              = "4PRfoE6Mj9dHTLpnzmYD1+THdlyAo2Ji4U6ewMumpAw="
  rejoin_after_leave   = true
  server_join {
    retry_join     = ["172.42.1.10"]
    retry_max      = 0
    retry_interval = "15s"
  }
}

bind_addr = "0.0.0.0" # the default

advertise {
  # Defaults to the first private IP address.
  http = "172.42.1.10"
  rpc  = "172.42.1.10"
  serf = "172.42.1.10"
}

tls {
  http                   = true
  rpc                    = true
  ca_file                = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
  cert_file              = "/etc/opt/certs/nomad/nomad.pem"
  key_file               = "/etc/opt/certs/nomad/nomad-key.pem"
  verify_server_hostname = true
  verify_https_client    = true
}

ui {
  enabled = true
  label {
    text             = "Fenerbahçe 1907"
    background_color = "#163962"
    text_color       = "#ffed00"
  }
  consul {
    ui_url = "https://consul.cloud.private"
  }
  vault {
    ui_url = "https://vault.cloud.private"
  }
}

consul {
  ssl          = true
  address      = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8503"
  # this works only with ACL enabled
  allow_unauthenticated = true
  ca_file      = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
  grpc_ca_file = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
  cert_file    = "/etc/opt/certs/consul/consul.pem"
  key_file     = "/etc/opt/certs/consul/consul-key.pem"
}

telemetry {
  collection_interval        = "1s"
  disable_hostname           = true
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
Consul agent
datacenter = "nomadder1"
data_dir = "/opt/services/core/consul/data"
log_level = "INFO"
node_name = "worker-01"
bind_addr = "0.0.0.0"
advertise_addr = "172.42.1.20"
client_addr = "0.0.0.0"
encrypt = "G1CHAD7wwu0tU28BlKkirSahTJ/Tqpo9ClOAycQAUwE="
addresses {
# grpc = "127.0.0.1"
grpc_tls = "127.0.0.1"
}
ports {
http = -1
https = 8501
# grpc = 8502
grpc_tls = 8503
}
connect {
enabled = true
}
retry_join = ["172.42.1.10"]
auto_encrypt{
tls = true
}
performance{
raft_multiplier = 1
}
node_meta{
node_type = "worker"
}
tls{
defaults {
ca_file = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
cert_file = "/etc/opt/certs/consul/consul.pem"
key_file = "/etc/opt/certs/consul/consul-key.pem"
verify_incoming = false
verify_outgoing = true
}
internal_rpc {
verify_server_hostname = true
}
}
#watches = [
# {
# type = "checks"
# handler = "/usr/bin/health-check-handler.sh"
# }
#]
telemetry {
disable_hostname = true
Nomad agent
log_level  = "DEBUG"
name       = "worker-01"
datacenter = "nomadder1"
data_dir   = "/opt/services/core/nomad/data"

bind_addr = "0.0.0.0" # the default

leave_on_interrupt = true

# https://github.com/hashicorp/nomad/issues/17093
# systemctl kill -s SIGTERM nomad will suppress node drain if
# leave_on_terminate is set to false.
leave_on_terminate = true

advertise {
  # Defaults to the first private IP address.
  http = "172.42.1.20"
  rpc  = "172.42.1.20"
  serf = "172.42.1.20"
}

client {
  enabled           = true
  network_interface = "eth1"

  meta {
    node_type             = "worker"
    connect.log_level     = "debug"
    connect.sidecar_image = "registry.cloud.private/envoyproxy/envoy:v1.29.0"
  }

  server_join {
    retry_join     = ["172.42.1.10"]
    retry_max      = 0
    retry_interval = "15s"
  }

  # Either leave_on_interrupt or leave_on_terminate must be set
  # for this to take effect.
  drain_on_shutdown {
    deadline           = "2m"
    force              = false
    ignore_system_jobs = false
  }

  host_volume "ca_cert" {
    path      = "/usr/local/share/ca-certificates/cloudlocal"
    read_only = true
  }

  host_volume "cert_ingress" {
    path      = "/etc/opt/certs/ingress"
    read_only = true
  }

  ## Cert consul client
  ## Needed for consul_sd_configs
  ## Should be deleted after resolving https://github.com/suikast42/nomadder/issues/100
  host_volume "cert_consul" {
    path      = "/etc/opt/certs/consul"
    read_only = true
  }

  ## Cert nomad client
  ## Needed for jenkins
  ## Should be deleted after resolving https://github.com/suikast42/nomadder/issues/100
  host_volume "cert_nomad" {
    path      = "/etc/opt/certs/nomad"
    read_only = true
  }

  ## Cert docker client
  ## Needed for jenkins
  ## Should be deleted after migrating to vault
  host_volume "cert_docker" {
    path      = "/etc/opt/certs/docker"
    read_only = true
  }

  host_network "public" {
    interface = "eth0"
    # cidr           = "203.0.113.0/24"
    # reserved_ports = "22,80"
  }

  host_network "default" {
    interface = "eth1"
  }

  host_network "private" {
    interface = "eth1"
  }

  host_network "local" {
    interface = "lo"
  }

  reserved {
    # cpu (int: 0)    - Specifies the amount of CPU to reserve, in MHz.
    # cores (int: 0)  - Specifies the number of CPU cores to reserve.
    # memory (int: 0) - Specifies the amount of memory to reserve, in MB.
    # disk (int: 0)   - Specifies the amount of disk to reserve, in MB.
    # reserved_ports (string: "") - Specifies a comma-separated list of ports to reserve on all
    #   fingerprinted network devices. Ranges can be specified by using a hyphen separating the
    #   two inclusive ends. See also host_network for reserving ports on specific host networks.
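    # For example (hypothetical values): reserved_ports = "22,80,10000-10100"
    # would keep those host ports out of Nomad's dynamic port allocation.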
    cpu    = 1000
    memory = 2048
  }

  max_kill_timeout = "1m"
}

tls {
  http                   = true
  rpc                    = true
  ca_file                = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
  cert_file              = "/etc/opt/certs/nomad/nomad.pem"
  key_file               = "/etc/opt/certs/nomad/nomad-key.pem"
  verify_server_hostname = true
  verify_https_client    = true
}

consul {
  ssl          = true
  address      = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8503"
  # this works only with ACL enabled
  allow_unauthenticated = true
  ca_file      = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
  grpc_ca_file = "/usr/local/share/ca-certificates/cloudlocal/cluster-ca-bundle.pem"
  cert_file    = "/etc/opt/certs/consul/consul.pem"
  key_file     = "/etc/opt/certs/consul/consul-key.pem"
}

telemetry {
  collection_interval        = "1s"
  disable_hostname           = true
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

plugin "docker" {
  config {
    allow_privileged       = false
    disable_log_collection = false

    # volumes {
    #   enabled      = true
    #   selinuxlabel = "z"
    # }

    infra_image              = "registry.cloud.private/google_containers/pause-amd64:3.2"
    infra_image_pull_timeout = "30m"

    extra_labels = ["job_name", "job_id", "task_group_name", "task_name", "namespace", "node_name", "node_id"]

    logging {
      type = "journald"
      config {
        labels-regex = ".*"
      }
    }

    gc {
      container = true
      dangling_containers {
        enabled = true
        # period         = "3m"
        # creation_grace = "5m"
      }
    }
  }
}
Thank you for the extra information @suikast42!
The server logs allowed me to find the problem. I believe you have service job preemption enabled, which triggered a different code path from the default configuration I was using. I opened #19933 to fix this issue.
To confirm that this is the case, could you share the output of the command nomad operator scheduler get-config?
Interesting!
Here is the output. By the way, I updated to 1.7.4, but nothing changed, of course:
Scheduler Algorithm = spread
Memory Oversubscription = true
Reject Job Registration = false
Pause Eval Broker = false
Preemption System Scheduler = true
Preemption Service Scheduler = true
Preemption Batch Scheduler = true
Preemption SysBatch Scheduler = true
Modify Index = 30913
Thanks! Yeah, Preemption Service Scheduler = true would trigger this. The fix will be available in the next Nomad release.
Thank you again for the report!
I can confirm: after setting nomad operator scheduler set-config -preempt-service-scheduler false, I see the details ;-)
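For reference, the same defaults can also be seeded declaratively when a cluster is first bootstrapped; a sketch, assuming the server's default_scheduler_config stanza (on an already-running cluster the set-config command above is the way to change it):

server {
  enabled = true
  default_scheduler_config {
    scheduler_algorithm             = "spread"
    memory_oversubscription_enabled = true
    preemption_config {
      system_scheduler_enabled   = true
      service_scheduler_enabled  = false # avoid the affected code path until the fix ships
      batch_scheduler_enabled    = true
      sysbatch_scheduler_enabled = true
    }
  }
}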
Yes, I did this because I activated MemoryOversubscription, so I thought preemption would be more dynamic for my use case.
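Memory oversubscription is opt-in per task via memory_max; a sketch of how the whoami task's resources could use it (the 512 ceiling is just an illustrative value):

resources {
  cpu        = 100
  memory     = 128 # reserved memory, used for scheduling decisions
  memory_max = 512 # burst limit honored when memory oversubscription is enabled
}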
Oh yes, preemption is a very nice feature, but it triggers some different code paths that sometimes are not kept up to date.
But I'm glad we were able to get to the bottom of this. I was really confused about why it wasn't happening to me!
I had a similar issue in the past but didn't understand why my evaluation was blocked.
See https://github.com/hashicorp/nomad/issues/19446
Now I can reproduce the issue.
I deployed an MSSQL DB with a static port mapping, then accidentally deployed a second job with the same static port mapping, with only one worker node available.
It's not a bug that Nomad denies the allocation, but the reason the allocation is blocked isn't reported anywhere: neither nomad job status nor nomad deployment status 292a527f shows it.
Information like "not enough CPU or memory" or "port conflict and no more nodes available" would be very handy for troubleshooting.