Open sloan-dog opened 5 years ago
@sloan-dog That error is returned when the deployment status is not Running or Paused.
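A minimal sketch of guarding the promote call on that condition. It assumes the deployment has already been fetched and parsed into a dict, and that the status strings are the lowercase values that appear in the deployment JSON in this thread (running, paused). The name is_promotable is a hypothetical helper, not part of Nomad.

```python
# Hypothetical helper: only attempt promotion while the deployment is active.
# The allowed statuses come from the comment above; the lowercase spelling is
# taken from the "Status" field of the deployment JSON in this thread.
PROMOTABLE_STATUSES = {"running", "paused"}


def is_promotable(deployment: dict) -> bool:
    """Return True if the deployment's Status should allow a promote call."""
    return deployment.get("Status") in PROMOTABLE_STATUSES


print(is_promotable({"Status": "running"}))
print(is_promotable({"Status": "successful"}))
```

With a check like this, a script could skip the promote call and log the status instead of surfacing the rpc error.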
Could you try logging the data around the status of the deployment so we can get a clearer understanding of what is happening? - Thanks!
I'll try and capture that, thanks!
@endocrimes Back from the great beyond...
Now on Nomad 0.9.3 or thereabouts, I use a bash script to update jobs. Essentially, we take the updated job HCL (changes range from a new container image to an increase or decrease in desired allocations), convert it to JSON, and submit it via the HTTP API.
We extract the eval ID from the job update response and poll the eval status endpoint until the deployment ID is available. Then we poll the deployment status endpoint until all allocs are healthy or any alloc is unhealthy, as determined by health checks. On exiting that poll with healthy allocs, we call the promote endpoint with the deployment ID.
Based on the output below (the change tested here was only the alloc count), it appears that either a new deployment is not actually being created or the existing deployment is being mutated. I do not know which.
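The polling step described above could be sketched roughly as follows. This is not the bash script from this thread; it is an illustrative Python version under stated assumptions: fetch is a hypothetical callable that returns the deployment JSON as a dict (for example, the body returned by the deployment status endpoint), the return labels are invented for this sketch, and treating HealthyAllocs >= DesiredCanaries as "canaries healthy" is an assumption, not confirmed Nomad behavior.

```python
import time
from typing import Callable


def wait_for_deployment(fetch: Callable[[], dict],
                        interval: float = 5.0,
                        max_polls: int = 60,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a deployment; return 'promote', 'done', 'failed', or 'timeout'."""
    for _ in range(max_polls):
        dep = fetch()
        status = dep.get("Status")
        groups = list((dep.get("TaskGroups") or {}).values())

        if status == "successful":
            # Finished without needing promotion; a promote call would be rejected.
            return "done"
        if status not in ("running", "paused"):
            # Any other status is treated as terminal here.
            return "failed"
        if any(g.get("UnhealthyAllocs", 0) > 0 for g in groups):
            return "failed"

        # Groups that asked for canaries and have not been promoted yet.
        pending = [g for g in groups
                   if g.get("DesiredCanaries", 0) > 0 and not g.get("Promoted")]
        # Assumption: before promotion, HealthyAllocs counts the healthy canaries.
        if pending and all(g.get("HealthyAllocs", 0) >= g["DesiredCanaries"]
                           for g in pending):
            return "promote"

        sleep(interval)
    return "timeout"
```

The caller would only hit the promote endpoint when the result is 'promote'. A deployment that reports DesiredCanaries of 0 and reaches "successful" on its own, as in the output below, would come back as 'done'.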
Apologies for not pretty printing this output
Job spec json:
{ "Job": { "Stop": null, "Region": null, "Namespace": null, "ID": "unstable-migration-runner", "ParentID": null, "Name": "unstable-migration-runner", "Type": "service", "Priority": null, "AllAtOnce": null, "Datacenters": [ "dc1" ], "Constraints": null, "Affinities": null, "TaskGroups": [ { "Name": "unstable-migration-runner", "Count": 2, "Constraints": null, "Affinities": null, "Tasks": [ { "Name": "unstable-migration-runner", "Driver": "docker", "User": "", "Config": { "args": [ "***REDACTED***" ], "command": "***REDACTED***", "image": "***REDACTED***" }, "Constraints": null, "Affinities": null, "Env": { ***REDACTED*** }, "Services": [ { "Id": "", "Name": "unstable-migration-runner", "Tags": [ "***REDACTED***" ], "CanaryTags": null, "PortLabel": "http", "AddressMode": "", "Checks": [ { "Id": "", "Name": "prod-migration-runner-alive", "Type": "script", "Command": "**REDACTED***", "Args": null, "Path": "", "Protocol": "", "PortLabel": "", "AddressMode": "", "Interval": 1000000000, "Timeout": 10000000000, "InitialStatus": "", "TLSSkipVerify": false, "Header": null, "Method": "", "CheckRestart": null, "GRPCService": "", "GRPCUseTLS": false } ], "CheckRestart": null } ], "Resources": { "CPU": 256, "MemoryMB": 252, "DiskMB": null, "Networks": [ { "Device": "", "CIDR": "", "IP": "", "MBits": null, "ReservedPorts": null, "DynamicPorts": [ { "Label": "http", "Value": 0 } ] } ], "Devices": null, "IOPS": null }, "Meta": null, "KillTimeout": null, "LogConfig": null, "Artifacts": null, "Vault": null, "Templates": [ { "SourcePath": null, "DestPath": "***REDACTED***", "EmbeddedTmpl": "***REDACTED***", "ChangeMode": "restart", "ChangeSignal": null, "Splay": 5000000000, "Perms": "0644", "LeftDelim": null, "RightDelim": null, "Envvars": true, "VaultGrace": null } ], "DispatchPayload": null, "Leader": false, "ShutdownDelay": 0, "KillSignal": "" } ], "Spreads": null, "RestartPolicy": { "Interval": null, "Attempts": null, "Delay": null, "Mode": "delay" }, "ReschedulePolicy": null, 
"EphemeralDisk": null, "Update": { "Stagger": null, "MaxParallel": 1, "HealthCheck": "checks", "MinHealthyTime": 5000000000, "HealthyDeadline": 60000000000, "ProgressDeadline": 120000000000, "Canary": 1, "AutoRevert": true, "AutoPromote": null }, "Migrate": null, "Meta": null } ], "Update": null, "Spreads": null, "Periodic": null, "ParameterizedJob": null, "Dispatched": false, "Payload": null, "Reschedule": null, "Migrate": null, "Meta": null, "VaultToken": null, "Status": null, "StatusDescription": null, "Stable": null, "Version": null, "SubmitTime": null, "CreateIndex": null, "ModifyIndex": null, "JobModifyIndex": null } }
Job update output:
{"EvalID":"3662930c-5231-32a2-6ad5-43553f36049a","EvalCreateIndex":386446,"JobModifyIndex":386445,"Warnings":"","Index":386446,"LastContact":0,"KnownLeader":false}
Eval:
{"ID":"3662930c-5231-32a2-6ad5-43553f36049a","Namespace":"default","Priority":50,"Type":"service","TriggeredBy":"job-register","JobID":"unstable-migration-runner","JobModifyIndex":386445,"DeploymentID":"fcc90ed5-9e95-9d47-cbf8-cca40ba81137","Status":"complete","WaitUntil":"0001-01-01T00:00:00Z","QueuedAllocations":{"unstable-migration-runner":0},"SnapshotIndex":386446,"CreateIndex":386446,"ModifyIndex":386448}
deployment state:
{ "ID": "fcc90ed5-9e95-9d47-cbf8-cca40ba81137", "Namespace": "default", "JobID": "unstable-migration-runner", "JobVersion": 283, "JobModifyIndex": 386445, "JobSpecModifyIndex": 386445, "JobCreateIndex": 111, "TaskGroups": { "unstable-migration-runner": { "AutoRevert": true, "AutoPromote": false, "ProgressDeadline": 120000000000, "RequireProgressBy": "2019-07-30T20:33:48.477419326Z", "Promoted": false, "PlacedCanaries": null, "DesiredCanaries": 0, "DesiredTotal": 2, "PlacedAllocs": 2, "HealthyAllocs": 0, "UnhealthyAllocs": 0 } }, "Status": "running", "StatusDescription": "Deployment is running", "CreateIndex": 386447, "ModifyIndex": 386447 }
...5 seconds later
deployment state:
{ "ID": "fcc90ed5-9e95-9d47-cbf8-cca40ba81137", "Namespace": "default", "JobID": "unstable-migration-runner", "JobVersion": 283, "JobModifyIndex": 386445, "JobSpecModifyIndex": 386445, "JobCreateIndex": 111, "TaskGroups": { "unstable-migration-runner": { "AutoRevert": true, "AutoPromote": false, "ProgressDeadline": 120000000000, "RequireProgressBy": "2019-07-30T20:33:53.993524754Z", "Promoted": false, "PlacedCanaries": null, "DesiredCanaries": 0, "DesiredTotal": 2, "PlacedAllocs": 2, "HealthyAllocs": 1, "UnhealthyAllocs": 0 } }, "Status": "running", "StatusDescription": "Deployment is running", "CreateIndex": 386447, "ModifyIndex": 386451 }
...5 seconds later
deployment state:
{ "ID": "fcc90ed5-9e95-9d47-cbf8-cca40ba81137", "Namespace": "default", "JobID": "unstable-migration-runner", "JobVersion": 283, "JobModifyIndex": 386445, "JobSpecModifyIndex": 386445, "JobCreateIndex": 111, "TaskGroups": { "unstable-migration-runner": { "AutoRevert": true, "AutoPromote": false, "ProgressDeadline": 120000000000, "RequireProgressBy": "2019-07-30T20:33:53.993524754Z", "Promoted": false, "PlacedCanaries": null, "DesiredCanaries": 0, "DesiredTotal": 2, "PlacedAllocs": 2, "HealthyAllocs": 1, "UnhealthyAllocs": 0 } }, "Status": "running", "StatusDescription": "Deployment is running", "CreateIndex": 386447, "ModifyIndex": 386451 }
...5 seconds later
deployment state:
{ "ID": "fcc90ed5-9e95-9d47-cbf8-cca40ba81137", "Namespace": "default", "JobID": "unstable-migration-runner", "JobVersion": 283, "JobModifyIndex": 386445, "JobSpecModifyIndex": 386445, "JobCreateIndex": 111, "TaskGroups": { "unstable-migration-runner": { "AutoRevert": true, "AutoPromote": false, "ProgressDeadline": 120000000000, "RequireProgressBy": "2019-07-30T20:33:53.993524754Z", "Promoted": false, "PlacedCanaries": null, "DesiredCanaries": 0, "DesiredTotal": 2, "PlacedAllocs": 2, "HealthyAllocs": 1, "UnhealthyAllocs": 0 } }, "Status": "running", "StatusDescription": "Deployment is running", "CreateIndex": 386447, "ModifyIndex": 386451 }
...5 seconds later
deployment state:
{ "ID": "fcc90ed5-9e95-9d47-cbf8-cca40ba81137", "Namespace": "default", "JobID": "unstable-migration-runner", "JobVersion": 283, "JobModifyIndex": 386445, "JobSpecModifyIndex": 386445, "JobCreateIndex": 111, "TaskGroups": { "unstable-migration-runner": { "AutoRevert": true, "AutoPromote": false, "ProgressDeadline": 120000000000, "RequireProgressBy": "2019-07-30T20:33:53.993524754Z", "Promoted": false, "PlacedCanaries": null, "DesiredCanaries": 0, "DesiredTotal": 2, "PlacedAllocs": 2, "HealthyAllocs": 1, "UnhealthyAllocs": 0 } }, "Status": "running", "StatusDescription": "Deployment is running", "CreateIndex": 386447, "ModifyIndex": 386451 }
...5 seconds later :)
deployment state:
{ "ID": "fcc90ed5-9e95-9d47-cbf8-cca40ba81137", "Namespace": "default", "JobID": "unstable-migration-runner", "JobVersion": 283, "JobModifyIndex": 386445, "JobSpecModifyIndex": 386445, "JobCreateIndex": 111, "TaskGroups": { "unstable-migration-runner": { "AutoRevert": true, "AutoPromote": false, "ProgressDeadline": 120000000000, "RequireProgressBy": "2019-07-30T20:34:14.493525392Z", "Promoted": false, "PlacedCanaries": null, "DesiredCanaries": 0, "DesiredTotal": 2, "PlacedAllocs": 2, "HealthyAllocs": 2, "UnhealthyAllocs": 0 } }, "Status": "successful", "StatusDescription": "Deployment completed successfully", "CreateIndex": 386447, "ModifyIndex": 386457 }
All allocs healthy. Promoting deployment: fcc90ed5-9e95-9d47-cbf8-cca40ba81137
Promote payload
{ "DeploymentID": "fcc90ed5-9e95-9d47-cbf8-cca40ba81137", "All": true }
Promote result:
rpc error: can't promote terminal deployment
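For reference, the promote request with the payload shown above could be built roughly like this. It is a sketch, not the script used in this thread: the address http://127.0.0.1:4646 is an assumed local agent address, build_promote_request is a hypothetical helper name, and the path follows the deployment promote endpoint of the Nomad HTTP API.

```python
import json
import urllib.request


def build_promote_request(deployment_id: str,
                          address: str = "http://127.0.0.1:4646"
                          ) -> urllib.request.Request:
    """Build (but do not send) a request that promotes all canaries."""
    # Same payload shape as the "Promote payload" shown above.
    payload = {"DeploymentID": deployment_id, "All": True}
    return urllib.request.Request(
        url=f"{address}/v1/deployment/promote/{deployment_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_promote_request("fcc90ed5-9e95-9d47-cbf8-cca40ba81137")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing req to urllib.request.urlopen. Checking the deployment status first avoids sending the request to a deployment that is already terminal.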
Nomad version
Output from nomad version: 0.8.4
Operating system and Environment details
Ubuntu 16.04. Running a 5-node cluster on EC2 t2.medium instances, each of which runs Vault, Nomad, and Consul.
Issue
We use a canary deployment strategy with a matching canary count to achieve blue-green deployment. Our strategy is to poll the deployment status until all canary allocs are healthy and then promote (we fail if any one becomes unhealthy or times out). We use the docker driver.
However, we occasionally encounter an error when calling promote:
Cannot promote terminal deployment
It is clear that the deployment is terminal, but I know neither what that means nor how a canary deployment can become terminal without my doing anything. The odd thing is that the running container is the correct build version. (The build version is passed to the container via env args and served by its API.)
Reproduction steps
TL;DR:
1. Interpolate the docker image name and build ID into the job spec
2. Convert the job HCL to JSON
3. Write to the jobs endpoint
4. Read the deployment ID
5. Poll the deployment for health
6. Promote the deployment