rickardrosen closed this issue 7 years ago.
@rickardrosen Can you post the job so we can play with it? Hard to tell from the plan diff.
Sure. Here is an example of a job:
{
  "job": {
    "Region": "nordic",
    "ID": "api:prod",
    "ParentID": "",
    "Name": "api",
    "Type": "service",
    "Priority": 20,
    "AllAtOnce": false,
    "Datacenters": [
      "dc-1"
    ],
    "Constraints": [
      {
        "LTarget": "${node.class}",
        "RTarget": "prod",
        "Operand": "="
      }
    ],
    "TaskGroups": [
      {
        "Name": "group-1",
        "Count": 2,
        "Constraints": [],
        "RestartPolicy": {
          "Attempts": 10,
          "Interval": 300000000000,
          "Delay": 25000000000,
          "Mode": "delay"
        },
        "Tasks": [
          {
            "Name": "api",
            "Driver": "docker",
            "User": "",
            "Config": {
              "dns_servers": [
                "10.70.17.250",
                "10.70.17.251"
              ],
              "args": [
                "-config",
                "file:///local/containerpilot.json",
                "npm",
                "start"
              ],
              "command": "/local/containerpilot",
              "image": "ops-docker.blabla.com/configapi:latest",
              "port_map": [],
              "dns_search_domains": [
                "production.blbla.com"
              ],
              "network_mode": "containernet",
              "logging": [
                {
                  "type": "gelf",
                  "config": [
                    {
                      "gelf-address": "udp://logging.blabla.production.com:12203",
                      "labels": "owner"
                    }
                  ]
                }
              ],
              "labels": [
                {
                  "owner": "ops"
                }
              ]
            },
            "Env": {
              "NODE_ENV": "production"
            },
            "Services": [],
            "Vault": null,
            "Templates": [],
            "Constraints": [],
            "Resources": {
              "CPU": 500,
              "MemoryMB": 256,
              "DiskMB": 0,
              "IOPS": 0,
              "Networks": [
                {
                  "Device": "",
                  "CIDR": "",
                  "IP": "",
                  "MBits": 10,
                  "ReservedPorts": null,
                  "DynamicPorts": null
                }
              ]
            },
            "DispatchPayload": null,
            "Meta": {
              "uuid": "7a8e5f3e-a5fb-496d-af7b-e8355a29edab"
            },
            "KillTimeout": 15000000000,
            "LogConfig": {
              "MaxFiles": 5,
              "MaxFileSizeMB": 10
            },
            "Artifacts": [],
            "Leader": false
          }
        ],
        "EphemeralDisk": {
          "Sticky": false,
          "SizeMB": 300,
          "Migrate": false
        },
        "Meta": null
      }
    ],
    "Update": {
      "Stagger": 30000000000,
      "MaxParallel": 0
    },
    "Periodic": null,
    "ParameterizedJob": null,
    "Payload": null,
    "Meta": null,
    "VaultToken": "",
    "Status": "running",
    "StatusDescription": "",
    "CreateIndex": 238960,
    "ModifyIndex": 570008,
    "JobModifyIndex": 570008
  }
}
@rickardrosen I slightly changed the job to make it run and couldn't reproduce.
This is the job I used, and when I do a plan or run I get an in-place update.
{
  "job": {
    "Region": "global",
    "ID": "api:prod",
    "ParentID": "",
    "Name": "api",
    "Type": "service",
    "Priority": 20,
    "AllAtOnce": false,
    "Datacenters": [
      "dc1"
    ],
    "TaskGroups": [
      {
        "Name": "group-1",
        "Count": 2,
        "Constraints": [],
        "RestartPolicy": {
          "Attempts": 10,
          "Interval": 300000000000,
          "Delay": 25000000000,
          "Mode": "delay"
        },
        "Tasks": [
          {
            "Name": "api",
            "Driver": "docker",
            "User": "",
            "Config": {
              "dns_servers": [
                "10.70.17.250",
                "10.70.17.251"
              ],
              "args": [
                "1000"
              ],
              "command": "sleep",
              "image": "redis:latest",
              "port_map": [],
              "dns_search_domains": [
                "production.blbla.com"
              ],
              "network_mode": "host",
              "logging": [],
              "labels": [
                {
                  "owner": "ops"
                }
              ]
            },
            "Env": {
              "NODE_ENV": "production"
            },
            "Services": [],
            "Vault": null,
            "Templates": [],
            "Constraints": [],
            "Resources": {
              "CPU": 500,
              "MemoryMB": 256,
              "DiskMB": 0,
              "IOPS": 0,
              "Networks": [
                {
                  "Device": "",
                  "CIDR": "",
                  "IP": "",
                  "MBits": 10,
                  "ReservedPorts": null,
                  "DynamicPorts": null
                }
              ]
            },
            "DispatchPayload": null,
            "Meta": {
              "uuid": "7a8e5f3e-a5fb-496d-af7b-e8355a29edab"
            },
            "KillTimeout": 15000000000,
            "LogConfig": {
              "MaxFiles": 5,
              "MaxFileSizeMB": 10
            },
            "Artifacts": [],
            "Leader": false
          }
        ],
        "EphemeralDisk": {
          "Sticky": false,
          "SizeMB": 300,
          "Migrate": false
        },
        "Meta": null
      }
    ],
    "Update": {
      "Stagger": 30000000000,
      "MaxParallel": 0
    },
    "Periodic": null,
    "ParameterizedJob": null,
    "Payload": null,
    "Meta": null,
    "VaultToken": "",
    "Status": "running",
    "StatusDescription": "",
    "CreateIndex": 238960,
    "ModifyIndex": 570008,
    "JobModifyIndex": 570008
  }
}
Plan:
{
  "Annotations": {
    "DesiredTGUpdates": {
      "group-1": {
        "Ignore": 0,
        "Place": 0,
        "Migrate": 0,
        "Stop": 0,
        "InPlaceUpdate": 2,
        "DestructiveUpdate": 0
      }
    }
  },
  "FailedTGAllocs": null,
  "JobModifyIndex": 18,
  "CreatedEvals": null,
  "Diff": {
    "Fields": null,
    "ID": "api:prod",
    "Objects": null,
    "TaskGroups": [
      {
        "Fields": null,
        "Name": "group-1",
        "Objects": null,
        "Tasks": [
          {
            "Annotations": [
              "forces in-place update"
            ],
            "Fields": null,
            "Name": "api",
            "Objects": null,
            "Type": "Edited"
          }
        ],
        "Type": "Edited",
        "Updates": {
          "in-place update": 2
        }
      }
    ],
    "Type": "Edited"
  },
  "NextPeriodicLaunch": "0001-01-01T00:00:00Z",
  "Index": 18
}
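For reference, a plan like the one above can be requested directly through the HTTP API. A rough sketch, where the agent address and file name are assumptions and job.json is the wrapped payload shown earlier (optionally with a top-level "Diff": true so the diff is included in the response):

curl -s -X PUT -d @job.json http://127.0.0.1:4646/v1/job/api:prod/plan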
The things I changed were the region, the datacenters, the job-level constraint (removed), and the Docker config (image, command, args, network_mode, and the logging block).
I don't think any of those would cause the destructive update, but why don't you give it a try? Could you modify your job and see which of those changes causes it to become destructive?
I have tried changing around a bit of everything, but the update is always destructive.
I can see in your diff that you get the annotation "forces in-place update". Mine is null, but it still results in a destructive update.
Shouldn't the diff tell me what's causing the teardown?
@rickardrosen Yeah, I am not sure how that is happening. So if you run the job, is it destructive or in-place? Do allocations exist? Are they running or terminal?
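One way to check that, sketched with the job ID from the examples above and an assumed local agent address:

nomad status api:prod
curl -s http://127.0.0.1:4646/v1/job/api:prod/allocations

The first shows the job's allocations and their status; the second returns them as JSON, including each allocation's ClientStatus (running, complete, failed, and so on).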
Can I use run on a JSON job somehow? Or is there an easy way of converting it to HCL? I'd like to see whether this is an issue with my job and the HTTP API...
Jobs are autogenerated (for consistency and to avoid accidental destructive changes, amongst other things :) ), which is why they are JSON.
Allocations exist and are running.
So if I'm doing this:
This can't be the intended outcome?
@rickardrosen The run CLI command is more or less just parsing HCL -> JSON and using the same endpoint, so there is no magic there. Sounds like you have a good setup 👍
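To illustrate, a minimal sketch of hitting that endpoint directly (agent address and file name are assumptions):

curl -s -X PUT -d @job.json http://127.0.0.1:4646/v1/jobs

Going the other way, nomad run -output some-job.nomad (assuming the -output flag is available in this version) prints the JSON the CLI would submit for an HCL job, which can be handy for comparing against an autogenerated JSON job.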
No, it is not the intended outcome! Is it possible for you to run the job against a cluster in nomad agent -dev mode and try the plan? I wonder if it is something from upgrading a cluster.
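A rough sketch of that test, with the agent address and file name as assumptions:

nomad agent -dev &
sleep 5
curl -s -X PUT -d @job.json http://127.0.0.1:4646/v1/jobs
curl -s -X PUT -d @job.json http://127.0.0.1:4646/v1/job/api:prod/plan

That is, register the job once on a throwaway dev agent, then plan the unchanged payload and check whether the response still reports a destructive update.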
Looks like when I stopped the job and re-scheduled it, updates are no longer always destructive, at least for the job I've been playing with. No changes to the job, but the plan result differs. Really weird.
Could be a Nomad update issue, but I need to test some more to see if I can find repro steps.
@rickardrosen Okay, let's reopen this issue when there are repro steps!
I think I just ran into this issue with 0.5.6: any job update that changes meta is always destructive for all allocs, even though nothing else is changed.
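A hedged sketch of that repro (the agent address, file names, jq usage, and the meta key are assumptions for illustration): fetch the running job, change only its Meta, and plan it again.

curl -s http://127.0.0.1:4646/v1/job/api:prod > current.json
jq '{job: (. * {Meta: {deploy: "v2"}})}' current.json > updated.json
curl -s -X PUT -d @updated.json http://127.0.0.1:4646/v1/job/api:prod/plan

If the plan annotations come back with DestructiveUpdate > 0 even though only Meta changed, that matches the behaviour described above.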
Experienced this as well on 0.8.4; caused by this issue
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Nomad version
0.5.5
When I'm posting to /job, it always evaluates to a destructive update.
Even if I POST a job, GET it back through the HTTP API, and POST the same unmodified job again, it still ends up being destructive.
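A rough sketch of that round trip (agent address and file names are assumptions; jq is only used to re-wrap the payload):

curl -s -X PUT -d @job.json http://127.0.0.1:4646/v1/jobs
curl -s http://127.0.0.1:4646/v1/job/api:prod | jq '{job: .}' > resubmit.json
curl -s -X PUT -d @resubmit.json http://127.0.0.1:4646/v1/job/api:prod/plan

The final plan comes back destructive even though nothing was modified between the GET and the POST, which is the behaviour described here.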
Below is an example of a diff from such an operation.
What would be the reason for this action to be destructive?