hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Update job is always destructive #2464

Closed rickardrosen closed 7 years ago

rickardrosen commented 7 years ago

Nomad version

0.5.5

When I POST to /job, the evaluation always comes back as a destructive update.

Even if I register a job, GET it back through the HTTP API, and POST the same unmodified job again, it still ends up being destructive.
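
For reference, a minimal sketch of that round trip (assuming a local agent on the default address, the job ID api:prod used below, and the standard /v1/job HTTP API paths):

# Register (or update) the job; the payload wraps the job under a "Job" key.
curl -s -X PUT -d @job.json http://localhost:4646/v1/job/api:prod

# Fetch the job back exactly as the server stores it.
curl -s http://localhost:4646/v1/job/api:prod > fetched.json

# Re-submit the unmodified job as a plan with Diff enabled; DesiredTGUpdates
# in the response shows whether the scheduler counts it as InPlaceUpdate or
# DestructiveUpdate.
echo "{\"Diff\": true, \"Job\": $(cat fetched.json)}" \
  | curl -s -X PUT -d @- http://localhost:4646/v1/job/api:prod/plan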

Below is an example of a diff from such an operation.

What would be the reason for this update to be destructive?

{
  "Annotations": {
    "DesiredTGUpdates": {
      "group-1": {
        "Ignore": 0,
        "Place": 0,
        "Migrate": 0,
        "Stop": 0,
        "InPlaceUpdate": 0,
        "DestructiveUpdate": 2
      }
    }
  },
  "FailedTGAllocs": null,
  "JobModifyIndex": 570008,
  "CreatedEvals": null,
  "Diff": {
    "Fields": null,
    "ID": "api:prod",
    "Objects": [
      {
        "Fields": [
          {
            "Annotations": null,
            "Name": "Datacenters",
            "New": "dc-1",
            "Old": "dc-1",
            "Type": "None"
          }
        ],
        "Name": "Datacenters",
        "Objects": null,
        "Type": "None"
      }
    ],
    "TaskGroups": [
      {
        "Fields": null,
        "Name": "group-1",
        "Objects": null,
        "Tasks": [
          {
            "Annotations": null,
            "Fields": null,
            "Name": "api",
            "Objects": null,
            "Type": "None"
          }
        ],
        "Type": "None",
        "Updates": {
          "create/destroy update": 2
        }
      }
    ],
    "Type": "Edited"
  },
  "NextPeriodicLaunch": "0001-01-01T00:00:00Z",
  "Index": 570008
}
dadgar commented 7 years ago

@rickardrosen Can you post the job so we can play with it? Hard to tell from the plan diff.

rickardrosen commented 7 years ago

Sure. Here is an example of a job:

{
"job":{
  "Region": "nordic",
  "ID": "api:prod",
  "ParentID": "",
  "Name": "api",
  "Type": "service",
  "Priority": 20,
  "AllAtOnce": false,
  "Datacenters": [
    "dc-1"
  ],
  "Constraints": [
    {
      "LTarget": "${node.class}",
      "RTarget": "prod",
      "Operand": "="
    }
  ],
  "TaskGroups": [
    {
      "Name": "group-1",
      "Count": 2,
      "Constraints": [],
      "RestartPolicy": {
        "Attempts": 10,
        "Interval": 300000000000,
        "Delay": 25000000000,
        "Mode": "delay"
      },
      "Tasks": [
        {
          "Name": "api",
          "Driver": "docker",
          "User": "",
          "Config": {
            "dns_servers": [
              "10.70.17.250",
              "10.70.17.251"
            ],
            "args": [
              "-config",
              "file:///local/containerpilot.json",
              "npm",
              "start"
            ],
            "command": "/local/containerpilot",
            "image": "ops-docker.blabla.com/configapi:latest",
            "port_map": [],
            "dns_search_domains": [
              "production.blbla.com"
            ],
            "network_mode": "containernet",
            "logging": [
              {
                "type": "gelf",
                "config": [
                  {
                    "gelf-address": "udp://logging.blabla.production.com:12203",
                    "labels": "owner"
                  }
                ]
              }
            ],
            "labels": [
              {
                "owner": "ops"
              }
            ]
          },
          "Env": {
            "NODE_ENV": "production"
          },
          "Services": [],
          "Vault": null,
          "Templates": [],
          "Constraints": [],
          "Resources": {
            "CPU": 500,
            "MemoryMB": 256,
            "DiskMB": 0,
            "IOPS": 0,
            "Networks": [
              {
                "Device": "",
                "CIDR": "",
                "IP": "",
                "MBits": 10,
                "ReservedPorts": null,
                "DynamicPorts": null
              }
            ]
          },
          "DispatchPayload": null,
          "Meta": {
            "uuid": "7a8e5f3e-a5fb-496d-af7b-e8355a29edab"
          },
          "KillTimeout": 15000000000,
          "LogConfig": {
            "MaxFiles": 5,
            "MaxFileSizeMB": 10
          },
          "Artifacts": [],
          "Leader": false
        }
      ],
      "EphemeralDisk": {
        "Sticky": false,
        "SizeMB": 300,
        "Migrate": false
      },
      "Meta": null
    }
  ],
  "Update": {
    "Stagger": 30000000000,
    "MaxParallel": 0
  },
  "Periodic": null,
  "ParameterizedJob": null,
  "Payload": null,
  "Meta": null,
  "VaultToken": "",
  "Status": "running",
  "StatusDescription": "",
  "CreateIndex": 238960,
  "ModifyIndex": 570008,
  "JobModifyIndex": 570008
  }
}
dadgar commented 7 years ago

@rickardrosen I slightly changed the job so it would run in my environment and couldn't reproduce.

This is the job I used; when I do a plan or a run against it, I get an in-place update.

{
"job":{
 "Region": "global",
 "ID": "api:prod",
 "ParentID": "",
 "Name": "api",
 "Type": "service",
 "Priority": 20,
 "AllAtOnce": false,
 "Datacenters": [
   "dc1"
 ],
 "TaskGroups": [
   {
     "Name": "group-1",
     "Count": 2,
     "Constraints": [],
     "RestartPolicy": {
       "Attempts": 10,
       "Interval": 300000000000,
       "Delay": 25000000000,
       "Mode": "delay"
     },
     "Tasks": [
       {
         "Name": "api",
         "Driver": "docker",
         "User": "",
         "Config": {
           "dns_servers": [
             "10.70.17.250",
             "10.70.17.251"
           ],
           "args": [
             "1000"
           ],
           "command": "sleep",
           "image": "redis:latest",
           "port_map": [],
           "dns_search_domains": [
             "production.blbla.com"
           ],
           "network_mode": "host",
           "logging": [],
           "labels": [
             {
               "owner": "ops"
             }
           ]
         },
         "Env": {
           "NODE_ENV": "production"
         },
         "Services": [],
         "Vault": null,
         "Templates": [],
         "Constraints": [],
         "Resources": {
           "CPU": 500,
           "MemoryMB": 256,
           "DiskMB": 0,
           "IOPS": 0,
           "Networks": [
             {
               "Device": "",
               "CIDR": "",
               "IP": "",
               "MBits": 10,
               "ReservedPorts": null,
               "DynamicPorts": null
             }
           ]
         },
         "DispatchPayload": null,
         "Meta": {
           "uuid": "7a8e5f3e-a5fb-496d-af7b-e8355a29edab"
         },
         "KillTimeout": 15000000000,
         "LogConfig": {
           "MaxFiles": 5,
           "MaxFileSizeMB": 10
         },
         "Artifacts": [],
         "Leader": false
       }
     ],
     "EphemeralDisk": {
       "Sticky": false,
       "SizeMB": 300,
       "Migrate": false
     },
     "Meta": null
   }
 ],
 "Update": {
   "Stagger": 30000000000,
   "MaxParallel": 0
 },
 "Periodic": null,
 "ParameterizedJob": null,
 "Payload": null,
 "Meta": null,
 "VaultToken": "",
 "Status": "running",
 "StatusDescription": "",
 "CreateIndex": 238960,
 "ModifyIndex": 570008,
 "JobModifyIndex": 570008
 }
}

Plan:

{
  "Annotations": {
    "DesiredTGUpdates": {
      "group-1": {
        "Ignore": 0,
        "Place": 0,
        "Migrate": 0,
        "Stop": 0,
        "InPlaceUpdate": 2,
        "DestructiveUpdate": 0
      }
    }
  },
  "FailedTGAllocs": null,
  "JobModifyIndex": 18,
  "CreatedEvals": null,
  "Diff": {
    "Fields": null,
    "ID": "api:prod",
    "Objects": null,
    "TaskGroups": [
      {
        "Fields": null,
        "Name": "group-1",
        "Objects": null,
        "Tasks": [
          {
            "Annotations": [
              "forces in-place update"
            ],
            "Fields": null,
            "Name": "api",
            "Objects": null,
            "Type": "Edited"
          }
        ],
        "Type": "Edited",
        "Updates": {
          "in-place update": 2
        }
      }
    ],
    "Type": "Edited"
  },
  "NextPeriodicLaunch": "0001-01-01T00:00:00Z",
  "Index": 18
}

The things I changed were the region ("nordic" -> "global"), the datacenter ("dc-1" -> "dc1"), the node.class constraint (removed), the Docker image/command/args (swapped for redis running sleep), the network_mode ("containernet" -> "host"), and the gelf logging block (removed), purely so it would run locally.

I don't think any of those would cause the destructive update, but why don't you give it a try: modify your job along those lines and see which of them is causing it to become destructive.
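
One rough way to narrow this down (a sketch only, assuming jq is available and a local agent on the default address) is to compare the job as you submit it with the job the server hands back; any field that differs is a candidate for what keeps forcing the destructive update:

# Normalize both copies and diff them; the index/status bookkeeping fields
# will always differ and can be ignored, but any difference in the task
# config is worth investigating.
diff <(jq -S .job job.json) \
     <(curl -s http://localhost:4646/v1/job/api:prod | jq -S .)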

rickardrosen commented 7 years ago

I have tried changing around a bit of everything, but the update is still always destructive.

I can see in your diff that you get the annotation "forces in-place update". Mine are null, yet the result is still a destructive update.

Shouldn't the diff tell me what's causing the teardown?

dadgar commented 7 years ago

@rickardrosen Yeah, I am not sure how that is happening. So if you run the job, is it destructive or in-place? Do allocations exist? Are they running or terminal?
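
A quick sketch for checking that, assuming a local agent on the default address:

# List the job's allocations with their desired and client status; "running"
# vs. "complete"/"failed" answers whether they are running or terminal.
curl -s http://localhost:4646/v1/job/api:prod/allocations \
  | jq '.[] | {ID, DesiredStatus, ClientStatus}'

# Or via the CLI:
nomad status api:prod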

rickardrosen commented 7 years ago

Can I use run on a JSON job somehow? Or is there an easy way of converting it to HCL? I'd like to see whether this is an issue with my job and the HTTP API...

Our jobs are autogenerated (for consistency and to avoid accidental destructive changes, among other things :) ), which is why they are JSON.

Allocations exist and are running.

So if I register a job, GET it back unchanged through the HTTP API, and POST it again, every allocation gets torn down and replaced.

This can't be the intended outcome?

dadgar commented 7 years ago

@rickardrosen The run CLI command is more or less just parsing HCL -> JSON and using the same endpoint, so there is no magic there. Sounds like you have a good setup 👍

No, it is not the intended outcome! Is it possible for you to run the job against a nomad agent -dev cluster and try the plan there? I wonder if it is something left over from upgrading the cluster.
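
A sketch of that repro, assuming the JSON job from earlier is saved as job.json and the dev agent listens on the default address:

# Terminal 1: throwaway single-node server+client in dev mode.
nomad agent -dev

# Terminal 2: register the JSON job through the HTTP API (the same endpoint
# nomad run uses after parsing HCL), then plan the stored copy unchanged to
# see whether a fresh cluster also reports a destructive update.
# Note: the ${node.class} constraint may need removing for allocations to
# actually place on the dev agent; the plan is computed either way.
curl -s -X PUT -d @job.json http://localhost:4646/v1/job/api:prod
echo "{\"Diff\": true, \"Job\": $(curl -s http://localhost:4646/v1/job/api:prod)}" \
  | curl -s -X PUT -d @- http://localhost:4646/v1/job/api:prod/plan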

rickardrosen commented 7 years ago

Looks like after I stopped the job and re-scheduled it, updates are no longer always destructive, at least for the job I've been playing with. No changes to the job, but the plan result differs. Really weird.

Could be an issue from upgrading Nomad, but I need to test some more to see if I can find repro steps.

dadgar commented 7 years ago

@rickardrosen Okay, let's reopen this issue when there are repro steps!

Nomon commented 7 years ago

I think I just ran into this issue with 0.5.6: any job update that changes meta is always destructive for all allocs, even though nothing else is changed.

cbrisket commented 4 years ago

Experienced this as well on 0.8.4; caused by this issue

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.