Add ability to restart all running tasks/allocs of a job

hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.

https://www.nomadproject.io/

Add ability to restart all running tasks/allocs of a job #698

Closed supernomad closed 1 year ago

supernomad commented 8 years ago

So I would love the ability to restart tasks, at the very least restart an entire job, but preferably single allocations. This is very useful for when a particular allocation or job happens to get in a bad state.

I am thinking something like nomad restart <job> or nomad alloc-restart <alloc-id>.

One of my specific use cases: I have a cluster of rabbitmq nodes, and at some point one of the nodes gets partitioned from the rest of the cluster. I would like to restart that specific node (an allocation in nomad parlance), or be able to perform a rolling restart of the entire cluster (a job in nomad parlance).

Does this sound useful?

thatsk commented 3 years ago

Has this been added to the nomad UI, or is it still in progress?

stupidlamo commented 3 years ago

+1 to this feature. We really need to shut down hashi-ui and use only native nomad, but can't due to the unavailability of rolling restart.

kunalsingthakur commented 3 years ago

Yeah @tgross, there are situations where a container depends on consul key-value data: if we update a key-value in consul, the new values are populated in our container only after we restart the service. So we really think this needs to be added to the nomad UI, so that we can get rid of hashi-ui and don't need to maintain two UIs for nomad.

kunalsingthakur commented 3 years ago

Should we assume this is on the roadmap?

dg-eparizzi commented 3 years ago

+1 to this feature

josegonzalez commented 3 years ago

The way hashi-ui implements this is by injecting a label into the job, which messes with nomad job plan as the same job will result in a change as the local job won't have the injected label.

thatsk commented 3 years ago

Yes, they add a meta parameter, such as a date-time stamp.

sbrl commented 3 years ago

An easy CLI subcommand or HTTP API call for restarting a job would be very handy.

victusfate commented 2 years ago

I ended up getting what I wanted (a rolling restart of an existing application) using the following Python snippet and the Nomad HTTP API:

    import time
    import requests

    # NOMAD_URL and job_id are assumed to be defined by the caller.
    # Fetch the current job definition.
    get_job_url = f'{NOMAD_URL}/v1/job/{job_id}'
    get_job_response = requests.get(get_job_url)
    job = get_job_response.json()

    # Stamp a meta value so the re-submitted job differs from the running one.
    if job.get('Meta') is None:
        job['Meta'] = {}
    job['Meta']['Restart'] = str(time.time())
    payload = {'Job': job, 'PreserveCounts': True}

    # Now post it back.
    post_url = f'{NOMAD_URL}/v1/jobs'
    post_job_response = requests.post(post_url, json=payload)
    print('restart job response', post_job_response.json())

maxadamo commented 2 years ago

Unless I'm overlooking a possible drawback, the command suggested by @mxab looks good to me. You can use any variation of the command and add it to your shell aliases:

nomad job status <job-name> | awk '{if (/run(.*)running/) {system("nomad alloc restart " $1)}}'
nomad job status <job-name> | awk '/run(.*)running/{print $1}' | xargs -t -n 1 nomad alloc restart
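
For anyone who would rather script this than alias it, the same filter can be sketched in Python. This is only a rough equivalent of the one-liners above, not an official Nomad interface: the function names `running_alloc_ids` and `restart_running_allocs` and the `pause_seconds` parameter are made up for illustration, and it assumes, as the awk versions do, that the allocation ID is the first field on each matching line of `nomad job status` output.

```python
import re
import subprocess
import time

# Same pattern as the awk one-liners: "run" followed later on the line by "running".
RUNNING_ROW = re.compile(r"run(.*)running")


def running_alloc_ids(status_output: str) -> list[str]:
    """Return the first whitespace-separated field of every matching line."""
    ids = []
    for line in status_output.splitlines():
        fields = line.split()
        if fields and RUNNING_ROW.search(line):
            ids.append(fields[0])
    return ids


def restart_running_allocs(job_name: str, pause_seconds: float = 0.0) -> None:
    """Hypothetical helper: restart each matching allocation, one at a time."""
    status = subprocess.run(
        ["nomad", "job", "status", job_name],
        check=True, capture_output=True, text=True,
    ).stdout
    for alloc_id in running_alloc_ids(status):
        subprocess.run(["nomad", "alloc", "restart", alloc_id], check=True)
        # Optional pause between restarts so they do not all happen at once.
        time.sleep(pause_seconds)
```

Calling `restart_running_allocs("<job-name>", pause_seconds=30)` would then restart the matching allocations one after another. The pattern is as loose as the awk original, so it is worth checking what `running_alloc_ids` returns for your job before restarting anything.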

Laboltus commented 2 years ago

As I understand it, "nomad alloc restart" doesn't re-download artifacts and docker images? I need to restart a job so that it picks up the current docker image.

tgross commented 2 years ago

Doing some issue cleanup and realizing there's a whole lot of different feature requests being discussed in this issue over the years, many of which landed long ago. I'm going to re-title this issue to narrow the scope to the remaining request.

EugenKon commented 7 months ago

With the command above I cannot restart a task that has failed:

$ nomad job restart -task nginx-task portal
==> 2024-02-23T11:13:56-05:00: Restarting 1 allocation
    2024-02-23T11:13:56-05:00: Restarting task "nginx-task" in allocation "27caddf2" for group "services"
==> 2024-02-23T11:13:56-05:00: Job restart finished with errors

1 error occurred while restarting job:
* Error restarting allocation "27caddf2": Failed to restart task "nginx-task": Unexpected response code: 500 (Task not running)

$ nomad alloc restart 27caddf2
Failed to restart allocation:

Unexpected response code: 500 (restart of an alloc that should not run)

It is not clear how to restart a failed task.

jippi commented 7 months ago

Please open a new issue for that. This issue is many years old and closed :)