I would love to see this and I will try to get around to submitting a PR for it if I have the time. In the meantime, I tag this on to the end of the job name as a workaround:
metadata:
  name: {{ template "fullname" . }}-{{ randAlphaNum 5 | lower }}
I'm currently using the workaround from @thomastaylor312 and it is very, very bad to do it this way. There are 2 issues; in particular, there is no maxRetry parameter here, and as a result I've just run into a performance issue with a storm of kill container / start container events.
@dzavalkinolx: We should probably bring this up in the kubernetes issues. That isn't so much a problem with Helm as it is with how Kubernetes Jobs work. Have you also tried setting restartPolicy: OnFailure so that it just restarts the container instead of creating a new pod?
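For reference, a minimal sketch of what that looks like in a Job spec (the name and image are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job            # placeholder name
spec:
  template:
    spec:
      # OnFailure restarts the failed container in the same pod instead of
      # the Job controller replacing the whole pod on every failure.
      restartPolicy: OnFailure
      containers:
        - name: worker                # placeholder
          image: example/worker:1.0   # placeholder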
This is a tricky one. Helm doesn't implicitly manage the lifecycles of things like this. In particular, it never deletes a user-created resource without receiving an explicit request from a user.
Now, we recently introduced an annotation called resource-policy that we could use for something like this. Currently, the only defined resource policy is keep, which is used to tell Tiller not to delete that resource on a normal deletion request. I suppose we could implement another policy with something like delete-on-completion that deleted a Pod or Job on completion.
Since Tiller does not actively monitor resources once they are deployed, I'm not sure this would be a terribly powerful annotation, but it could work on hooks because we do watch hooks for lifecycle events.
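For context, the existing keep policy is attached as an annotation on the resource itself, so a hypothetical delete-on-completion policy would presumably be declared the same way (only keep exists today; the resource below is just an example):
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config          # placeholder
  annotations:
    # keep is the only resource policy Helm currently defines
    "helm.sh/resource-policy": keep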
@dzavalkinolx Have you tried activeDeadlineSeconds?
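(For anyone unfamiliar, that's a field on the Job spec; a minimal sketch with placeholder names:)
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job             # placeholder
spec:
  # Kubernetes terminates the Job once it has been active longer than this
  activeDeadlineSeconds: 600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker                # placeholder
          image: example/worker:1.0   # placeholder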
@longseespace I tried that too, but it just kills the job after the amount of time and spins up a new one
I ran into this problem today too. I found out about Helm hooks recently and I thought they would be perfect for implementing pre-upgrade database migrations (rake db:migrate, for those of you familiar with Rails). But then I found that Helm does not clean up completed jobs, so this doesn't work in a CI environment.
I'm probably missing something, but what's the point of having a job executed in a pre-upgrade / post-upgrade hook if, once the job has been executed by the first helm upgrade, the next one will fail?
That's why we recommend appending a random string to a job name if you know for sure you are going to re-run a job again.
I see. I'm not sure I like the approach of leaving behind a new successful job for every deployment. Right now what I've done is add a new step to our deployment script that runs kubectl delete job db-migrate-job --ignore-not-found before our helm upgrade.
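In other words, something along these lines in the CI script (the job, release, and chart names are whatever your chart uses):
kubectl delete job db-migrate-job --ignore-not-found
helm upgrade my-release ./my-chart   # my-release and ./my-chart are placeholders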
It feels like helm should be managing the full lifecycle of its hooks, as they're documented as the approach to take when it comes to executing lifecycle events such as migrations. Someone without a strong understanding of kubernetes could end up with hundreds of useless jobs.
I almost feel like the ideal solution here would just be to have Tiller delete an old job if it hits a name conflict with a hook job it attempts to create. You could verify that the job was created by Helm by ensuring that the correct hook annotation is present. You could also put this behavior behind a command line flag, such as --overwrite-hooks, with a better error message for users:
Error: UPGRADE FAILED: jobs.batch "opining-quetzal-dns-upsert" already exists
Rerun your upgrade with --overwrite-hooks to automatically replace existing hooks
I am having the exact same challenges with a python / django app. Does anyone have a mechanism such that helm will abort an upgrade if the pre-upgrade job fails?
I may take this once I get some other work done for 2.5. Assigning it to myself for now; if someone else wants to take it before I work on it, let me know.
@thomastaylor312 Have you finished this feature yet? If not, I would like to take this work.
@DoctorZK Feel free to take it. Thank you for offering to do it!
What about a simple annotation, helm.sh/resource-policy: delete-job-after-run? When the job runs successfully, Helm deletes the job.
Good suggestion. I have thought of two approaches to solve this problem:
1. Add a simple annotation in the hook templates, which is the easiest to implement. However, it cannot solve this kind of problem: Helm fails during the pre-install/pre-upgrade process, but users try to install/upgrade the release with the same chart again, which will cause a resource object name conflict in Kubernetes. Therefore, with this approach we would need to add another annotation, such as helm.sh/resource-policy: delete-job-if-job-fails.
2. Add flags to the install/upgrade/rollback/delete commands (e.g., upgrade $release_name --include-hooks), which solves the name conflict problem; however, it will also remove some kinds of hooks that users do not intend to delete, such as ConfigMaps and Secrets that are designed to be reused by different versions of the same release.
I prefer the first one, which controls hooks at a finer granularity.
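To illustrate, the annotation-based proposal would put something like this in the hook template's metadata (neither policy exists in Helm yet; these are just the names proposed in this thread):
metadata:
  annotations:
    "helm.sh/hook": pre-upgrade
    # proposed policy, not implemented:
    "helm.sh/resource-policy": delete-job-after-run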
@DoctorZK As you suggest, I would like to have another annotation, helm.sh/resource-policy: delete-job-if-succeed. This is important when you're deleting a release and have a job for cleanup; right now it's not possible to delete that job without running another job.
Are you willing to work on this? If not, I can take care of it.
Thanks for your help. I have finished the coding process and it is now under test. I will submit the pull request as soon as possible.
For those who want a workaround until the final solution is implemented: instead of appending random characters to the Job object's name, the Job may delete itself from the API server as the last task before exiting. It is equally "elegant", but it doesn't leak Job objects. Sad facts:
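A rough sketch of that self-deleting approach, assuming the image ships kubectl, the pod's service account is allowed to delete Jobs, and run-migrations stands in for whatever the job actually does (all names here are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ template "fullname" . }}-migrate
  annotations:
    "helm.sh/hook": pre-upgrade
spec:
  template:
    spec:
      restartPolicy: Never
      # assumed: a service account bound to a Role that can delete Jobs
      serviceAccountName: hook-job-deleter
      containers:
        - name: migrate
          image: example/migrate:1.0   # placeholder; must contain kubectl
          command: ["/bin/sh", "-c"]
          args:
            - |
              # run-migrations is a placeholder for the real workload
              run-migrations &&
              kubectl delete job {{ template "fullname" . }}-migrate
Since deleting the Job also deletes its pods, the self-delete has to be the very last thing the container does.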
Can we get a quick run-down here on how this works while documentation is still forthcoming?
Specifically, I am interested in understanding how to add/run/monitor/update a job to an existing helm deployment. Thx!
For executing a job, you can define the job as pre-upgrade/post-upgrade hook and upgrade the release.
OK, I was having an issue with a post-upgrade job continually failing (my issue, obviously) and restarting despite restartPolicy: Never. I'm sorry if I'm not comprehending this thread, but am I to understand that that behavior is now "fixed" in the canary build? Or do I need to set some sort of configuration?
Here is my current configuration:
{{- if .Values.job.enabled -}}
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{.Release.Name}}"
  labels:
    heritage: {{.Release.Service | quote }}
    release: {{.Release.Name | quote }}
    chart: "{{.Chart.Name}}-{{.Chart.Version}}"
  annotations:
    "helm.sh/hook": post-install,post-upgrade
spec:
  template:
    metadata:
      name: "{{.Release.Name}}"
      labels:
        heritage: {{.Release.Service | quote }}
        release: {{.Release.Name | quote }}
        chart: "{{.Chart.Name}}-{{.Chart.Version}}"
    spec:
      restartPolicy: Never
      containers:
      - name: importer
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        command: ["{{.Values.job.binary}}","{{.Values.job.cmd}}"]
{{- end -}}
There is a new annotation for it. The hook docs have been updated on master
Documented here - https://github.com/kubernetes/helm/blob/master/docs/charts_hooks.md
Adding this annotation worked for me:
"helm.sh/hook-delete-policy": hook-succeeded
I downloaded Helm v2.6.2 and looked at the source, and "helm.sh/hook-delete-policy" doesn't seem to be in there. Do you know when this will make it into a release?
@k8vance88 This will be landing in 2.7. We will be releasing an RC of it soon once we have the k8s 1.8 support merged in
@thomastaylor312 Did this land in 2.7?
Yes, everything currently in master landed in 2.7.
Just to be clear... if I add "helm.sh/hook-delete-policy": hook-succeeded to my job, then every deployment should recreate and re-run that job? Because that's not what I'm seeing here.
@macropin Is your hook defined as post-install,post-upgrade (or pre, as your case may be)? If it only has post-install, it will only run the first time.
@thomastaylor312 It's defined as post-install,post-upgrade, so you're saying it should be working?
It will create a new object (generally a Pod or Job) each time you release. If you use the feature mentioned here, it will delete the job when it is done running. If for some reason a hook isn't deploying, it would be a separate issue.
The job has the following annotations:
annotations:
  "helm.sh/hook": post-install,post-upgrade
  "helm.sh/hook-weight": "5"
  "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
The job only runs once on the first install, and never again on subsequent upgrades. Running Helm v2.7.0. Should I create a separate issue for this?
@macropin Yes. Could you please create another issue with details about your cluster and, if possible, an example chart that duplicates the issue?
The job has the following annotations:
annotations: "helm.sh/hook": post-install,post-upgrade "helm.sh/hook-weight": "5" "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
The job only runs once on the first install, and never again on subsequent upgrades. Running Helm v2.7.0. Should I create a separate issue for this?
@macropin How did you solve it? I'm facing a similar issue: the job is never created on subsequent upgrades. I'm using the same annotations as yours.
helm version: v2.14.3
@sohel2020 and @macropin - Same boat, Helm v3. The job is never re-run on subsequent upgrades.
What is the solution for rerunning jobs on helm v3?
Same problem with Helm version: version.BuildInfo{Version:"v3.1.2", GitCommit:"d878d4d45863e42fd5cff6743294a11d28a9abce", GitTreeState:"clean", GoVersion:"go1.13.8"}
Jobs are neither deleted nor run on subsequent upgrade commands.
We regularly hit this one too in Helm 3.1.
In the future I believe we will also be able to rely on job TTL.
@paologallinaharbur Only partially: for example, if the job TTL is 5 minutes and the previous commit's job failed and is still around when you push a new commit, you will hit the same issue.
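For reference, job TTL here presumably refers to the ttlSecondsAfterFinished field on the Job spec (whether it is honored depends on your Kubernetes version); a minimal sketch with placeholder names:
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job             # placeholder
spec:
  # the Job and its pods are garbage-collected this long after the Job finishes
  ttlSecondsAfterFinished: 300
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker                # placeholder
          image: example/worker:1.0   # placeholder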
Two options have worked for me; the "helm.sh/hook-delete-policy": hook-succeeded,hook-failed policy is the better approach of the two, BUT it drops the logs of a failed job, which could be detrimental for troubleshooting in some cases.
The best option would be to have a hook delete policy that is applied before a hook is run, e.g. something like:
annotations:
  "helm.sh/hook-delete-policy": previous-hook-failed
What is the solution for rerunning jobs on helm v3?
@thecrazzymouse Were you able to find a solution?
I have a job bound to post-install,post-upgrade,post-rollback. When I run a chart upgrade I get:
Error: UPGRADE FAILED: jobs.batch "opining-quetzal-dns-upsert" already exists
and kubectl get jobs still returns the completed job. So, how are we supposed to use jobs as hooks if they are not deleted after finishing successfully? There is no way to upgrade a chart that has such a job.