Closed: matbme closed this issue 3 days ago.
It seems `killJob` does indeed calculate the wrong duration. I think the right way is to use the time of the Completed condition minus the creation time.
/good-first-issue
@Monokaix: This request has been marked as suitable for new contributors.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed by commenting with the `/remove-good-first-issue` command.
Description
When restarting a `volcano-controllers-xxxx` pod, the Garbage Collector seems to check whether previously completed jobs are ready for cleanup and "kills" them again, which updates the `job.Status.RunningDuration` field even though the job had already completed.
I use the job duration to show users how long a task took once it completed, but this value changes every time the controller is restarted for whatever reason.
Controller logs
```
I0910 13:24:49.411220 1 job_controller.go:269] worker 2 start ......
I0910 13:24:49.411222 1 job_controller.go:269] worker 1 start ......
I0910 13:24:49.411236 1 job_controller.go:269] worker 0 start ......
I0910 13:24:49.411234 1 job_controller.go:320] Try to handle request
```
Steps to reproduce the issue
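1. Check the `Running Duration` in `kubectl describe vcjob ...` for a completed job.
2. Restart the `volcano-controllers-xxxx` pod.
3. Check the `Running Duration` in `kubectl describe vcjob ...` again: the value has changed even though the job was already completed.

Describe the results you received and expected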
If I understand it correctly, `Running Duration` should indicate how long a job took from start to completion, so it shouldn't change after the controller is restarted.
What version of Volcano are you using?
1.8.0
Any other relevant information
I tracked the issue down to this line in the Garbage Collector, which should probably use the job's end time instead of the current time.
However, I'm not entirely sure `killJob` should be called at all, given that the job has already been killed before.