mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0
4.39k stars 529 forks source link

Add max completion time to taskInfo #881

Closed edi33416 closed 6 years ago

edi33416 commented 6 years ago

Please see issue #880 for context

edi33416 commented 6 years ago

I'm getting the following build error, even though TaskInfo.Builder has the setMaxCompletionTime method.

[ERROR] /home/fawkes/chronos/src/main/scala/org/apache/mesos/chronos/scheduler/mesos/MesosTaskBuilder.scala:129: error: value setMaxCompletionTime is not a member of org.apache.mesos.Protos.TaskInfo.Builder
[ERROR]       taskInfo.setMaxCompletionTime(maxCompletionTime)
gkleiman commented 6 years ago

@edi33416 thanks! The patch looks good to me. Would you mind updating the "Writing and reading ScheduledBasedJob a job works" test in PersistenceStoreSpec.scala?

TaskInfo.max_commpletition_time is a recent field added on Mesos 1.6.0.

Chronos currently uses an old version (1.0.1) of libmesos through https://github.com/mesosphere/mesos-utils/: https://github.com/mesos/chronos/blob/master/pom.xml#L37

I'll try to release a new version of https://github.com/mesosphere/mesos-utils/ on Monday using libmesos 1.6.0 and let you know so that you can update and test the PR.

gkleiman commented 6 years ago

@edi33416 releasing mesosphere/mesos-utils was easier than I remembered, you should now be able to update the dependency and use the new field.

edi33416 commented 6 years ago

@gkleiman Thank you for updating mesos-utils.

Is this what you had in mind with

Would you mind updating the "Writing and reading ScheduledBasedJob a job works" test in PersistenceStoreSpec.scala?

gkleiman commented 6 years ago

@edi33416 Thanks for the update.

I meant actually updating the test to check that the new field is correctly persisted. This means that the test should use set the new field, store the job, and check that the job restored from the store has the right value.

edi33416 commented 6 years ago

@gkleiman Sorry for the delay. I updated the test

gkleiman commented 6 years ago

Thanks! LGTM

@edi33416 have you tested the changes? If so, how? Does the UI have to be updated too?

edi33416 commented 6 years ago

@gkleiman I did test the changes on my laptop, on a test Mesos Cluster. I submitted jobs with various maxCompletionTime values to the scheduler and verified that they were timed out and the job was marked as failed. To check that it was the correct time I did something very basic: each job would print something every second, so I just did a wc -l on stdout after it timed out and got killed by the executor.

This does not update the UI, but the UI is already lacking other supported fields, such as constraints. Because the UI is incomplete, there is the following issue: if you edit a job from the UI, that had fields set which aren't displayed in the UI, they will be reset to their default value. (Ex. if your job had constraints: [['key', LIKE, 'value']], after an UI edit, the constraints will be [[ ]])

gkleiman commented 6 years ago

Merged, thanks for your contribution!

edi33416 commented 6 years ago

@gkleiman Thank you. Will you cherry-pick this onto 2.5.x?

gkleiman commented 6 years ago

@edi33416 the patches don't apply cleanly on https://github.com/mesos/chronos/tree/stable.

I don't have time to work on it myself, but I would be happy to review a PR.

edi33416 commented 6 years ago

@gkleiman I'll cherry-pick them then and come back with a PR