mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0

One job gets too many instances. #817

Open alex-ren opened 7 years ago

alex-ren commented 7 years ago

We are using Chronos v3 on Mesos with 3 slaves. The job docker-gc-trigger is scheduled to run every 6 hours, but it has run tens of times in the last hour. Meanwhile, the job ssdb-backup-task is stuck in the queue with this log message:

Insufficient resources or constraints not met for task 'ct:1489722779470:0:ssdb-backup-task:', will append to queue. (Needed: [cpus: 1.0 mem: 512.0 disk: 256.0], Found: [cpus: 29.25 mem: 35481.0 disk: 54812.0])
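
As a quick sanity check on that message, here is a minimal Python sketch that parses the "Needed" and "Found" lists and reports any resource where the found amount is below the needed amount. The helper names (`parse_resources`, `shortfalls`) are hypothetical and not part of Chronos. The sketch also assumes the "Found" figures are what Chronos actually compared against, which the log line itself does not confirm (for example, it does not say whether they come from a single offer or several).

```python
import re

# The log line exactly as reported above.
LOG_LINE = (
    "Insufficient resources or constraints not met for task "
    "'ct:1489722779470:0:ssdb-backup-task:', will append to queue. "
    "(Needed: [cpus: 1.0 mem: 512.0 disk: 256.0], "
    "Found: [cpus: 29.25 mem: 35481.0 disk: 54812.0])"
)


def parse_resources(label, line):
    """Return e.g. {'cpus': 1.0, ...} for the bracketed list after `label:`."""
    match = re.search(rf"{label}: \[([^\]]*)\]", line)
    if match is None:
        raise ValueError(f"no '{label}' section in log line")
    pairs = re.findall(r"(\w+): ([0-9.]+)", match.group(1))
    return {name: float(value) for name, value in pairs}


def shortfalls(line):
    """Map each resource whose found amount is below the needed amount
    to the size of the gap. An empty dict means no resource is short."""
    needed = parse_resources("Needed", line)
    found = parse_resources("Found", line)
    return {
        name: amount - found.get(name, 0.0)
        for name, amount in needed.items()
        if found.get(name, 0.0) < amount
    }


print(shortfalls(LOG_LINE))
```

An empty result means that, going by the reported figures alone, no resource is short, so the "constraints not met" half of the message may be the more relevant part to look at.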

The Chronos log is attached: stdout (20).txt

We didn't see anything obviously suspicious in it. We are not sure whether restarting Chronos would solve the problem (we haven't tried that yet). Do you have any idea how Chronos can get into such a state? What other logs can we look into? We want to find the cause of the issue so that we can use Chronos with more confidence. Thanks a lot in advance.