mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0
4.39k stars, 529 forks

removeTasksForSlave: fix logic to identify tasks running on slave #824

Open meelapshah opened 7 years ago

meelapshah commented 7 years ago

I found this bug when I noticed the following:

  1. reboot a mesos slave with slave ID X
  2. the slave comes back up with slave ID Y
  3. chronos successfully runs jobs on the slave
  4. after --agent_reregister_timeout (default 10 mins) passes, mesos master sends SLAVE_LOST for slave ID X
  5. chronos starts launching jobs that are already running
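The sequence above suggests that when the delayed SLAVE_LOST for ID X arrives, tasks that are actually running on the re-registered slave (ID Y, same machine) get treated as lost. Below is a minimal sketch of that failure mode, assuming the faulty logic matches tasks by hostname rather than by slave ID. This is an assumption for illustration, not the actual Chronos code. `RunningTask`, `tasks_on_slave_by_host`, `tasks_on_slave_by_id` and `remove_tasks_for_slave` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningTask:
    # Hypothetical task record; field names are illustrative, not Chronos's API.
    task_id: str
    slave_id: str
    hostname: str


def tasks_on_slave_by_host(tasks, lost_slave_hostname):
    # Assumed buggy behavior: matches every task on the same host, including
    # tasks launched after the slave re-registered under a new ID.
    return [t for t in tasks if t.hostname == lost_slave_hostname]


def tasks_on_slave_by_id(tasks, lost_slave_id):
    # Sketch of the fix: only tasks launched under the lost slave ID are affected.
    return [t for t in tasks if t.slave_id == lost_slave_id]


def remove_tasks_for_slave(tasks, lost_slave_id):
    """Return (remaining, removed) after a SLAVE_LOST for lost_slave_id."""
    removed = tasks_on_slave_by_id(tasks, lost_slave_id)
    remaining = [t for t in tasks if t.slave_id != lost_slave_id]
    return remaining, removed


# Host "agent-1" rebooted: its slave ID changed from X to Y, and a job is running under Y.
tasks = [
    RunningTask("job-a", "Y", "agent-1"),
    RunningTask("job-b", "Z", "agent-2"),
]
remaining, removed = remove_tasks_for_slave(tasks, "X")
print(len(remaining), len(removed))  # 2 0
# Host-based matching would wrongly flag job-a as lost and cause a relaunch:
print([t.task_id for t in tasks_on_slave_by_host(tasks, "agent-1")])  # ['job-a']
```

Keying on the slave ID fits the symptom: a new slave ID is assigned on reboot, so the late SLAVE_LOST for X should not touch anything launched under Y.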