mesos / chronos

Fault-tolerant job scheduler for Mesos that handles dependencies and ISO8601-based schedules
http://mesos.github.io/chronos/
Apache License 2.0

Comparison method violates its general contract! #849

Open mcnicolas opened 7 years ago

mcnicolas commented 7 years ago

01:28:38.252 [qtp574746715-41] INFO mesosphere.chaos.http.ChaosRequestLog - 10.176.11.11 - - [20/Jul/2017:01:28:38 +0000] "POST //master.mesos/v1/scheduler/iso8601 HTTP/1.1" 204 0 "-" "Java/1.8.0_112" 7
01:28:38.340 [pool-4-thread-1] ERROR org.apache.mesos.chronos.scheduler.jobs.JobScheduler - Loading tasks or jobs failed. Exiting.
java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at java.util.TimSort.mergeLo(TimSort.java:777)
    at java.util.TimSort.mergeAt(TimSort.java:514)
    at java.util.TimSort.mergeCollapse(TimSort.java:441)
    at java.util.TimSort.sort(TimSort.java:245)
    at java.util.Arrays.sort(Arrays.java:1438)
    at scala.collection.SeqLike$class.sorted(SeqLike.scala:648)
    at scala.collection.AbstractSeq.sorted(Seq.scala:41)
    at scala.collection.SeqLike$class.sortWith(SeqLike.scala:601)
    at scala.collection.AbstractSeq.sortWith(Seq.scala:41)
    at org.apache.mesos.chronos.scheduler.jobs.JobScheduler.registerJobs(JobScheduler.scala:578)
    at org.apache.mesos.chronos.scheduler.jobs.JobScheduler.liftedTree1$1(JobScheduler.scala:543)
    at org.apache.mesos.chronos.scheduler.jobs.JobScheduler.mainLoop(JobScheduler.scala:540)
    at org.apache.mesos.chronos.scheduler.jobs.JobScheduler$$anon$1.run(JobScheduler.scala:516)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

arichnad commented 3 years ago

Is there an update on this bug? Thanks!

mskluev commented 2 years ago

For anyone else still running into this on Mesos: the root cause is a bug in how Chronos decides the order in which to run jobs. It compares schedule values using a.isBefore(b). However, if a == b, then both a.isBefore(b) and b.isBefore(a) are false, so Java's sort (TimSort) rejects the ordering with the IllegalArgumentException above and Chronos crashes.
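Below is a minimal Java sketch of the contract TimSort enforces. It rests on assumptions stated here and in the code. In Scala 2.x, `sortWith(lt)` turns the less-than test into a three-way comparison the way `Ordering.fromLessThan` does: -1 if `lt(x, y)`, 1 if `lt(y, x)`, otherwise 0. The sketch mirrors that behavior. The `Job` record and the `IS_BEFORE` test are hypothetical illustrations, not Chronos code. `IS_BEFORE` treats a job with no start time as never "before" anything. The `satisfiesContract` checker is also hypothetical: it tests every triple by brute force.

On these sample jobs, identical timestamps alone keep the contract intact. A value that "ties" with two jobs that are themselves ordered does break it. That breaks the rule that equal elements must compare the same way to everything else, which is the kind of inconsistency TimSort can detect and reject.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.BiPredicate;

public class ContractCheck {
    // Hypothetical job: an empty start means no concrete start time (e.g. R//PT1H).
    record Job(String name, Optional<Instant> start) {}

    // Mirrors Scala's Ordering.fromLessThan: -1, 1, or 0 when neither is "less".
    static <T> Comparator<T> fromLessThan(BiPredicate<T, T> lt) {
        return (x, y) -> lt.test(x, y) ? -1 : (lt.test(y, x) ? 1 : 0);
    }

    // Assumed isBefore-style test: a missing start is never before anything.
    static final BiPredicate<Job, Job> IS_BEFORE = (a, b) ->
            a.start().isPresent() && b.start().isPresent()
                    && a.start().get().isBefore(b.start().get());

    // One possible total order: missing starts sort last, ties broken by name.
    static final Comparator<Job> TOTAL = Comparator
            .comparing((Job j) -> j.start().orElse(null),
                    Comparator.nullsLast(Comparator.<Instant>naturalOrder()))
            .thenComparing(Job::name);

    // Brute-force check of the java.util.Comparator contract over all triples.
    static <T> boolean satisfiesContract(List<T> items, Comparator<T> cmp) {
        for (T x : items) {
            for (T y : items) {
                int xy = Integer.signum(cmp.compare(x, y));
                if (xy != -Integer.signum(cmp.compare(y, x))) return false; // antisymmetry
                for (T z : items) {
                    int yz = Integer.signum(cmp.compare(y, z));
                    int xz = Integer.signum(cmp.compare(x, z));
                    if (xy > 0 && yz > 0 && xz <= 0) return false; // transitivity
                    if (xy == 0 && xz != yz) return false; // "equal" must agree on every z
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Job a = new Job("a", Optional.of(Instant.parse("2017-07-20T01:00:00Z")));
        Job b = new Job("b", Optional.of(Instant.parse("2017-07-20T01:00:00Z"))); // same time as a
        Job c = new Job("c", Optional.of(Instant.parse("2017-07-20T02:00:00Z")));
        Job blank = new Job("blank", Optional.empty());

        Comparator<Job> lt = fromLessThan(IS_BEFORE);
        System.out.println("ties only:   " + satisfiesContract(List.of(a, b, c), lt));        // true
        System.out.println("with blank:  " + satisfiesContract(List.of(a, c, blank), lt));    // false
        System.out.println("total order: " + satisfiesContract(List.of(a, b, c, blank), TOTAL)); // true
    }
}
```

The `TOTAL` comparator is one way to avoid the problem, not necessarily how Chronos would fix it. Every pair, including jobs without a start time, always gets the same consistent answer.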

The workaround is to make sure every defined job has a unique schedule value. I also believe that multiple schedules with a blank start timestamp, such as R//PT1H, can trigger this bug.
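To apply this workaround, a hypothetical audit helper like the one below could scan job definitions before restarting Chronos. It takes a map of job name to schedule string. It reports any schedule string shared by more than one job. It also reports when more than one job uses a repeating-interval schedule with an empty start component (the `R//PT1H` pattern). This is a sketch with three assumptions. "Unique" means unique schedule strings. Schedules use the `R[n]/start/period` form. How you export the job list from Chronos is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ScheduleAudit {
    // Hypothetical helper: input maps job name -> ISO8601 schedule string.
    static List<String> audit(Map<String, String> schedulesByJob) {
        List<String> warnings = new ArrayList<>();
        Map<String, List<String>> jobsBySchedule = new TreeMap<>();
        List<String> blankStart = new ArrayList<>();

        // Iterate in job-name order so the output is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(schedulesByJob).entrySet()) {
            String job = e.getKey();
            String schedule = e.getValue();
            jobsBySchedule.computeIfAbsent(schedule, s -> new ArrayList<>()).add(job);

            // Assumed form R[n]/start/period; split keeps the empty middle field.
            String[] parts = schedule.split("/", -1);
            if (parts.length == 3 && parts[1].isEmpty()) {
                blankStart.add(job);
            }
        }

        for (Map.Entry<String, List<String>> e : jobsBySchedule.entrySet()) {
            if (e.getValue().size() > 1) {
                warnings.add("shared schedule " + e.getKey() + ": " + e.getValue());
            }
        }
        if (blankStart.size() > 1) {
            warnings.add("multiple blank starts: " + blankStart);
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> jobs = new HashMap<>();
        jobs.put("backup", "R/2017-07-20T01:00:00Z/PT24H");
        jobs.put("report", "R/2017-07-20T01:00:00Z/PT24H");
        jobs.put("cleanup", "R/2017-07-20T03:00:00Z/P1D");
        jobs.put("poller", "R//PT1H");
        jobs.put("heartbeat", "R//PT30M");
        // Prints:
        // shared schedule R/2017-07-20T01:00:00Z/PT24H: [backup, report]
        // multiple blank starts: [heartbeat, poller]
        audit(jobs).forEach(System.out::println);
    }
}
```

Giving each flagged job a distinct, explicit start time would clear both warnings.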