mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0

Chronos REST API server doesn't respond #285

Open chengweiv5 opened 9 years ago

chengweiv5 commented 9 years ago

We have encountered this issue twice in about 3 months. We generally run about 25K jobs per day, and the API server sometimes stops responding. When this happens, curl -L <host>:<port>/ping hangs forever.
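A hang like this is easier to catch if the /ping probe has a timeout instead of blocking forever. The following is a minimal sketch, not part of Chronos: the function name `probe_ping`, the result strings, and the 5-second default are all illustrative assumptions, and the URL must be supplied by the caller.

```python
import socket
import sys
import urllib.error
import urllib.request


def probe_ping(url, timeout=5.0):
    """Probe a /ping URL and classify the result.

    Returns "ok", "http <code>", "hung" (no answer within `timeout`
    seconds), or "unreachable" (connection failed outright).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "ok" if resp.status == 200 else f"http {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return f"http {exc.code}"
    except urllib.error.URLError as exc:
        # Raised while connecting or sending; the cause is in exc.reason.
        if isinstance(exc.reason, socket.timeout):
            return "hung"
        return "unreachable"
    except socket.timeout:
        # The connection was accepted but no response arrived in time.
        return "hung"
    except OSError:
        # Any other socket-level failure, e.g. a connection reset.
        return "unreachable"


if __name__ == "__main__":
    # Usage: python probe_ping.py http://<host>:<port>/ping
    if len(sys.argv) != 2:
        sys.exit("usage: probe_ping.py <url>")
    print(probe_ping(sys.argv[1]))
```

Running it from cron or a monitoring agent and alerting on "hung" would record the time of the hang, which is the moment to capture stack traces.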

kolloch commented 9 years ago

Hi @chengweiv5, sorry for responding so late.

Did you ever capture a stack trace (thread dump) while this was happening? You can do that by sending kill -3 to the Chronos JVM process or by using the jstack command. It helps to capture several stack traces a few seconds apart so that we can compare what changed, or didn't change, in between.

Maybe all threads accepting requests are blocked.
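Capturing and comparing several dumps could be scripted roughly as below. This is only a sketch under stated assumptions: it assumes `jstack` is on the PATH and that the dumps use the usual HotSpot text layout (a quoted thread name, a `java.lang.Thread.State:` line, then `at ...` frames, with a blank line between threads). The helper names `capture_dumps`, `parse_dump`, and `unchanged_threads` are hypothetical, not part of any tool.

```python
import re
import subprocess
import sys
import time

HEADER = re.compile(r'^"(?P<name>[^"]+)"')
STATE = re.compile(r'^\s+java\.lang\.Thread\.State: (?P<state>\w+)')


def capture_dumps(pid, count=3, interval=5.0):
    """Run `jstack <pid>` `count` times, `interval` seconds apart."""
    dumps = []
    for i in range(count):
        result = subprocess.run(["jstack", str(pid)],
                                capture_output=True, text=True, check=True)
        dumps.append(result.stdout)
        if i < count - 1:
            time.sleep(interval)
    return dumps


def parse_dump(text):
    """Map thread name -> (state, tuple of stack frames) for one dump."""
    threads = {}
    name = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            name = header.group("name")
            threads[name] = [None, []]
            continue
        if name is None:
            continue
        if not line.strip():
            name = None  # a blank line ends the current thread's section
            continue
        state = STATE.match(line)
        if state:
            threads[name][0] = state.group("state")
        elif line.strip().startswith("at "):
            threads[name][1].append(line.strip())
    return {n: (s, tuple(f)) for n, (s, f) in threads.items()}


def unchanged_threads(dumps):
    """Names of threads whose state and stack are identical in every dump."""
    parsed = [parse_dump(d) for d in dumps]
    common = set(parsed[0]).intersection(*parsed[1:])
    return sorted(n for n in common
                  if all(p[n] == parsed[0][n] for p in parsed[1:]))


if __name__ == "__main__":
    # Usage: python compare_dumps.py <pid of the Chronos JVM>
    dumps = capture_dumps(int(sys.argv[1]))
    first = parse_dump(dumps[0])
    for name in unchanged_threads(dumps):
        print(name, first[name][0])
```

Idle pool threads will also show up as unchanged, so the interesting entries are request-handling threads that stay BLOCKED or WAITING in the same frames across every dump.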

chengweiv5 commented 9 years ago

Thanks for the tips, @kolloch. We'll do that if we're lucky enough to reproduce this issue.