spotify / styx

"The path to execution", Styx is a service that schedules batch data processing jobs in Docker containers on Kubernetes.
Apache License 2.0

Need more reliable halt #458

Open ulzha opened 6 years ago

ulzha commented 6 years ago

The Styx CLI halt command goes through the common QueuedStateManager, and slow processing of that queue is one of the more common symptoms of overload in Styx. As a result, when Styx is overloaded and the queued events count is elevated, styx halt doesn't appear to do anything, because its halt event waits in the same slow queue as everything else.
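To make the queueing effect concrete, here is a minimal, hypothetical model of a single-consumer FIFO event queue. It is not Styx code: the QueuedHaltDemo class, the Event record, and the "halt"/"other" event kinds are assumptions made for illustration. It only shows that a halt enqueued behind a backlog cannot take effect until every earlier event has been processed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical model of a single-consumer FIFO event queue, loosely mirroring
 * how a halt request waits behind other queued state events. Not Styx code.
 */
public class QueuedHaltDemo {

    /** A queued event: the workflow instance it targets and what it asks for. */
    public record Event(String instance, String kind) {}

    private final Deque<Event> queue = new ArrayDeque<>();

    /** Appends an event to the back of the queue, like any other state event. */
    public void enqueue(Event e) {
        queue.addLast(e);
    }

    /**
     * Drains the queue one event per step and returns how many steps passed
     * before the first halt for the given instance was processed, or -1 if none.
     */
    public int stepsUntilHaltProcessed(String instance) {
        int steps = 0;
        while (!queue.isEmpty()) {
            Event e = queue.pollFirst();
            steps++;
            if (e.kind().equals("halt") && e.instance().equals(instance)) {
                return steps;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        QueuedHaltDemo demo = new QueuedHaltDemo();
        // Simulate an overloaded scheduler: 10,000 unrelated events already queued.
        for (int i = 0; i < 10_000; i++) {
            demo.enqueue(new Event("other-" + i, "other"));
        }
        // The user's halt lands at the back of the queue.
        demo.enqueue(new Event("my-workflow", "halt"));
        System.out.println(demo.stepsUntilHaltProcessed("my-workflow"));
    }
}
```

Running main prints 10001: the halt is only handled after all 10,000 earlier events, which from the user's side looks like the command did nothing.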

Moreover, the CLI error is quite uninformative, for example: API error: 500 : "Request Request{method=POST, url=https://styx-scheduler.spotify.net/api/v0/halt, tag=Request{method=POST, url=https://styx-scheduler.spotify.net/api/v0/halt, tag=null}} failed"

ulzha commented 6 years ago

It might even be preferable to add a stronger variant, such as styx halt --kill, that lets users kill their Kubernetes container directly, instead of just issuing a halt event and waiting for it to be processed eventually.
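One way the proposed stronger variant could be structured is sketched below. Everything here is hypothetical: styx halt --kill is only a proposal, and the HaltWithKill class, the EventQueue and PodDeleter interfaces, the "styx" namespace, and the pod name are stand-ins, not Styx or Kubernetes APIs. A real implementation would back PodDeleter with a Kubernetes client (for instance, fabric8's client.pods().inNamespace(ns).withName(name).delete()). The sketch still records the halt through the normal queue, so the scheduler's state eventually reflects it, and with the kill flag it additionally deletes the pod immediately rather than waiting for the queued halt.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a "halt --kill" handler. None of these types are Styx
 * APIs; they stand in for the queued state manager and a Kubernetes client.
 */
public class HaltWithKill {

    /** Stand-in for the queued state manager: accepts halt events asynchronously. */
    public interface EventQueue {
        void enqueueHalt(String workflowInstance);
    }

    /** Stand-in for a Kubernetes client call that deletes a pod by name. */
    public interface PodDeleter {
        void deletePod(String namespace, String podName);
    }

    private final EventQueue queue;
    private final PodDeleter pods;
    private final String namespace;

    public HaltWithKill(EventQueue queue, PodDeleter pods, String namespace) {
        this.queue = queue;
        this.pods = pods;
        this.namespace = namespace;
    }

    /**
     * Always enqueues the halt so scheduler state is updated once the queue
     * catches up; when kill is true, also deletes the running pod right away.
     */
    public void halt(String workflowInstance, String podName, boolean kill) {
        queue.enqueueHalt(workflowInstance);
        if (kill && podName != null) {
            pods.deletePod(namespace, podName);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Fake implementations that just record what would have happened.
        HaltWithKill handler = new HaltWithKill(
                instance -> log.add("queued halt " + instance),
                (ns, pod) -> log.add("deleted pod " + ns + "/" + pod),
                "styx");
        handler.halt("my-workflow#2024-01-01", "styx-run-abc123", true);
        log.forEach(System.out::println);
    }
}
```

With kill set to false the handler behaves like today's halt (only the queued event); with kill set to true the container stops without depending on how quickly the queue drains.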