Closed shousper closed 5 years ago
Thanks for raising this issue, really appreciate the detail and recommendation!
Do you see it being more useful for your use case to have the grace period configurable in the service configuration, a command-line flag, or both?
No worries mate, thanks for building and maintaining such a useful tool for us!
I think both could be most helpful. We don't tend to use command line flags for edward
at all, everything is configured via an edward.json
for convenience and sharing. I'd say in priority order:
1) Service level override minimum would sort us out for a good while. 2) Top-level config override would be nice to have, but not absolutely required. 3) Command line flag for the top-level override might be handy, but also probably a nice to have.
Sounds good!
Let's focus on getting (1) done first. I can probably get some time to work on this over the next week, unless you or someone you know would be interested in putting together a PR.
That would be fantastic! We're all pretty flat out over here, but I'll ask around 👍
@shousper - while you are there, do you have a suggestion for the syntax to add to the config ?
Im thinking something simple like this on the service definition
{
"name": "my-awesome-service",
"commands": {
"launch": "my-awesome-service"
},
"terminate": {
"signal": "11", // UNIX signal ID. Default = syscall.SIGINT
"wait": "30s" // use time.Parse() syntax. Default = 5s
}
}
with the ability to add "terminate" to the group definition as well, and inherit that at each service level.
Thoughts ?
Another way to achieve the same thing - use the service::stop directive, and wrap it in a shell script that does what you need ?
Would the "signal" here be referring to the initial signal sent, or the one sent after the "wait" period?
Initial signal 👍
One snag with setting the initial signal sent is that the runner is signaled rather than the service itself.
Edward starts a separate runner process that maintains the service process as a child. If the runner receives an interrupt, it will attempt to stop the service as appropriate to the backend being used.
For the command-line backend, this is currently SIGINT. For the Docker backend, this is a call to the Docker API to stop the container.
The wait period is applicable to all backends, the signal itself is not, so they'd need to be separate in the config.
I'm working on a change that will add a "terminationTimeout" option (default 30s) to service config and set the default signal for stopping command-line services to SIGTERM. Will upload as a PR and link here.
I've uploaded a PR with a proposed implementation (see link above). I think the test might need some re-working, but it demonstrates the idea.
Hi @theothertomelliott, I just checked out your branch now for some testing and this looks great 👍 Very happy with the new defaults and configuration field 🎉 Stoked with how fast you did this! Thanks
My pleasure! Glad the changes fill the need.
I'm going to do a little cleanup on the test, but should be able to get that merged and drop out a minor release over the next few days.
Regarding:
If the runner receives an interrupt, it will attempt to stop the service as appropriate to the backend being used.
I'd say what you've done is sufficient. Once the "backends" feature reaches maturity and you find you need to move this timeout value to a command line backend specific property, I'd say that'd be fine too. Can always deprecate and move these things over time 😃.
As far as Docker, it has a Dockerfile command for the stop signal and you already support overriding config for this. So I'd say that one is already solid 👌
Deployed version 1.8.19 with the configurable timeout.
Could we make the interrupt grace period configurable per-service?
I found here its hard-coded to 5 seconds, but this isn't reasonable for services that have a few tasks to perform in order to shutdown gracefully.
I believe, for example, Docker's default is 10 seconds. This would be a much better default IMO, but ultimately I think a configurable grace period would interop better if users have services that might have variant shutdown periods.
As an aside, I'd also recommend either making the stop signal configurable (at the global level) or change it to SIGTERM instead of SIGINT, as this is also standard for Docker.
This 2 enhancements will help reduce the overhead to replicate production behaviours in, for example, AWS ECS or EKS/Kubernetes.
EDIT: Docker's default is 10 seconds, not 30. Updated. However ECS, EKS and Kubernetes default to 30.