Closed morganwalker closed 2 years ago
Hey @morganwalker, thanks for taking the time to submit a ticket.
I know it's been a long time, sorry for that, but if you could come up with a simplified, minimal version of the scenario which triggers this issue it would be very helpful.
Thanks.
@eikenb Thanks for checking in but we can go ahead and close this out.
@morganwalker That's great!
Mind if I ask what happened? Why is it no longer an issue?
Thanks.
We actually slimmed down our infrastructure stack quite a bit and no longer require Vault or envconsul.
@lopfe .. you :+1:'d this... are you still seeing it? Could you put together a simpler example to reproduce it?
I've looked over the code pretty closely and don't see a flaw in it. It blocks to wait on the child process exiting before moving on to start the new child process.
Could the processes in these cases have forked a child process of their own that doesn't exit until after the parent process? That could cause this issue: envconsul would be notified when the parent process exits and go on to start the new process, while the child's child (grandchild?) process would still be hanging around.
What is really needed is a minimal setup that reliably reproduces this.
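To make the grandchild hypothesis concrete, here is a minimal sketch of it, independent of envconsul itself. The `sh -c` command and the function name are hypothetical stand-ins for a managed process that forks; the sketch assumes a POSIX system with `sh` and `sleep` available. It only shows that waiting on the direct child can return while a forked grandchild is still running.

```python
import os
import subprocess


def grandchild_outlives_parent(linger_seconds=2):
    """Return (parent_exit_code, grandchild_still_alive) after waiting on the parent."""
    # Hypothetical stand-in for a managed command: the shell backgrounds
    # `sleep` (the grandchild), prints its PID, and exits immediately.
    parent = subprocess.Popen(
        ["sh", "-c", f"sleep {linger_seconds} & echo $!"],
        stdout=subprocess.PIPE,
        text=True,
    )
    grandchild_pid = int(parent.stdout.readline())
    parent.stdout.close()

    # This is all a supervisor observes: the direct child has exited.
    exit_code = parent.wait()

    try:
        os.kill(grandchild_pid, 0)  # signal 0 only checks that the PID exists
        alive = True
    except ProcessLookupError:
        alive = False
    return exit_code, alive
```

Calling `grandchild_outlives_parent()` should report a clean parent exit while the grandchild is still alive. If that grandchild were the one holding a listening socket, a replacement process started at this point could fail to bind.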
Encountered this issue today and found a minimal way to reproduce it. https://gist.github.com/nvllsvm/2c0e0561a3e472c9a53ba3bcd3be21eb
Thanks for repro @nvllsvm! Marked this to be looked into for the next release (which I'll be starting work on after I finish with the current consul-esm bugfix work).
I've taken a closer look at the repro example and it shows how a fork will trigger an early exit of the process. This behaves as it should and means that a fork/exec pattern can't trigger this issue.
This issue is around the idea that the managed process wasn't being stopped before it was re-started and I still don't really think that is the bug. IMO the original issue sounds like it might have been an issue with the OS not releasing the port in time.
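If the cause really were the OS holding on to the port after the old process exited (for example, connections lingering in TIME_WAIT), one common mitigation on the application side is to set `SO_REUSEADDR` before binding. The sketch below is an assumption about how a managed server could do this, not something the original report confirms; `listen_reusable` is a hypothetical helper name. It would not help if the old process were still alive and holding the listening socket.

```python
import socket


def listen_reusable(port, host="127.0.0.1"):
    """Open a TCP listener that may rebind a port with lingering old connections."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); allows binding while earlier connections
    # on the same address are still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```

A restarted process calling `listen_reusable(port)` with the same port number can then bind as soon as the previous listener has been closed.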
I'm going to close this as I still see no problems with the behavior (no repro) and code looks good.
Envconsul version
Configuration
vault.hcl
postgresql.hcl
Command
Debug output
debug logs from example in question
debug logs from other apps experiencing similar behavior
Expected behavior
The original child process should have been completely terminated before spawning a new child process.
Actual behavior
The original child process is still bound to addresses and denies new child process from starting.
Steps to reproduce
We're using envconsul to spawn any process that needs postgres credentials. Our base image is built off of alpine:3.7, which installs envconsul 0.7.3 and runs:
where our entrypoint performs:
Our postgres-exporter image then simply runs
CMD [ "/opt/start.sh" ]
and when the container starts we'll see:
We've played around with tweaking the vault configs exec splay, exec kill_signal, exec kill_timeout, wait mins and maxes, and -once, but so far whatever combination we've tried hasn't worked. What do we need to do in order to successfully kill the original child process so the successor can spawn?