Open tomgco opened 9 years ago
Here is the start of the announcer:
Feb 26 16:41:31 core-01 systemd[1]: Starting paz-orchestrator announce...
Feb 26 17:05:36 core-01 sh[1195]: Waiting for 49153 HostIp:0.0.0.0/tcp...
Feb 26 17:05:36 core-01 sh[1195]: grep: HostIp:0.0.0.0: No such file or directory
So the following code is what produces the above error message:
until netstat -lnt | grep :$port >/dev/null; \
do sleep 1; \
done"
So all I can find is that we have a discrepancy when trying to sed
for the bound ports when using docker inspect.
How it should look:
Waiting for 49153/tcp...
What it is currently:
Waiting for 49153 HostIp:0.0.0.0/tcp...
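The grep error in the log is consistent with $port being expanded unquoted. The value contains a space, so the shell splits it into two words and grep treats the second one as a file name. Here is a minimal sketch reproducing the splitting; `count_args` is a hypothetical helper used only for illustration, not something from the unit file:

```sh
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# The malformed value seen in the "Waiting for ..." log line.
port='49153 HostIp:0.0.0.0'

count_args :$port     # unquoted: split into ":49153" and "HostIp:0.0.0.0" -> prints 2
count_args ":$port"   # quoted: stays a single argument -> prints 1
```

Quoting the expansion would not fix the bad value, but it would stop grep from being handed a stray file name.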
My guess is something funky is going on with $port / a race condition in the bootstrapping of the container. https://github.com/yldio/paz/blob/d53997470d5263a2334a16b3835380a8d849dd22/unitfiles/1/paz-orchestrator-announce.service#L19-L21
port=$(docker inspect -f '{{ index .NetworkSettings.Ports \"9000/tcp\"}}' paz-orchestrator \
| sed 's/.*Port://' \
| sed 's/].+*//'); \
Maybe @sublimino might be able to offer some input into robustifying the shell code to bail out if this condition occurs.
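One possible way to make this more robust, as a sketch only: pull out just the digits after `HostPort:` and refuse to continue unless the result is a plausible port number. The function names `extract_host_port` and `is_valid_port` are hypothetical, and the two sample strings assume that `docker inspect` can print the `HostIp` and `HostPort` keys in either order, which is what the log output above suggests but has not been confirmed here.

```sh
#!/bin/sh
# extract_host_port (hypothetical name): print only the digits that follow
# "HostPort:" in the given string; print nothing if there is no match.
extract_host_port() {
  printf '%s\n' "$1" | sed -n 's/.*HostPort:\([0-9][0-9]*\).*/\1/p'
}

# is_valid_port (hypothetical name): succeed only for an integer in 1..65535.
is_valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Sample inputs. The two key orders are an assumption about the inspect output.
a='[map[HostIp:0.0.0.0 HostPort:49153]]'
b='[map[HostPort:49153 HostIp:0.0.0.0]]'

for raw in "$a" "$b" '<no value>'; do
  port=$(extract_host_port "$raw")
  if is_valid_port "$port"; then
    echo "Waiting for $port/tcp..."
  else
    # In the unit file this is where the script could bail out instead.
    echo "could not determine port from: $raw" >&2
  fi
done
```

In the unit itself, the `raw` value would come from the existing `docker inspect` call, and the failure branch could exit non-zero so systemd sees the announce step fail rather than loop forever on a bad port.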
@tomgco thanks for all the details! i'll be taking a look at this on the weekend.
I've seen this before when units fail to start up correctly, probably, as you identify @tomgco, when things don't start in the right order. This is an issue in and of itself, but one I haven't any insight into atm.
In the meantime, if this bash code can be made more robust then perhaps we may not see it again.
Got any ideas, @sublimino?
I think adding dependencies to the units and leveraging systemd's native ordering is the first port of call, and I'll have a glance at the Bash scripts when I'm at a computer tomorrow.
:+1:
above @sublimino is referring to #30 btw
This cropped up again on paz-orchestrator-announce.service when provisioning a machine on DigitalOcean. Maybe something is missing from the unit file?
A fleetctl stop and start on paz-orchestrator-announce.service fixed this.
Yeah, I've seen this quite a bit this weekend :/
However this timed out and then:
However the service seemed to be up and running: