signalfx / maestro-ng

Orchestration of Docker-based, multi-host environments
https://signalfx.com
Apache License 2.0

Maestro reports containers failed to start, but the host shows them running #191

Closed: justanr closed this issue 7 years ago

justanr commented 7 years ago

We're receiving an exception from StartTask saying that some containers failed to report a running status. However, when I check the Docker daemon on the host, the containers are running just fine.

I can reproduce this by sleeping for 20 seconds in the entrypoint script. The actual trigger, which is inconsistent at best, is compiling Django translations in the entrypoint (this isn't done at build time, for reasons).

When I run locally with the sleep, docker ps shows the container as up, and docker inspect --format='{{.State}}' <id> reports running as the state.
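For reference, Docker's own notion of "running" only reflects the state of the container's process, which is what docker ps and docker inspect report. The sketch below assumes the standard JSON array that docker inspect <id> prints when no --format is given, and it pulls State.Status and State.Running out of that output. The function name container_state is hypothetical. A result of ("running", True) says nothing about whether any Maestro lifecycle check has passed.

```python
import json
from typing import Tuple


def container_state(inspect_output: str) -> Tuple[str, bool]:
    """Return (State.Status, State.Running) from raw `docker inspect <id>` output.

    Hypothetical helper: assumes the default JSON output, which is an array
    with one object per inspected container.
    """
    info = json.loads(inspect_output)[0]  # first (and here only) inspected object
    state = info["State"]
    return state["Status"], state["Running"]
```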

The error bubbles up from maestro.play.tasks.StartTask, and it looks like the internal _wait_for_status helper is returning False. However, I'm not sure where the actual error originates (docker-py, the container wrapper, etc.).
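Below is a minimal sketch of what a status-wait helper of this kind generally does. It assumes the helper polls a predicate until a deadline. It is not Maestro's actual _wait_for_status, and wait_for_status, timeout, and interval are hypothetical names. The behavior that matters is that it returns False whenever the predicate fails to become true in time, even if the container itself is running.

```python
import time
from typing import Callable


def wait_for_status(check: Callable[[], bool], timeout: float, interval: float = 0.5) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have elapsed.

    Hypothetical sketch, not Maestro's implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Deadline passed without the check ever succeeding.
            return False
        # Sleep until the next poll, but never past the deadline.
        time.sleep(min(interval, remaining))
```

Suppose the predicate combines "container is running" with a lifecycle check that depends on slow entrypoint work. In that case the wait can return False while docker ps still shows the container as up.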

mpetazzoni commented 7 years ago

Yep, this can happen depending on your situation. Can you give me more details about your process and your configuration? In particular, do you have any lifecycle checks for the running state configured in Maestro?

justanr commented 7 years ago

Well, now I feel like a putz. There's a lifecycle hook hidden in the root template we're using that hits an HTTP endpoint, and that endpoint sometimes doesn't come up within the allotted time.
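The following sketch shows why a slow endpoint produces exactly this failure. It is an HTTP readiness check with a maximum wait, written with only the Python standard library. The names http_check, max_wait, and interval are hypothetical. They are not Maestro's implementation or its configuration keys. If the endpoint has not answered with a 2xx status by the time max_wait elapses, the check fails, even though Docker reports the container as running.

```python
import time
import urllib.request


def http_check(url: str, max_wait: float, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 2xx, giving up after `max_wait` seconds.

    Hypothetical sketch of an HTTP lifecycle check; names and defaults are assumptions.
    """
    deadline = time.monotonic() + max_wait
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError, HTTPError (non-2xx), refused connections and timeouts
            # are all OSError subclasses: treat them as "not ready yet".
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

In this sketch, an endpoint that is only occasionally slow would need a larger max_wait. Check Maestro's documentation for whether and how it exposes an equivalent setting.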

mpetazzoni commented 7 years ago

No worries. Glad you found where your problem came from!