Closed davidkopp closed 4 months ago
Although doing this can be done I think it is a patch which might not finally solve the issue.
The way to implement what you are shooting for is a proper healthcheck including a health-cmd
The implementation atm only supports interval
which will make the container being polled constantly and just create measurement noise.
At the time when I implemented the feature the docker engine did not support start-interval
. Now, with Docker Engine 25 it seems this is now possible to do! (Compare comment here: https://docs.docker.com/reference/dockerfile/#healthcheck on start interval)
You would set the start interval then on a very short period and the normal interval to an extremely large number to disable completely.
If that is what would solve your case I would prefer this implementation to setting the wait timeout just to a longer value.
What Docker Engine version do you have in your test setup for instance? I am currently unsure about which versions are rolled out to Linux package registries ...
Our Ubuntu 22.04 machine on the cluster has 26.1.3
That's good to know and using start-interval
would be definitely a preferable way of using the healthcheck during startup.
On my Windows machine Docker is running in version 26.1.1.
However, only the implementation of start-interval
would not solve the issue targeted by this pull request. Also the config wait_time_dependencies
has to be changed (or the implementation that uses it). With the current implementation, GMT waits at maximum the configured amount of time for a dependent service until it should be healthy, independently of the defined interval
(or start-interval
) in the service definition. For example, if wait_time_dependencies
is set to 20 seconds (currently the default) and interval
is set to 30 seconds, GMT will raise an exception that the service was not healthy after 20 seconds.
For a quick improvement I think it makes sense to increase the wait_time_dependencies
to e.g. 60 seconds. As another improvement it could be set to the value defined in interval
or start-interval
, if it is defined and higher than the configured value in wait_time_dependencies
.
For reference: I have introduced this config parameter wait_time_dependencies
in https://github.com/green-coding-solutions/green-metrics-tool/pull/568.
I do agree this needs a bigger PR to implement the functionality. What I see as a possible solution is that GMT either waits for wait_time_dependencies
for a fixed time or really uses the healthcheck mechanism with the given retries and health-cmd
.
I think what you are really asking for now is that we increase this limit on our configuration for the cluster, right? This does not need a PR as it is a configuration option anyway. Although I do see the case of making 60 the default.
So two things:
start-interval
so we can keep track of this functionality?Will merge this PR after
I have created the PR that implements the start-interval
parameter: https://github.com/green-coding-solutions/green-metrics-tool/pull/783
I'm not sure yet if it's necessary to change the value for wait_time_dependencies
on one of the machines in the cluster. If I would need it, it would be the machine "Fujitsu Esprimo P956 (Blue Angel compatible)".
I will merge this for now as the reasoning is very sound.
I will not increase it on th cluster for now. But if you need it let me know.
For anybody else reading this PR, please make a comment, so we know it is relevant for you
I'm currently adding the healthcheck feature to my microservices reference application. Compose file: https://github.com/t2-project/devops/blob/main/energy-tests/gmt/microservices-compose.yml
Unfortunately, the configured 20 second wait time for dependencies is not sufficient on my machine. The
inventory
service (Spring Boot) needs more than 20 seconds to become healthy.Of course on my local machine I can change the config on my own. However, that would not be possible using the measurement cluster. An workaround is to use a sleep in
setup-command
, how I did it already in the past without the healthcheck feature.I think it makes sense to increase the configured default wait time for dependencies, because 20 seconds seems to be too short for some Spring Boot applications in a microservices system to become healthy.