hyperledger / besu

An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu
https://www.hyperledger.org/projects/besu
Apache License 2.0

Delay in /tmp/pid File Creation Causes unhealthy Status on Rocky Linux 8.6 #7876

Open sinbumu opened 2 weeks ago

sinbumu commented 2 weeks ago

I'm experiencing an issue with running Besu in a Docker/Podman container on Rocky Linux 8.6. The container stays in an unhealthy state for an extended period (approximately 9-10 minutes) before eventually switching to healthy. The delay appears to be related to the /tmp/pid file, which is not created immediately upon container startup, causing the health check to fail repeatedly.

Environment

Steps to Reproduce

  1. Launch a Rocky Linux 8.6 instance.
  2. Install Podman/Docker.
  3. Pull the latest Besu Docker image.
  4. Run the following command to start Besu:
    sudo podman run -d --name besu_node \
    -e BESU_LOGGING=TRACE \
    -p 8545:8545 -p 8546:8546 -p 30303:30303 \
    hyperledger/besu:latest \
    --rpc-http-enabled --rpc-http-host=0.0.0.0 --host-allowlist="*"
  5. Check the container health status (podman ps or docker ps) and observe that it remains in the unhealthy state.
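To put a number on the delay, the steps above can be followed by a small polling loop that records how long /tmp/pid takes to appear. This is only a sketch: `wait_for` and `pid_file_exists` are hypothetical helper names, the container name `besu_node` comes from the command above, and it assumes the image provides a `test` binary for `podman exec` to run.

```shell
#!/bin/sh
# Sketch only: wait_for and pid_file_exists are illustrative helpers,
# not part of Besu, Podman, or Docker.

# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 or TIMEOUT_SECONDS have elapsed.
# Prints the elapsed seconds on success; returns 1 on timeout.
wait_for() {
  wf_timeout=$1
  wf_interval=$2
  shift 2
  wf_start=$(date +%s)
  while :; do
    if "$@" >/dev/null 2>&1; then
      echo $(( $(date +%s) - wf_start ))
      return 0
    fi
    if [ $(( $(date +%s) - wf_start )) -ge "$wf_timeout" ]; then
      return 1
    fi
    sleep "$wf_interval"
  done
}

# pid_file_exists CONTAINER
# Succeeds once /tmp/pid exists inside the container.
# Assumption: the image ships a `test` binary that podman exec can run.
pid_file_exists() {
  sudo podman exec "$1" test -f /tmp/pid
}
```

After sourcing the script, `wait_for 900 5 pid_file_exists besu_node` polls every 5 seconds for up to 15 minutes and prints the number of seconds until the file shows up. Running the same measurement on another host OS would help show whether the delay is specific to Rocky Linux 8.6.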

Observed Behavior

The container stays in an unhealthy state for about 9-10 minutes, then transitions to healthy. During this time, the health check repeatedly fails with exitCode=1, which seems to be related to the /tmp/pid file not being available immediately.

Expected Behavior

The Besu container should create the /tmp/pid file promptly upon startup so that the health check can succeed, or it should provide an alternative health check that accurately reflects the container's readiness.

Logs

Here are relevant sections of the Docker logs showing repeated health check failures:

time="2024-11-12T06:58:15.854925216Z" level=debug msg="Health check for container done (exitCode=1)"
...
time="2024-11-12T06:58:20.905606739Z" level=debug msg="Health check for container done (exitCode=1)"

Additional Information

Questions

joshuafernandes commented 2 days ago

Hi @sinbumu, there are a few things at play in your setup, and anything we said about them would be speculation at best. Ten minutes is definitely long, though; we don't see this behaviour, so we can't reproduce it. That said, this is where we set it: https://github.com/hyperledger/besu/blob/main/docker/Dockerfile#L59. You could try switching to a curl readiness check, which should tell you whether it's the OS or something similar. cc: @siladu
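A curl-based readiness check along those lines might look like the sketch below, run from the host against the published RPC port. The `/readiness` path and its `minPeers` query parameter are assumptions based on my reading of the Besu documentation, so please verify them for your version; `readiness_url` and `besu_ready` are hypothetical helper names, and the defaults (localhost, port 8545) come from the `podman run` command in the report.

```shell
#!/bin/sh
# Sketch only: readiness_url and besu_ready are illustrative helpers.
# Assumption: Besu serves GET /readiness on the JSON-RPC HTTP port and
# accepts a minPeers query parameter; check the docs for your version.

# readiness_url [HOST] [PORT] [MIN_PEERS]
# Prints the readiness URL; defaults to localhost:8545 with minPeers=0
# so that a node with no peers yet is not reported as unready.
readiness_url() {
  printf 'http://%s:%s/readiness?minPeers=%s\n' \
    "${1:-localhost}" "${2:-8545}" "${3:-0}"
}

# besu_ready [HOST] [PORT] [MIN_PEERS]
# Exits 0 when the endpoint answers with a success status and non-zero
# on an HTTP error, a refused connection, or a 5-second timeout.
besu_ready() {
  curl --silent --fail --max-time 5 --output /dev/null \
    "$(readiness_url "$@")"
}
```

If `besu_ready` starts succeeding long before the container is reported healthy, the node itself is up and the delay lies in the /tmp/pid based check rather than in Besu's startup.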