hashgraph / hedera-block-node

New Block Node services
Apache License 2.0

refactor: Adjust the smoke tests so they are more robust in the GHA environment #324

Status: Open. mattp-swirldslabs opened this issue 2 days ago.

mattp-swirldslabs commented 2 days ago

As a block node developer, I want the smoke tests to be reliable and consistent in the CI environment, so that I know a smoke-test failure is directly related to the changes in my PR.

Request Details

The recommended approach is to refactor the smoke tests so that they are more robust in the GitHub Actions (GHA) environment. Specifically, we would like the liveness and health checks to run before the other tests, the server logs to be exported so they can be viewed in GitHub Actions, and the tests to stop relying on pattern-matching the server's log statements to determine when the server is running.
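One way to make the server log visible in the GitHub Actions output is to print it whenever the script exits, inside a collapsible log group. This is a minimal sketch, assuming the server's output is redirected to a file the script knows about; `SERVER_LOG`, its default `server.log`, and `dump_log` are hypothetical names, not taken from the repository. `::group::` and `::endgroup::` are standard GitHub Actions workflow commands.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: SERVER_LOG and dump_log are illustrative names, not the repo's.

# Assumed location of the redirected server output.
SERVER_LOG="${SERVER_LOG:-server.log}"

# dump_log <title> <file>: print a log file inside a collapsible
# GitHub Actions log group ("::group::" / "::endgroup::" workflow commands).
dump_log() {
  local title="$1" file="$2"
  echo "::group::${title} (${file})"
  if [[ -f "$file" ]]; then
    cat "$file"
  else
    echo "(log file not found)"
  fi
  echo "::endgroup::"
}

# Export the server log on every exit, whether the tests passed or failed.
trap 'dump_log "Server log" "$SERVER_LOG"' EXIT
```

Because the `EXIT` trap fires on every exit path, the log is printed even when a health check fails and the script exits early. Alternatively, the workflow could upload the log file as a build artifact with the `actions/upload-artifact` action.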

Technical Details

We discussed a few enhancements to the smoke tests to improve visibility and make their behavior in GitHub Actions deterministic:

  1. Remove the bash logic that pattern-matches on the server's log statements.
  2. Move the health and readiness checks so they run first, before the consumer/producer tests and other actions.
    • Also give these checks a retry loop (implemented as a reusable bash function), so that each check can keep trying for a reasonable amount of time (perhaps up to 5 minutes?) before giving up and declaring that the server failed to start.
  3. Export the server log statements so that server startup failures are easier to debug.
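The retry loop from item 2 could be a small reusable bash function along the following lines. This is a minimal sketch, not the project's implementation: `retry_until`, `check_endpoint`, and `wait_for_endpoint` are hypothetical names, the 300-second default reflects the suggested 5-minute limit, the 5-second polling interval is an arbitrary choice, and `curl` is assumed to be available on the runner.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names, timeout, and interval are illustrative assumptions.

# retry_until <timeout_secs> <interval_secs> <command...>
# Re-runs the command, sleeping between attempts, until it exits 0.
# Returns 0 on success, or 1 once the timeout has elapsed.
retry_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))  # SECONDS is a bash builtin counter
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# check_endpoint <url>: succeeds when the URL answers with a non-error
# HTTP status (curl --fail treats 400 and above as a failure).
check_endpoint() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# wait_for_endpoint <url> [timeout_secs]: poll an endpoint every 5 s,
# for up to 300 s (5 minutes) by default.
wait_for_endpoint() {
  local url="$1" timeout="${2:-300}"
  echo "Waiting up to ${timeout}s for ${url}..."
  if retry_until "$timeout" 5 check_endpoint "$url"; then
    echo "${url} is up."
  else
    echo "Error: ${url} did not respond successfully within ${timeout}s." >&2
    return 1
  fi
}
```

For example, the script could call `wait_for_endpoint "http://localhost:8080/healthz/livez" || exit 1` and then the same for `/healthz/readyz` before starting the consumer and producer. Port 8080 is an assumption here, based on the address the consumer dials in the trace below. Keeping the generic loop separate from the HTTP check lets the same function retry any other readiness condition.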

Additional Notes

mattp-swirldslabs commented 2 hours ago

Here's a trace from a failing GHA run:

Run ./smoke-test.sh
No log file provided, skipping startup pattern check.
Started consumer with PID 805, logging to consumer.log
Started producer with PID 806, logging to producer.log
get-block.sh executed successfully.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100     2  100     2    0     0    153      0 --:--:-- --:--:-- --:--:--   166
OK/healthz/livez endpoint is healthy.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100     2  100     2    0     0   3058      0 --:--:-- --:--:-- --:--:--  2000
OK/healthz/readyz endpoint is ready.
Shutting down background processes...
Producer process 806 terminated.
Error: Consumer process 805 has already terminated.
Consumer logs:
Param is: 1
Starting consumer...
Started consumer with PID: 809
Failed to dial target host "localhost:8080": write tcp [::1]:44526->[::1]:8080: write: broken pipe
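In this trace, the consumer and producer are started before the `/healthz/livez` and `/healthz/readyz` checks run, and the consumer's early exit is only reported during shutdown. One way to surface such a failure at the point it happens is to confirm each background process is still alive shortly after launching it, and print its log if not. This is a minimal sketch, assuming the script launches its clients as background processes; `start_bg`, `assert_alive`, and `BG_PID` are hypothetical names, not taken from `smoke-test.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: start_bg, assert_alive, and BG_PID are illustrative names.

# start_bg <log_file> <command...>: run a command in the background,
# sending stdout and stderr to the log file. Sets BG_PID to its process ID.
start_bg() {
  local log="$1"
  shift
  "$@" >"$log" 2>&1 &
  BG_PID=$!
}

# assert_alive <pid> <name> <log_file>: return 1 and print the process log
# to stderr if the process has already exited (kill -0 sends no signal,
# it only checks that the process exists).
assert_alive() {
  local pid="$1" name="$2" log="$3"
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "Error: ${name} process ${pid} exited early. ${name} logs:" >&2
    cat "$log" >&2
    return 1
  fi
}
```

Run after the readiness checks and a short pause following each client launch, this makes a consumer that fails to dial the server stop the test immediately, with its log attached, rather than being discovered at shutdown.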