lablup / backend.ai

Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, TPU, IPU and other NPUs.
https://www.backend.ai
GNU Lesser General Public License v3.0
519 stars 153 forks source link

Improve log handling #3034

Open HyeockJinKim opened 1 week ago

HyeockJinKim commented 1 week ago

Main idea

References

Motivation

  1. When db or etcd is terminated, the manager continues to retry the connection. However, in this case, an exception message is displayed instead of a log.
  2. When the redis connection is terminated and the manager retries the connection, there is no log indicating when the connection is successfully restored. It is difficult to explicitly know the time of redis reconnection.

Tasks

  1. Ensure that an exception is logged when PostgreSQL or etcd connection retries fail.
  2. When a redis request is retried and successfully completes, log the successful retry event as an info log.
    • I think it would be better to run a ping request periodically to track the status of the redis server.

Expected Results

Alternative ideas

No response

Anything else?

No response