lablup / backend.ai

Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, TPU, IPU and other NPUs.
https://www.backend.ai
GNU Lesser General Public License v3.0
501 stars 151 forks source link

Add explicit `ipc-base-path` config to the agent watcher #1640

Open achimnol opened 10 months ago

achimnol commented 10 months ago

Although we are planning to implement a separate watcher framework, until then, we need to live with the single-module agent watcher.

It currently uses a hard-coded ipc-base-path /tmp/backend.ai/ipc, which is the default for other service daemons like storage-proxy, manager, etc.

The problem is that, after #1246, there are cases that agent + agent watcher runs together with storage-proxy, and if storage-proxy's ipc-base-path option is left as the default, storage-proxy may fail to initialize the logger socket when the agent watcher starts earlier and creates the ipc directory as the root privilege.

We need to teach the agent watcher to use a separate ipc-base-path configuration. In the meanwhile, we could explicitly configure storage-proxy's ipc-base-path to other than the default to workaround the issue.

achimnol commented 10 months ago

This will be applied in #1560.