lablup / backend.ai

Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, TPU, IPU and other NPUs.
https://www.backend.ai
GNU Lesser General Public License v3.0

Extend accelerator plugin architecture to allow container user to join extra Linux groups #2850

Closed (kyujin-cho closed this issue 3 weeks ago)

kyujin-cho commented 1 month ago

Main idea

There are cases where an accelerator driver requires the Linux user to be a member of specific groups (e.g. ROCm). To support such cases, we could extend the current accelerator plugin architecture so that each plugin can expose an extra set of GIDs that the container user (work user) will be added to.
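As a rough illustration of the idea, the sketch below shows one way a plugin could expose extra GIDs and how they might be merged across plugins. The class names, the `get_extra_gids()` method, and the GID values are all hypothetical and are not taken from Backend.AI's actual plugin interface.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class AcceleratorPlugin(ABC):
    """Hypothetical plugin base class; NOT Backend.AI's real interface."""

    @abstractmethod
    def get_extra_gids(self) -> List[int]:
        """Return the GIDs the container user must additionally join."""


class CpuOnlyPlugin(AcceleratorPlugin):
    """A plugin whose driver needs no extra group membership."""

    def get_extra_gids(self) -> List[int]:
        return []


class GroupRestrictedPlugin(AcceleratorPlugin):
    """A plugin whose device nodes are only accessible to certain groups."""

    def __init__(self, gids: Iterable[int]) -> None:
        self._gids = list(gids)

    def get_extra_gids(self) -> List[int]:
        return list(self._gids)


def collect_extra_gids(plugins: Iterable[AcceleratorPlugin]) -> List[int]:
    """Merge the GIDs of all plugins, dropping duplicates and keeping order."""
    seen = set()
    merged: List[int] = []
    for plugin in plugins:
        for gid in plugin.get_extra_gids():
            if gid < 0:
                raise ValueError(f"invalid GID: {gid}")
            if gid not in seen:
                seen.add(gid)
                merged.append(gid)
    return merged


# Placeholder GIDs; real values depend on the host's group database.
plugins = [
    CpuOnlyPlugin(),
    GroupRestrictedPlugin([1001, 1002]),
    GroupRestrictedPlugin([1002, 1003]),
]
extra_gids = collect_extra_gids(plugins)
```

Collecting the GIDs in one place would let the agent pass a single deduplicated list into the container regardless of how many plugins are active.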

Alternative ideas

No response

Anything else?

This issue can serve as a keystone for #2592, since implementing this feature will provide a way to make the container user join extra groups other than the default GID and 44 (shadow).

achimnol commented 1 month ago

We could add a new environment variable containing the list of additional GIDs, which entrypoint.sh would pass to su-exec in the same way as LOCAL_USER_ID and LOCAL_GROUP_ID.
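A minimal sketch of how such a variable might be encoded on the agent side and decoded again is shown below. The variable name `LOCAL_EXTRA_GIDS` and the comma-separated format are assumptions made for illustration; the comment above does not specify either, and how entrypoint.sh would hand the values to su-exec is left open here.

```python
from typing import Iterable, List, Mapping

# Hypothetical variable name; the issue does not define one.
EXTRA_GIDS_ENV = "LOCAL_EXTRA_GIDS"


def encode_extra_gids(gids: Iterable[int]) -> str:
    """Serialize GIDs as a comma-separated string (assumed format)."""
    return ",".join(str(gid) for gid in gids)


def decode_extra_gids(environ: Mapping[str, str]) -> List[int]:
    """Parse the variable back into a list of unique, non-negative GIDs."""
    raw = environ.get(EXTRA_GIDS_ENV, "")
    gids: List[int] = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty entries such as trailing commas
        gid = int(token)  # raises ValueError on non-numeric input
        if gid < 0:
            raise ValueError(f"invalid GID: {gid}")
        if gid not in gids:
            gids.append(gid)
    return gids


# Environment as the agent might prepare it for the container.
container_env = {
    "LOCAL_USER_ID": "1000",
    "LOCAL_GROUP_ID": "1000",
    EXTRA_GIDS_ENV: encode_extra_gids([1001, 1002]),
}
parsed = decode_extra_gids(container_env)
```

Treating a missing or empty variable as "no extra groups" would keep existing images working unchanged when no plugin requests additional GIDs.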