lablup / backend.ai

Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, TPU, IPU and other NPUs.
https://www.backend.ai
GNU Lesser General Public License v3.0

Support committing and pushing a container image from a running session #563

Open achimnol opened 2 years ago

achimnol commented 2 years ago

Several customers have requested support for "freezing" their current compute sessions.

Historically, we have intentionally not added this feature because of the inherent volatility and security restrictions of Docker containers. The filesystem that users see in a session is a composition of multiple local and remote filesystems, such as the container image, vfolders, and scratch directories. For security, we do not expose the root user inside the container by default. As a result, there is usually no way to modify the filesystem provided by the image, because all system packages and directories are owned by root.

To allow installing additional packages into Python user-site paths (pip install --user) and to make them persist across different compute sessions, we have added the following features:

Nevertheless, many HPC/AI customers want to use containers like VMs, and it is not easy to bridge the conceptual gap between the volatile, hermetic nature of containers and the full ownership of volume data that VMs provide.

So, despite the additional caution required when committing a Backend.AI compute session, let's make it technically possible.
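As a rough illustration of the commit-and-push step, here is a minimal sketch written against the Docker SDK for Python (the `docker` package), assuming the caller already holds the session's container object and the SDK's image collection. `commit_and_push` is a hypothetical helper and the registry name is a placeholder; this is not Backend.AI's implementation. Note that Docker does not include data in mounted volumes when committing, so vfolder contents would not end up in the image.

```python
def commit_and_push(container, images, repository, tag, message=None):
    """Hypothetical helper: snapshot a container into an image, then push it.

    `container` is assumed to be a docker.models.containers.Container and
    `images` a docker.models.images.ImageCollection (e.g. client.images).
    """
    # Create a new image from the container's current filesystem state.
    # Docker pauses the container during the commit by default, and data in
    # mounted volumes (such as vfolders) is not included in the image.
    container.commit(repository=repository, tag=tag, message=message)

    # Push the tagged image; with stream=True and decode=True the SDK yields
    # progress events as dicts, and registry failures arrive as "error" events
    # instead of raised exceptions, so check for them explicitly.
    for event in images.push(repository, tag=tag, stream=True, decode=True):
        if "error" in event:
            raise RuntimeError(f"push failed: {event['error']}")

    return f"{repository}:{tag}"
```

With the real SDK, `container` would typically come from `docker.from_env().containers.get(<container id>)` and `images` from `client.images`. Registry credentials, image size limits, and filtering out scratch data are left out of this sketch.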

adrysn commented 2 years ago

Requirements for this feature: