skypilot-org / skypilot

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[k8s] Running docker on SkyPilot k8s cluster #3062

Open Michaelvll opened 5 months ago

Michaelvll commented 5 months ago

Many services have been packaged as Docker images, and it is unclear how to do a normal docker run on a Kubernetes cluster the way you can on the other clouds:

run: |
  docker run -it \
    --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
    tabbyml/tabby \
    serve --model TabbyML/StarCoder-1B --device cuda

romilbhardwaj commented 3 months ago

For anyone thinking about this:

romilbhardwaj commented 1 month ago

Note - if the use case allows running privileged pods, you can run docker containers in a Kubernetes pod with:

# Example showing how to use docker-in-docker (dind) in a SkyPilot Kubernetes pod
#
# This example installs the docker runtime, but you can have it pre-installed in your image.
#
# Make sure your config allows for privileged containers. Add this to your ~/.sky/config.yaml:
#
# kubernetes:
#   pod_config:
#     spec:
#       containers:
#         - securityContext:
#             privileged: true
#
# NOTE - Here be dragons! Docker-in-docker in a Kubernetes cluster is generally
# not recommended. It is a security risk, since pods must run in privileged mode,
# and it can cause resource leaks. If you need to use a docker container in a
# Kubernetes cluster, specify it under resources.image_id in your task.yaml and
# run the image's ENTRYPOINT command in the `run` section of the SkyPilot YAML.
# Suggested reading - https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/

resources:
  cloud: kubernetes

setup: |
  # ==== Install Docker ====
  # You can also bake this in your base image.
  # Add Docker's official GPG key:
  sudo apt-get update
  # -y avoids an interactive prompt, which would stall a non-interactive setup
  sudo apt-get install -y ca-certificates curl
  sudo install -m 0755 -d /etc/apt/keyrings
  sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
  sudo chmod a+r /etc/apt/keyrings/docker.asc

  # Add the repository to Apt sources:
  echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  sudo apt-get update
  sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

  # Workaround for an issue with docker-ce 25 (https://github.com/docker/cli/issues/4807):
  # comment out the `ulimit -Hn` call in the init script so the service can start
  sudo sed -i 's/ulimit -Hn/# ulimit -Hn/g' /etc/init.d/docker

  # Start Docker
  sudo service docker start

run: |
  # Use docker run as usual
  sudo docker run --rm hello-world

Michaelvll commented 3 weeks ago

Is it possible to mount the host's docker socket instead of setting privileged: true? For example, might the following work?

spec:
  containers:
  - name: docker
    image: docker
    volumeMounts:
    - name: docker-socket
      mountPath: /var/run/docker.sock
  volumes:
  - name: docker-socket
    hostPath:
      path: /var/run/docker.sock
      type: Socket
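
One way this might be wired into SkyPilot is through the same kubernetes.pod_config override used for privileged mode earlier in this thread. The following is an untested sketch, not a confirmed recipe. It assumes the nodes actually run the Docker daemon (many clusters use containerd, in which case /var/run/docker.sock does not exist on the host). It also drops the name and image fields from the snippet above, on the assumption that the mount should apply to the SkyPilot container rather than to a separate one.

```yaml
# ~/.sky/config.yaml (untested sketch; assumes Docker is the node's container runtime)
kubernetes:
  pod_config:
    spec:
      containers:
        - volumeMounts:
            # Expose the host's Docker socket inside the SkyPilot pod.
            - name: docker-socket
              mountPath: /var/run/docker.sock
      volumes:
        - name: docker-socket
          hostPath:
            path: /var/run/docker.sock
            type: Socket
```

Containers started through a mounted socket run as siblings on the node rather than inside the pod, so Kubernetes would neither track them nor clean them up.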

chymian commented 1 week ago

@romilbhardwaj why is that not prominently mentioned in the docs on the first k8s/docker page? I just threw away hours of work because the docs are wrong!

For anyone thinking about this:

SkyPilot on K8s does not support Docker-in-Docker. I.e., the docker socket is not exposed to the container since this can leak containers on hosts. The current recommended way to run your docker image on Kubernetes is to use the image_id field in the YAML to use it as the runtime environment:
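
A minimal sketch of what such a task YAML might look like is shown below. The image name and the command are placeholders rather than values from this thread, and the sketch assumes the image meets whatever requirements SkyPilot places on custom images.

```yaml
# task.yaml (illustrative sketch; image name and command are placeholders)
resources:
  cloud: kubernetes
  # The "docker:" prefix tells SkyPilot to use a container image as the runtime environment.
  image_id: docker:<your-registry>/<your-image>:<tag>

run: |
  # This runs inside the container started from image_id, so no `docker run` is needed.
  # Invoke the command that the image's ENTRYPOINT would otherwise have run.
  <entrypoint-command> <args>
```

With this approach, the `serve ...` arguments from the original docker run example would go on the command line in the `run` section, and port or volume options would need to be expressed through SkyPilot's own fields instead of docker flags.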