skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

runpod docker credentials not working when using image_id from private repository #4269

Open alita-moore opened 2 weeks ago

alita-moore commented 2 weeks ago

Because RunPod doesn't support Docker, I need to use my Docker image as the base image_id. However, setting SKYPILOT_DOCKER_USERNAME and SKYPILOT_DOCKER_PASSWORD doesn't give the created pod access to the private repo. If I manually change the Docker Hub config in the RunPod web UI (which I set up beforehand), it works fine. Is there a way to authenticate the Docker image for the created pod automatically?

For reference, here's my config:

resources:
  image_id: docker:teamwoven/convert:sultan-1.0
  cloud: runpod
  ports: 8000
  accelerators: RTX4090:1

service:
  readiness_probe: /status

envs:
  SKYPILOT_DOCKER_USERNAME:
  SKYPILOT_DOCKER_PASSWORD:
  SKYPILOT_DOCKER_SERVER: docker.io

setup: |
  #

run: |
  # Run the FastAPI server with two workers
  /home/woven/.local/bin/uvicorn extractor_inference.app:app --host "0.0.0.0" --port 8000 --workers 2

And here's an image of the RunPod UI that I'm referring to:

[Image: Screenshot 2024-11-07 at 5:18:01 AM]

Version & Commit info:

alita-moore commented 2 weeks ago

also unrelated but it would be nice to be able to specify multiple cloud providers in the yaml

concretevitamin commented 2 weeks ago

> also unrelated but it would be nice to be able to specify multiple cloud providers in the yaml

This is supported; check out https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html#multiple-candidate-resources.

alita-moore commented 2 weeks ago

Got it, thanks. Any idea about the docker credentials?

Michaelvll commented 2 weeks ago

It may be an issue with our plumbing of the Docker credentials for RunPod specifically. cc'ing @cblmemo for a look.

cblmemo commented 2 weeks ago

Yes, this happens because we set the image ID directly when creating the pod instead of going through the DockerInitializer. One solution would be to call the RunPod credential API manually when the env variables are detected. I'll submit a PR to fix this.
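
To make the proposed approach concrete, here is a minimal sketch of the "detect the env variables, then register credentials" step. It is not the actual fix. The function name `maybe_register_registry_auth` and the `register` callback are hypothetical: `register` stands in for whatever RunPod credential call the PR ends up using, which this thread does not specify. Falling back to `docker.io` when no server is given is also an assumption.

```python
from typing import Callable, Mapping, Optional

# Env variable names as they appear in the task YAML in this issue.
USERNAME_VAR = 'SKYPILOT_DOCKER_USERNAME'
PASSWORD_VAR = 'SKYPILOT_DOCKER_PASSWORD'
SERVER_VAR = 'SKYPILOT_DOCKER_SERVER'


def maybe_register_registry_auth(
    envs: Mapping[str, Optional[str]],
    register: Callable[[str, str, str], str],
) -> Optional[str]:
    """Register private-registry credentials only if they are present.

    `register` is a hypothetical stand-in for the RunPod credential call.
    It receives (server, username, password) and returns an identifier
    for the stored credentials. Returns None when either the username or
    the password is missing or empty, so public images are unaffected.
    """
    username = envs.get(USERNAME_VAR)
    password = envs.get(PASSWORD_VAR)
    if not username or not password:
        return None
    # Assumption: default to Docker Hub when no server is set.
    server = envs.get(SERVER_VAR) or 'docker.io'
    return register(server, username, password)
```

The returned identifier would then need to be attached to the pod creation request so that the pod can pull the private image. How that is done depends on the RunPod API and is left out here.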