Open alita-moore opened 2 weeks ago
It seems the issue is that I had a non-root default user. I had to install sudo for the init script to run, but then SSH was trying to sign in as root, so it wouldn't connect. Is it possible to change the SSH user? That would be better in terms of security.
Thanks for reporting!
> ports 8266,6380
We don't need to expose those ports. It should be possible to remove them: https://github.com/skypilot-org/skypilot/blob/master/sky/provision/runpod/utils.py#L157-L158
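As a rough illustration of what "removing these ports" could look like, here is a minimal sketch that builds a port spec containing only SSH plus any ports the user explicitly asked for. The helper name `build_port_spec`, the `SSH_PORT` constant, and the `port/protocol` string format are assumptions made for illustration; they are not taken from the linked `utils.py`.

```python
from typing import Iterable, List

# Assumption: SSH is the only port that must be reachable from outside the pod.
SSH_PORT = 22


def build_port_spec(user_ports: Iterable[int] = ()) -> str:
    """Return a comma-separated port spec, e.g. '22/tcp,8080/http'.

    Hypothetical helper: internal ports such as 8266 and 6380 are simply
    never added, so they are not exposed unless the user requests them.
    """
    entries: List[str] = [f'{SSH_PORT}/tcp']
    for port in sorted(set(user_ports)):
        if port == SSH_PORT:
            continue  # already present as the SSH entry
        entries.append(f'{port}/http')
    return ','.join(entries)
```

With no user ports the spec contains only the SSH entry, so a standard image with nothing listening on 8266 or 6380 would not advertise them.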
> It seems the issue is that I had a non-root default user. I had to install sudo for the init script to run, but then SSH was trying to sign in as root, so it wouldn't connect. Is it possible to change the SSH user? That would be better in terms of security.
We have to find a way to get the username in the Docker image and add it to the ClusterInfo for the cluster created on RunPod here: https://github.com/skypilot-org/skypilot/blob/master/sky/provision/runpod/instance.py#L180-L186
A reference for how we do it for kubernetes: https://github.com/skypilot-org/skypilot/blob/master/sky/provision/kubernetes/instance.py#L891-L920
I suppose we can do something similar for RunPod, by using their cloud API to fetch the username.
We would really appreciate your contribution to these two issues.
I think the easiest way to do this would just be to use a special environment variable that defines what the docker user should be. The benefit of it being automatic seems small compared to the complexity.
I'll take a look the next time I'm working on the skypilot side, but I'm unfortunately quite bandwidth-constrained right now.
> I think the easiest way to do this would just be to use a special environment variable that defines what the docker user should be. The benefit of it being automatic seems small compared to the complexity.
Ahh, this makes sense. It might be worth using a SKYPILOT_DOCKER_SSH_USERNAME environment variable, as we did for the login password. cc'ing @cblmemo
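For reference, the environment-variable approach could be sketched roughly as follows. This assumes the variable name SKYPILOT_DOCKER_SSH_USERNAME proposed above (it does not exist yet), a fallback to `root` to match the current behavior described in this issue, and a hypothetical helper `resolve_ssh_user`; how the result would be passed into ClusterInfo is left open.

```python
from typing import Mapping, Optional

# Assumption: variable name as proposed in this thread, not an existing setting.
SSH_USER_ENV_VAR = 'SKYPILOT_DOCKER_SSH_USERNAME'
# Assumption: keep 'root' as the default so existing images are unaffected.
DEFAULT_SSH_USER = 'root'


def resolve_ssh_user(task_envs: Optional[Mapping[str, str]] = None) -> str:
    """Pick the SSH user from the task's env vars, falling back to root.

    Hypothetical helper: an unset, empty, or whitespace-only value is
    treated as "not configured".
    """
    envs = task_envs or {}
    user = envs.get(SSH_USER_ENV_VAR, '').strip()
    return user or DEFAULT_SSH_USER
```

For example, `resolve_ssh_user({'SKYPILOT_DOCKER_SSH_USERNAME': 'appuser'})` would select `appuser`, while a task that does not set the variable keeps signing in as `root`.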
I am trying to use a custom image_id for the creation of a SkyPilot service which is running on RunPod. I am using a standard Docker image, but am getting the following error. Should I be installing some dependencies or running a server on the replica / image? I noticed that ports 8266,6380 are exposed, but I don't have any services running on those ports and I didn't update the SSH public keys or what have you.
Version & Commit info:
sky -v: skypilot, version 0.7.0
sky -c: 3f625886bf1b13ee463a9f8e0f6741f620f7f66f