skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

`sky check` from one Kubernetes cluster to another failing #3904

Open chhabrakadabra opened 2 weeks ago

chhabrakadabra commented 2 weeks ago

I'm trying to integrate SkyPilot into an Airflow DAG by running it from inside a Kubernetes pod (using Airflow's KubernetesPodOperator). The trouble is that I want SkyPilot to connect to a different Kubernetes cluster: in GCP, Airflow (through Cloud Composer) runs in its own Kubernetes cluster, but I want the actual workload to run on a separate one.

I've set the current Kubernetes context inside the first pod (the one running SkyPilot) to point at the second cluster. I've confirmed that this works: `kubectl get pods` lists the pods in the current context's namespace in the second cluster. But when I run `sky check`, it doesn't pick up the right credentials, or even the right namespace.

I think the problem is that the skypilot code expects that if skypilot is running in cluster A, then the workload also needs to run on cluster A. For example, the code that picks up the K8s namespace prefers to get the namespace from in-cluster config over the active context. https://github.com/skypilot-org/skypilot/blob/master/sky/provision/kubernetes/utils.py#L774-L777
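
The ordering described above can be illustrated with a minimal sketch. This is not SkyPilot's actual code: `resolve_namespace` and its parameters are hypothetical names chosen for illustration. The only assumption taken from Kubernetes itself is that a pod's own namespace is normally mounted at `/var/run/secrets/kubernetes.io/serviceaccount/namespace`.

```python
import os
from typing import Optional

# Standard location where Kubernetes mounts the pod's own namespace.
IN_CLUSTER_NS_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"


def resolve_namespace(context_namespace: Optional[str],
                      ns_path: str = IN_CLUSTER_NS_PATH,
                      default: str = "default") -> str:
    """Hypothetical helper: pick a namespace, preferring the in-cluster file."""
    # If the service account namespace file is mounted, it wins, even when
    # the active kubeconfig context points at a different cluster.
    if os.path.exists(ns_path):
        with open(ns_path, encoding="utf-8") as f:
            return f.read().strip()
    # Otherwise fall back to the namespace of the active kubeconfig context,
    # or to the default namespace if the context does not set one.
    return context_namespace or default
```

With this ordering, a pod running in cluster A always reports cluster A's namespace, regardless of what the active context says.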

Version & Commit info:

Actually, the full output includes this warning:

/app/.venv/lib/python3.12/site-packages/sky/provision/docker_utils.py:327: SyntaxWarning: invalid escape sequence '\&'
  'sudo sed -i "s/mesg n/tty -s \&\& mesg n/" ~/.profile;'
skypilot, version 1.0.0.dev20240722

romilbhardwaj commented 2 weeks ago

Thanks for the report @chhabrakadabra. Here's the logic which first tries to load incluster auth, and then tries the kubeconfig:

https://github.com/skypilot-org/skypilot/blob/0203971a36dbcf0a6d3615e6ed25f3f6d11b53f6/sky/adaptors/kubernetes.py#L73-L84
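
Roughly, that fallback order looks like the sketch below. This is a simplified illustration rather than the linked code; `load_kubernetes_config` is a hypothetical helper that takes the two loaders as arguments so it can run without a cluster.

```python
from typing import Callable, Type


def load_kubernetes_config(load_incluster: Callable[[], None],
                           load_kubeconfig: Callable[[], None],
                           config_error: Type[Exception]) -> str:
    """Hypothetical helper: try in-cluster auth first, then the kubeconfig."""
    try:
        load_incluster()
        return "incluster"
    except config_error:
        # In-cluster credentials are missing or unusable; use the kubeconfig.
        load_kubeconfig()
        return "kubeconfig"
```

With the official Kubernetes Python client, the arguments would be `kubernetes.config.load_incluster_config`, `kubernetes.config.load_kube_config`, and `kubernetes.config.ConfigException`. The kubeconfig is only consulted when the in-cluster loader fails, which is why a pod with a mounted service account token never reaches it.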

IIUC, you want SkyPilot to ignore the incluster auth and instead read the kubeconfig, right?

As a workaround, is it possible to not automount the credentials by adding automountServiceAccountToken: false in your airflow pod spec?

chhabrakadabra commented 1 week ago

> IIUC, you want SkyPilot to ignore the incluster auth and instead read the kubeconfig, right?

Yes, that is correct.

> As a workaround, is it possible to not automount the credentials by adding automountServiceAccountToken: false in your airflow pod spec?

That's a great suggestion. Unfortunately, I'm unable to do so because of a bug in Airflow.

romilbhardwaj commented 1 week ago

Ah, that bug is unfortunate. As an alternative to automountServiceAccountToken: false, you can mask the service account files in the pod by overriding the volume mount with an empty directory. This effectively hides the service account tokens from SkyPilot.

You'd need a pod spec like this for your airflow pod:

apiVersion: v1
kind: Pod
metadata:
  name: airflow-pod
spec:
  containers:
  - name: airflow-container
    image: your-image
    volumeMounts:
    - name: empty-token
      mountPath: /var/run/secrets/kubernetes.io/serviceaccount # Overwrites the service account token mount
  volumes:
  - name: empty-token
    emptyDir: {}

I've confirmed this disables the incluster auth codepath in SkyPilot's logic, so it should fall back to the kubeconfig present in your pod.
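
One way to check from inside the pod that the mask took effect is to look for the token file under the service account mount path. The sketch below uses only the standard library; `service_account_token_visible` is a hypothetical helper, and it assumes that in-cluster auth depends on a non-empty `token` file at the standard path.

```python
import os

# Standard service account mount path, the same one masked in the pod spec.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def service_account_token_visible(sa_dir: str = SA_DIR) -> bool:
    """Return True if a non-empty service account token file is present."""
    token_path = os.path.join(sa_dir, "token")
    # os.path.isfile follows symlinks, which is how the token is usually mounted.
    return os.path.isfile(token_path) and os.path.getsize(token_path) > 0
```

Calling `service_account_token_visible()` in the masked pod should return `False`, since the emptyDir volume leaves the directory empty.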