skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[Kubernetes] sky check output corrupted when `~/.kube/config` exists but the GKE cluster is removed #2386

Closed Michaelvll closed 6 months ago

Michaelvll commented 1 year ago
sky check
Checking credentials to enable clouds for SkyPilot.
  AWS: enabled
  Azure: enabled
  GCP: enabled
  Lambda: enabled
  IBM: disabled
    Reason: Failed to import dependencies for IBM. Try running: pip install "skypilot[ibm]".
    Store your API key and Resource Group id in ~/.ibm/credentials.yaml in the following format:
      iam_api_key: <IAM_API_KEY>
      resource_group_id: <RESOURCE_GROUP_ID>
  SCP: enabled
  OCI: enabled
  Checking Kubernetes...ERROR:root:[Errno 2] No such file or directory: 'gke-gcloud-auth-plugin'
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fd3030bef80>, 'Connection to 34.31.250.181 timed out. (connect timeout=5)')': /api/v1/namespaces/default/pods
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fd3030bf3d0>, 'Connection to 34.31.250.181 timed out. (connect timeout=5)')': /api/v1/namespaces/default/pods
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fd3030bf490>, 'Connection to 34.31.250.181 timed out. (connect timeout=5)')': /api/v1/namespaces/default/pods
  Kubernetes: disabled
    Reason: Failed to communicate with the cluster - timeout. Check if your cluster is running and your network is stable.
  Cloudflare (for R2 object store): enabled
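
The interleaved `ERROR:root:` and `WARNING:urllib3.connectionpool:` lines above come from Python's `logging` module writing to the terminal while the status line for Kubernetes is still being printed. A minimal sketch of one possible workaround follows; it is not SkyPilot's actual fix. It temporarily raises the level of those loggers around the check. The helper name `quiet_loggers` is hypothetical, and the logger names are taken only from the prefixes in the output above.

```python
import contextlib
import logging


@contextlib.contextmanager
def quiet_loggers(*names, level=logging.CRITICAL):
    """Temporarily raise the level of the named loggers, then restore it.

    Hypothetical helper for illustration; an empty name ("") selects the
    root logger, which is where the "ERROR:root:" line above originates.
    """
    loggers = [logging.getLogger(name) for name in names]
    previous = [lg.level for lg in loggers]
    for lg in loggers:
        lg.setLevel(level)
    try:
        yield
    finally:
        # Restore the original levels even if the wrapped code raises.
        for lg, old in zip(loggers, previous):
            lg.setLevel(old)


if __name__ == "__main__":
    with quiet_loggers("", "urllib3.connectionpool"):
        # Placeholder for the Kubernetes credential check; records below
        # CRITICAL from these loggers are dropped inside this block.
        logging.getLogger("urllib3.connectionpool").warning("Retrying ...")
    print("  Kubernetes: disabled")
```

With this approach the final `Kubernetes: disabled` line and its `Reason:` would still be printed, since they are regular output rather than log records. The trade-off is that genuine errors logged during the check are hidden too, so capturing them and folding them into the `Reason:` text may be preferable.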

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] commented 7 months ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] commented 6 months ago

This issue was closed because it has been stalled for 10 days with no activity.