skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[K8s] Optimizer still shows kubernetes as candidate when the cluster is all occupied #4154

Open cblmemo opened 17 hours ago

cblmemo commented 17 hours ago

Currently, when the k8s cluster is fully occupied, the optimizer still shows it as a candidate. For example, in the replica resources optimization result, it selects k8s as the chosen resource, but the replicas actually launch on GCP.

(base) root@49aaf5a031fc:/skycamp-tutorial/03_inferencing_and_serving# sky serve up service.yaml -n llm-service --env BUCKET_NAME
Service from YAML spec: service.yaml
Verifying bucket for storage skycamp24-finetune-f98d-0
Storage type StoreType.GCS already exists.
Service Spec:
Readiness probe method:           GET /v1/models
Readiness initial delay seconds:  1200
Readiness probe timeout seconds:  15
Replica autoscaling policy:       Fixed 2 replicas
Spot Policy:                      No spot fallback policy

Each replica will use the following resources (estimated):
Considered resources (1 node):
----------------------------------------------------------------------------------------------------------------------------------------------------
 CLOUD        INSTANCE          vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE                                                    COST ($)   CHOSEN  
----------------------------------------------------------------------------------------------------------------------------------------------------
 Kubernetes   8CPU--16GB--1L4   8       16        L4:1           gke_skycamp-skypilot-fastchat_us-central1-c_skycamp-gke-test   0.00          ✔    
 GCP          g2-standard-8     8       32        L4:1           us-east4-a                                                     0.85                
----------------------------------------------------------------------------------------------------------------------------------------------------
Launching a new service 'llm-service'. Proceed? [Y/n]:
Verifying bucket for storage skycamp24-finetune-f98d-0
Launching controller for 'llm-service'...
Considered resources (1 node):
--------------------------------------------------------------------------------------------------------------------------------------------------
 CLOUD        INSTANCE        vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE                                                    COST ($)   CHOSEN  
--------------------------------------------------------------------------------------------------------------------------------------------------
 Kubernetes   4CPU--4GB       4       4         -              gke_skycamp-skypilot-fastchat_us-central1-c_skycamp-gke-test   0.00          ✔    
 GCP          n2-standard-4   4       16        -              us-central1-a                                                  0.19                
--------------------------------------------------------------------------------------------------------------------------------------------------
⚙︎ Launching serve controller on Kubernetes.
└── Pod is up.
✓ Cluster launched: sky-serve-controller-9f92c97d.  View logs at: ~/sky_logs/sky-2024-10-23-00-41-06-916345/provision.log
⚙︎ Mounting files.
  Syncing (to 1 node): /tmp/service-task-llm-service-6vgt96ab -> ~/.sky/serve/llm_service/task.yaml.tmp
  Syncing (to 1 node): /tmp/tmpz9j3n79w -> ~/.sky/serve/llm_service/config.yaml
✓ Files synced.  View logs at: ~/sky_logs/sky-2024-10-23-00-41-06-916345/file_mounts.log
⚙︎ Running setup on serve controller.
  Check & install cloud dependencies on controller: done.                  
✓ Setup completed.  View logs at: ~/sky_logs/sky-2024-10-23-00-41-06-916345/setup-*.log
⚙︎ Service registered.

Service name: llm-service
Endpoint URL: 34.55.247.200:30001
📋 Useful Commands
├── To check service status:    sky serve status llm-service [--endpoint]
├── To teardown the service:    sky serve down llm-service
├── To see replica logs:        sky serve logs llm-service [REPLICA_ID]
├── To see load balancer logs:  sky serve logs --load-balancer llm-service
├── To see controller logs:     sky serve logs --controller llm-service
├── To monitor the status:      watch -n10 sky serve status llm-service
└── To send a test request:     curl 34.55.247.200:30001

✓ Service is spinning up and replicas will be ready shortly.
(base) root@49aaf5a031fc:/skycamp-tutorial/03_inferencing_and_serving# sky serve status llm-service
Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT            
llm-service  -        -       NO_REPLICA  0/2       34.55.247.200:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT  LAUNCHED  RESOURCES  STATUS   REGION  
llm-service   1   1        -         -         -          PENDING  -      
llm-service   2   1        -         -         -          PENDING  -      
(base) root@49aaf5a031fc:/skycamp-tutorial/03_inferencing_and_serving# sky serve status llm-service
Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT            
llm-service  -        -       NO_REPLICA  0/2       34.55.247.200:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT  LAUNCHED        RESOURCES          STATUS        REGION    
llm-service   1   1        -         a few secs ago  1x GCP({'L4': 1})  PROVISIONING  us-east4  
llm-service   2   1        -         a few secs ago  1x GCP({'L4': 1})  PROVISIONING  us-east4
romilbhardwaj commented 14 hours ago

This is intentional, to stay aligned with cloud behavior: we try to submit a pod and let the cluster determine whether it can fit, just as clouds inform us when they are out of capacity. This also allows users to "queue" jobs by setting a provision_timeout in their config, which lets the pod stay pending for a while before giving up.
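
For reference, the provision_timeout mentioned above is set under the kubernetes section of the global SkyPilot config. A minimal sketch (the timeout value is illustrative; check the SkyPilot config docs for the exact semantics and default):

```yaml
# ~/.sky/config.yaml
kubernetes:
  # Seconds a pod may stay Pending before SkyPilot gives up on the
  # Kubernetes candidate and fails over to the next one (e.g. GCP).
  # Illustrative value; a larger timeout effectively queues the job
  # on a fully occupied cluster.
  provision_timeout: 600
```

With a short timeout, a full cluster fails over quickly (as in the logs above, where replicas land on GCP); with a long one, pods wait for capacity to free up instead.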