skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[K8s] Optimizer still shows kubernetes as candidate when the cluster is all occupied #4154

Open cblmemo opened 17 hours ago

cblmemo commented 17 hours ago

Currently, when the k8s cluster is fully occupied, the optimizer still shows it as a candidate. For example, in the replica resources optimization result, it selects k8s as the chosen resource, but the replicas actually launch on GCP.

(base) root@49aaf5a031fc:/skycamp-tutorial/03_inferencing_and_serving# sky serve up service.yaml -n llm-service --env BUCKET_NAME
Service from YAML spec: service.yaml
Verifying bucket for storage skycamp24-finetune-f98d-0
Storage type StoreType.GCS already exists.
Service Spec:
Readiness probe method:           GET /v1/models
Readiness initial delay seconds:  1200
Readiness probe timeout seconds:  15
Replica autoscaling policy:       Fixed 2 replicas
Spot Policy:                      No spot fallback policy

Each replica will use the following resources (estimated):
Considered resources (1 node):
----------------------------------------------------------------------------------------------------------------------------------------------------
 CLOUD        INSTANCE          vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE                                                    COST ($)   CHOSEN  
----------------------------------------------------------------------------------------------------------------------------------------------------
 Kubernetes   8CPU--16GB--1L4   8       16        L4:1           gke_skycamp-skypilot-fastchat_us-central1-c_skycamp-gke-test   0.00          ✔    
 GCP          g2-standard-8     8       32        L4:1           us-east4-a                                                     0.85                
----------------------------------------------------------------------------------------------------------------------------------------------------
Launching a new service 'llm-service'. Proceed? [Y/n]:
Verifying bucket for storage skycamp24-finetune-f98d-0
Launching controller for 'llm-service'...
Considered resources (1 node):
--------------------------------------------------------------------------------------------------------------------------------------------------
 CLOUD        INSTANCE        vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE                                                    COST ($)   CHOSEN  
--------------------------------------------------------------------------------------------------------------------------------------------------
 Kubernetes   4CPU--4GB       4       4         -              gke_skycamp-skypilot-fastchat_us-central1-c_skycamp-gke-test   0.00          ✔    
 GCP          n2-standard-4   4       16        -              us-central1-a                                                  0.19                
--------------------------------------------------------------------------------------------------------------------------------------------------
⚙︎ Launching serve controller on Kubernetes.
└── Pod is up.
✓ Cluster launched: sky-serve-controller-9f92c97d.  View logs at: ~/sky_logs/sky-2024-10-23-00-41-06-916345/provision.log
⚙︎ Mounting files.
  Syncing (to 1 node): /tmp/service-task-llm-service-6vgt96ab -> ~/.sky/serve/llm_service/task.yaml.tmp
  Syncing (to 1 node): /tmp/tmpz9j3n79w -> ~/.sky/serve/llm_service/config.yaml
✓ Files synced.  View logs at: ~/sky_logs/sky-2024-10-23-00-41-06-916345/file_mounts.log
⚙︎ Running setup on serve controller.
  Check & install cloud dependencies on controller: done.                  
✓ Setup completed.  View logs at: ~/sky_logs/sky-2024-10-23-00-41-06-916345/setup-*.log
⚙︎ Service registered.

Service name: llm-service
Endpoint URL: 34.55.247.200:30001
📋 Useful Commands
├── To check service status:    sky serve status llm-service [--endpoint]
├── To teardown the service:    sky serve down llm-service
├── To see replica logs:        sky serve logs llm-service [REPLICA_ID]
├── To see load balancer logs:  sky serve logs --load-balancer llm-service
├── To see controller logs:     sky serve logs --controller llm-service
├── To monitor the status:      watch -n10 sky serve status llm-service
└── To send a test request:     curl 34.55.247.200:30001

✓ Service is spinning up and replicas will be ready shortly.
(base) root@49aaf5a031fc:/skycamp-tutorial/03_inferencing_and_serving# sky serve status llm-service
Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT            
llm-service  -        -       NO_REPLICA  0/2       34.55.247.200:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT  LAUNCHED  RESOURCES  STATUS   REGION  
llm-service   1   1        -         -         -          PENDING  -      
llm-service   2   1        -         -         -          PENDING  -      
(base) root@49aaf5a031fc:/skycamp-tutorial/03_inferencing_and_serving# sky serve status llm-service
Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT            
llm-service  -        -       NO_REPLICA  0/2       34.55.247.200:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT  LAUNCHED        RESOURCES          STATUS        REGION    
llm-service   1   1        -         a few secs ago  1x GCP({'L4': 1})  PROVISIONING  us-east4  
llm-service   2   1        -         a few secs ago  1x GCP({'L4': 1})  PROVISIONING  us-east4
romilbhardwaj commented 14 hours ago

This is intentional, to stay aligned with cloud behavior: we try to submit a pod and let the cluster determine whether it can fit, just as clouds inform us when they are out of capacity. This also allows users to "queue" jobs by setting a provision_timeout in their config, which lets the pod stay pending for a while before giving up.
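
For reference, the provision_timeout mentioned above is set under the kubernetes section of the global SkyPilot config. A minimal sketch (the timeout value is illustrative; check the SkyPilot config docs for the exact semantics and default):

```yaml
# ~/.sky/config.yaml
kubernetes:
  # Seconds a pod may stay Pending before SkyPilot gives up on the
  # Kubernetes candidate and fails over to the next one (e.g. GCP).
  # Illustrative value; a larger timeout effectively queues the job
  # on a fully occupied cluster.
  provision_timeout: 600
```

With a short timeout, a full cluster fails over quickly (as in the logs above, where replicas land on GCP); with a long one, pods wait for capacity to free up instead.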