skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.48k stars 462 forks

Can Sky prioritize regions with quota available? #664

Closed infwinston closed 12 months ago

infwinston commented 2 years ago

Kevin's question: I asked for a V100, but Sky kept spending minutes on regions where I don't have quota. Is there any way to specify regions for Sky to prioritize? (My translation: sky launch is too slow.)

infwinston commented 2 years ago

Two possible solutions: 1) Sky looks up quotas for each cloud and prioritizes the regions/clouds with available quota. 2) Sky lets users specify desired regions.
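To make the two options concrete, here is a minimal sketch of how a region list could be reordered. It is not SkyPilot's implementation: `prioritize_regions`, `quota_by_region`, and `preferred` are hypothetical names, and the quota numbers are assumed to have been fetched beforehand.

```python
from typing import Dict, List, Optional


def prioritize_regions(
    candidates: List[str],
    quota_by_region: Dict[str, float],
    preferred: Optional[List[str]] = None,
) -> List[str]:
    """Order candidate regions so that likely-launchable ones are tried first.

    Hypothetical helper, not part of SkyPilot.
    - Option 1: regions with non-zero quota come before regions without.
    - Option 2: within each group, user-preferred regions come first,
      in the order the user listed them.
    Regions missing from quota_by_region are treated as having no quota.
    """
    preferred = preferred or []

    def sort_key(region: str):
        has_quota = quota_by_region.get(region, 0) > 0
        # Unlisted regions rank after every preferred region.
        pref_rank = preferred.index(region) if region in preferred else len(preferred)
        return (not has_quota, pref_rank)

    # sorted() is stable, so ties keep their original candidate order.
    return sorted(candidates, key=sort_key)


if __name__ == "__main__":
    regions = ["us-east-1", "us-west-2", "eu-west-1"]
    quotas = {"us-east-1": 0, "us-west-2": 128, "eu-west-1": 8}
    print(prioritize_regions(regions, quotas))
    print(prioritize_regions(regions, quotas, preferred=["eu-west-1"]))
```

Regions without quota are moved to the end rather than dropped, so a stale or incomplete quota lookup only changes the order of attempts and never rules a region out.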

For 1), I surveyed a few commands that could be useful for Sky to query quotas.

> gcloud alpha services quota list --service=compute.googleapis.com --consumer=projects/intercloud-320520 --format json

{
  "defaultLimit": "8",
  "dimensions": {
    "region": "us-west1"
  },
  "effectiveLimit": "8"
}

> aws service-quotas list-service-quotas --service-code ec2 --region us-east-1 --query "Quotas[*].{ServiceName:ServiceName,QuotaName:QuotaName,QuotaCode:QuotaCode,Value:Value}" --output json

{
    "ServiceName": "Amazon Elastic Compute Cloud (Amazon EC2)",
    "QuotaName": "Running On-Demand P instances",
    "QuotaCode": "L-417A185B",
    "Value": 128.0
},

> az vm list-usage --location "East US" -o table

Name                                      CurrentValue    Limit
Standard NCSv3 Family vCPUs               0               192
Standard NDSv2 Family vCPUs               0               40

Each command takes 3 to 6 seconds, which seems reasonable.
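As a rough sketch of how Sky could consume this output, the snippet below wraps the AWS command above and pulls one quota value out of its JSON. The function names are hypothetical; the quota code `L-417A185B` is taken from the sample output, and the sketch assumes the `aws` CLI is installed and configured.

```python
import json
import subprocess
from typing import List

# Quota code for "Running On-Demand P instances", as shown in the output above.
P_INSTANCE_QUOTA_CODE = "L-417A185B"


def build_aws_quota_cmd(region: str) -> List[str]:
    """Build the same `aws service-quotas` command used above for one region."""
    return [
        "aws", "service-quotas", "list-service-quotas",
        "--service-code", "ec2",
        "--region", region,
        "--query",
        "Quotas[*].{ServiceName:ServiceName,QuotaName:QuotaName,"
        "QuotaCode:QuotaCode,Value:Value}",
        "--output", "json",
    ]


def quota_value(cli_output: str, quota_code: str) -> float:
    """Return the value for quota_code from the CLI's JSON list, or 0.0 if absent."""
    for entry in json.loads(cli_output):
        if entry.get("QuotaCode") == quota_code:
            return float(entry["Value"])
    return 0.0


def fetch_aws_quota(region: str, quota_code: str = P_INSTANCE_QUOTA_CODE,
                    timeout: float = 30.0) -> float:
    """Run the CLI for one region (hypothetical helper; needs AWS credentials)."""
    result = subprocess.run(
        build_aws_quota_cmd(region),
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return quota_value(result.stdout, quota_code)
```

Since each call takes a few seconds, querying many regions one after another would add up; caching the results or running the lookups concurrently would be a natural follow-up, though neither is shown here.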

gmittal commented 2 years ago

This may also be helpful: https://github.com/brennerm/aws-quota-checker

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

infwinston commented 1 year ago

Being resolved by https://github.com/skypilot-org/skypilot/pull/1953!

Michaelvll commented 12 months ago

This should be resolved by #2187. Closing this issue now. I will file another issue for a more fine-grained quota check.