skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[Catalog] GCP catalog fetcher is broken #1629

Closed WoosukKwon closed 1 year ago

WoosukKwon commented 1 year ago

Recently, GCP has significantly changed the frontend implementation of the website https://cloud.google.com/compute/vm-instance-pricing, from which our catalog fetcher crawls and parses the VM pricing data. We need a new crawler to handle this change.

WoosukKwon commented 1 year ago

One possible option is to use https://github.com/Cyclenerd/google-cloud-pricing-cost-calculator. The repository provides and periodically updates the GCP VM information (including the prices) and images.

WoosukKwon commented 1 year ago

While the files in the repository are quite easy to parse, they do not contain spot prices.

infwinston commented 1 year ago

Is calling GCP Catalog API an option? https://cloud.google.com/blog/topics/cost-management/introducing-cloud-billing-catalog-api-gcp-pricing-in-real-time

WoosukKwon commented 1 year ago

> Is calling GCP Catalog API an option? https://cloud.google.com/blog/topics/cost-management/introducing-cloud-billing-catalog-api-gcp-pricing-in-real-time

Yeah, I guess that's essentially what the above repository is doing. Previously, I didn't use the API because parsing its output seemed super difficult. However, maybe we can now refer to the repo and call the API ourselves.

infwinston commented 1 year ago

Not sure if this is useful, but I tried calling their API to query prices with the following snippet, based on this doc:

from googleapiclient import discovery

# Credentials are picked up from Application Default Credentials.
cb = discovery.build("cloudbilling", "v1")

# Find the Compute Engine entry among the billing services.
servs = cb.services().list().execute()
for serv in servs["services"]:
    if serv["displayName"] == "Compute Engine":
        print(serv)
# > {'name': 'services/6F81-5844-456A', 'serviceId': '6F81-5844-456A', 'displayName': 'Compute Engine', 'businessEntityName': 'businessEntities/GCP'}

# List the Compute Engine SKUs (first page only; follow nextPageToken for the rest).
res = cb.services().skus().list(parent="services/6F81-5844-456A").execute()
print(len(res["skus"]))
# > 5000
print(res["skus"][0])
# > {'name': 'services/6F81-5844-456A/skus/000F-E31B-1D6F', 'skuId': '000F-E31B-1D6F', 'description': 'N1 Predefined Instance Ram running in Zurich', 'category': {'serviceDisplayName': 'Compute Engine', 'resourceFamily': 'Compute', 'resourceGroup': 'N1Standard', 'usageType': 'OnDemand'}, 'serviceRegions': ['europe-west6'], 'pricingInfo': [{'summary': '', 'pricingExpression': {'usageUnit': 'GiBy.h', 'displayQuantity': 1, 'tieredRates': [{'startUsageAmount': 0, 'unitPrice': {'currencyCode': 'USD', 'units': '0', 'nanos': 5928000}}], 'usageUnitDescription': 'gibibyte hour', 'baseUnit': 'By.s', 'baseUnitDescription': 'byte second', 'baseUnitConversionFactor': 3865470566400}, 'currencyConversionRate': 1, 'effectiveTime': '2023-01-25T18:12:18.218Z'}], 'serviceProviderName': 'Google', 'geoTaxonomy': {'type': 'REGIONAL', 'regions': ['europe-west6']}}

But it looks like there is still some work needed to map this info to a clean catalog.
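
As a rough idea of what that mapping could look like, here is a minimal sketch that flattens one SKU into a catalog-style row. The `sku_to_row` helper and the row layout are hypothetical, not anything SkyPilot defines; `SAMPLE_SKU` is a trimmed copy of the SKU printed above. It assumes the first pricing tier is the base price, and converts the `units`/`nanos` pair (whole currency units plus billionths) into a single decimal value.

```python
from decimal import Decimal

# Trimmed copy of the SKU printed above; only the fields used here are kept.
SAMPLE_SKU = {
    "description": "N1 Predefined Instance Ram running in Zurich",
    "category": {"resourceGroup": "N1Standard", "usageType": "OnDemand"},
    "serviceRegions": ["europe-west6"],
    "pricingInfo": [{
        "pricingExpression": {
            "usageUnit": "GiBy.h",
            "tieredRates": [{
                "startUsageAmount": 0,
                "unitPrice": {"currencyCode": "USD", "units": "0", "nanos": 5928000},
            }],
        },
    }],
}


def sku_to_row(sku):
    """Flatten one SKU dict into a catalog-like row (hypothetical layout)."""
    expr = sku["pricingInfo"][0]["pricingExpression"]
    # Assumption: the first tier carries the base price.
    money = expr["tieredRates"][0]["unitPrice"]
    # `units` is a string of whole currency units; `nanos` is billionths of a unit.
    price = Decimal(money.get("units", "0")) + Decimal(money.get("nanos", 0)) / Decimal(10**9)
    return {
        "description": sku["description"],
        "resource_group": sku["category"]["resourceGroup"],
        "usage_type": sku["category"]["usageType"],
        "regions": sku["serviceRegions"],
        "unit": expr["usageUnit"],
        "currency": money["currencyCode"],
        "price": price,
    }


row = sku_to_row(SAMPLE_SKU)
print(row["description"], row["unit"], row["price"])
```

The `usage_type` field is kept in the row because it is what distinguishes on-demand SKUs from other kinds in the API output, which matters given the spot price concern raised earlier.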

infwinston commented 1 year ago

btw, this website can be useful https://cloud.google.com/skus/?currency=USD&filter=N1+Predefined+Instance+Core+running+in+Americas

infwinston commented 1 year ago

For example, the price of the instance n1-standard-1 is

0.031611 (N1 Predefined Instance Core running in Americas)
+ 0.004237 (N1 Predefined Instance Ram running in Americas) * 3.75 GB
= 0.04749975

which matches what's showing on https://cloud.google.com/compute/vm-instance-pricing#n1_predefined
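
The same arithmetic as a small sketch, in case it helps when building the catalog. The `instance_price` helper is hypothetical; it assumes an instance's hourly price is just the per-core SKU price times the vCPU count plus the per-GB RAM SKU price times the memory size (n1-standard-1 has 1 vCPU and 3.75 GB). `Decimal` is used so the result is not affected by float rounding.

```python
from decimal import Decimal


def instance_price(vcpus, memory_gb, core_price, ram_price):
    """Hourly price from per-core and per-GB SKU prices (hypothetical helper).

    Prices and memory are passed as strings so Decimal parses them exactly.
    """
    return vcpus * Decimal(core_price) + Decimal(memory_gb) * Decimal(ram_price)


# n1-standard-1 in Americas: 1 vCPU, 3.75 GB RAM, SKU prices quoted above.
price = instance_price(1, "3.75", "0.031611", "0.004237")
print(price)  # 0.04749975
```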