opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0

Expose our Iceberg models as public data sets #2298

Open ryscheng opened 1 month ago

ryscheng commented 1 month ago

What is it?

In a BigQuery world, it was really easy to expose all our models as public datasets on Analytics Hub.

If we move to a world where more models run on sqlmesh+Trino, how can we keep exposing them publicly?

A couple of thoughts:

  1. Export Parquet to Cloudflare R2 (https://github.com/opensource-observer/oso/issues/919)
  2. Expose Parquet on GCS with Requester Pays, so consumers pay for access
  3. Copy data into BigQuery to expose in Analytics Hub as usual
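For a sense of what option 3 might involve, here is a rough sketch that generates BigQuery `LOAD DATA` statements for Parquet files already exported to GCS. It assumes one folder of Parquet files per model; the bucket, dataset, and model names are made up for illustration and are not from our actual setup.

```python
def load_statement(model: str, bucket: str, dataset: str) -> str:
    """Build a BigQuery LOAD DATA statement for one exported model.

    Assumes the model's Parquet files live under gs://<bucket>/<model>/.
    """
    uri = f"gs://{bucket}/{model}/*.parquet"
    return (
        f"LOAD DATA OVERWRITE `{dataset}.{model}`\n"
        f"FROM FILES (format = 'PARQUET', uris = ['{uri}']);"
    )


# Hypothetical model names, bucket, and dataset, for illustration only.
models = ["example_model_a", "example_model_b"]
statements = [
    load_statement(m, "example-export-bucket", "example_public") for m in models
]

print("\n\n".join(statements))
```

`OVERWRITE` replaces the table on every run, which keeps the copy simple at the price of reloading everything; an incremental approach would need more thought.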

I wonder how much the storage would cost, but I'm inclined towards option 3 to avoid too much disruption.
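A back-of-the-envelope way to compare storage cost across the three options, as a minimal sketch. The data size and the per-GB rates below are placeholders, not quoted prices; they would need to be replaced with our real footprint and the current numbers from each provider's pricing page.

```python
def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Monthly storage cost in dollars for a given size and per-GB rate."""
    return size_gb * price_per_gb_month


# Placeholder inputs (assumptions, not real figures).
size_gb = 1000.0
placeholder_rates = {
    "1. Parquet on Cloudflare R2": 0.015,
    "2. Parquet on GCS": 0.02,
    "3. Copy in BigQuery": 0.02,
}

estimates = {
    option: monthly_storage_cost(size_gb, rate)
    for option, rate in placeholder_rates.items()
}

for option, cost in estimates.items():
    print(f"{option}: ${cost:.2f}/month")
```

This only covers storage. Query and egress charges differ a lot between the options and would need their own estimate.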