opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0

Replace BigQuery with Lakehouse/Icehouse #1209

Open · ryscheng opened this issue 7 months ago

ryscheng commented 7 months ago

What is it?

https://delta.io/

Why? https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf

Happy to stick with BigQuery public data sets for now until this becomes a stronger need.

ryscheng commented 2 months ago

We can also consider Apache Iceberg instead of Delta Lake.

This would be for data upstream from the events table.

We should try to preserve the benefits we currently get from BigQuery:

ryscheng commented 1 month ago

Now that we have solved https://github.com/opensource-observer/oso/issues/821, it's an open question whether we should move more of our data pipeline to sqlmesh + Trino + Iceberg instead of dbt + BigQuery. This issue can track that work.