opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0

Trino for distributed queries #821

Open · ryscheng opened this issue 4 months ago

ryscheng commented 4 months ago

What is it?

In the future, we may have data spread out across many places (e.g., BigQuery, ClickHouse, Postgres, random files, IPFS). Trino seems like an interesting option for running distributed queries across them: https://trino.io/
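To make the idea concrete, here is a minimal Python sketch of one query joining a BigQuery table with a Postgres table through Trino. It assumes a Trino coordinator at `localhost:8080` with catalogs named `bigquery` and `postgresql` already configured, and it uses the DB-API interface of the `trino` Python client (`pip install trino`). The schema, table, and column names (`analytics.events`, `public.projects`, `project_id`, `project_name`) are hypothetical placeholders, not part of any existing OSO setup.

```python
"""Sketch: one SQL statement spanning two Trino catalogs.

Assumes a running Trino coordinator with catalogs named "bigquery" and
"postgresql". All schema, table, and column names are hypothetical.
"""

# Trino addresses every table as catalog.schema.table, so a single query
# can join data that physically lives in different systems.
FEDERATED_QUERY = """
SELECT p.project_name, count(*) AS event_count
FROM bigquery.analytics.events AS e
JOIN postgresql.public.projects AS p
  ON e.project_id = p.id
GROUP BY p.project_name
ORDER BY event_count DESC
LIMIT 10
"""


def run_federated_query(host: str = "localhost", port: int = 8080, user: str = "oso"):
    """Execute FEDERATED_QUERY via the trino DB-API client and return all rows."""
    # Imported lazily so this module loads even without the client installed.
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user=user)
    try:
        cur = conn.cursor()
        cur.execute(FEDERATED_QUERY)
        return cur.fetchall()
    finally:
        conn.close()
```

The join itself is ordinary SQL; Trino's connectors handle pushing work down to each source and moving the intermediate results between them.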

ryscheng commented 2 months ago

Looking at the docs, this is pretty interesting: you can set up data connectors for each source and then join it all together in a unified interface. That will be useful if our data actually ends up spread across a bunch of locations.
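As a rough illustration of the connector setup, the Python sketch below writes one Trino catalog file per data source. Trino registers each `etc/catalog/<name>.properties` file as a catalog named after the file, and the `connector.name` property selects the connector. The GCP project ID, Postgres host, user, and password environment variable are hypothetical placeholders, and BigQuery credential configuration is omitted.

```python
"""Sketch: generate Trino catalog property files, one per data source.

All values (project ID, host, user, env var name) are hypothetical.
"""
from pathlib import Path

CATALOGS = {
    # BigQuery connector: datasets in the project appear as Trino schemas.
    "bigquery": {
        "connector.name": "bigquery",
        "bigquery.project-id": "my-gcp-project",
    },
    # PostgreSQL connector: connects to the database over JDBC.
    "postgresql": {
        "connector.name": "postgresql",
        "connection-url": "jdbc:postgresql://db.example.internal:5432/oso",
        "connection-user": "trino",
        # Trino substitutes ${ENV:...} from the environment at startup.
        "connection-password": "${ENV:POSTGRES_PASSWORD}",
    },
}


def write_catalogs(catalog_dir: Path, catalogs: dict = CATALOGS) -> list:
    """Write one <name>.properties file per catalog and return the paths."""
    catalog_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, props in catalogs.items():
        path = catalog_dir / f"{name}.properties"
        lines = [f"{key}={value}" for key, value in props.items()]
        path.write_text("\n".join(lines) + "\n")
        written.append(path)
    return written
```

Pointing `write_catalogs` at a Trino installation's `etc/catalog` directory and restarting the server would expose `bigquery` and `postgresql` as catalogs; adding another source (for example ClickHouse) is one more dictionary entry.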

davidgasquez commented 2 months ago

In case it helps, Starburst is probably the best managed offering!

ryscheng commented 2 months ago

Apparently you can run Trino on GCP Dataproc! That surprised me: https://cloud.google.com/dataproc/docs/tutorials/trino-dataproc

ryscheng commented 1 month ago

For reference, dbt-trino is useful if we want to replace BigQuery (BQ) in our data pipeline: https://github.com/starburstdata/dbt-trino