nicholasyager / dbt-loom

A dbt-core plugin to weave together multi-project dbt-core deployments

Support postgres cross-database project dependencies #44

Closed · minelytica closed 5 months ago

minelytica commented 6 months ago

I am proposing a feature for dbt-loom to work with multiple projects where each has its own Postgres database, or any database built on PG, like Hydra DB.

My use case is that I am building a data mesh segregated by Data Product. Each product has its own PG database, usually hosted on the same PG server, though that could change with scale. Each Data Product has its own dbt-core project. Rather than replicate all the models via extract-load between the databases, it makes more sense to read (read-only) the data from each database directly. I faced some challenges getting dbt-loom working with this, but I did get it working. Thanks! dbt-loom is a really useful solution.

Let's say I have a Data Product called "calendar" and one called "workday", where workday refers to calendar's models. dbt-loom successfully generated the following SQL:

```sql
select * from "calendar"."product"."days"
```
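For context, in the workday project that statement would be compiled from a cross-project ref along these lines (a sketch; the model path is hypothetical):

```sql
-- workday project: models/marts/workday_days.sql (hypothetical path)
-- dbt-loom injects the upstream manifest so that the two-argument ref
-- resolves to calendar's database and schema, compiling to the SQL above.
select * from {{ ref('calendar', 'days') }}
```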

However, executing this in PG gives an error, as PG does not allow cross-database queries:

```
postgres cross-database references are not implemented
```

So the solution I came up with involved:

  1. some minor "tweaks" to `__init__.py`
  2. using the PG postgres_fdw extension and IMPORT FOREIGN SCHEMA to create a set of alias-style foreign tables. For example, inside the "workday" db (abbreviated; an IMPORT FOREIGN SCHEMA variant is sketched after the code below):

```sql
-- create extension and grant usage - removed for brevity

CREATE SERVER IF NOT EXISTS foreign_calendar
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5432', dbname 'calendar');

-- also create user mapping - removed for brevity

CREATE SCHEMA IF NOT EXISTS calendar;

CREATE FOREIGN TABLE calendar.days (
    id   text,
    col1 date,
    col2 numeric,
    col3 text,
    col4 text
) SERVER foreign_calendar
    OPTIONS (schema_name 'product', table_name 'days');

-- this now works to read the foreign database and table
select * from "workday"."calendar"."days";
```
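As a side note, step 2 above mentions IMPORT FOREIGN SCHEMA, which can replace the hand-written CREATE FOREIGN TABLE statements entirely. A minimal sketch, assuming the same server and schema names as in the example:

```sql
-- Pull every table from the upstream "product" schema in one statement;
-- column definitions are read from the foreign server, so no manual DDL.
IMPORT FOREIGN SCHEMA product
    FROM SERVER foreign_calendar
    INTO calendar;
```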

**Describe the solution you'd like**

Obviously the CREATE FOREIGN TABLE scaffolding could be automated for each PUBLIC dbt model, and even the CREATE SERVER and user-mapping statements could be run at each initialisation, pulled from the connection details of the upstream project.
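For illustration, the per-project bootstrap such automation might emit, generated from the upstream project's connection details (the server name and credentials here are placeholders, not anything dbt-loom produces today):

```sql
-- Hypothetical bootstrap generated from the upstream project's profile.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER IF NOT EXISTS foreign_calendar
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5432', dbname 'calendar');

CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER
    SERVER foreign_calendar
    OPTIONS (user 'dbt_user', password 'dbt_password');  -- placeholder credentials
```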

Each time a model changes and its table DDL changes, you need to add/update/remove columns on the FOREIGN TABLE, which could also be automated.
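A blunt but simple way to resync a single foreign table after an upstream DDL change, assuming the IMPORT FOREIGN SCHEMA approach sketched above, would be to drop and re-import it:

```sql
-- Drop the stale foreign table, then re-import just that table so the
-- new column definitions are picked up from the foreign server.
DROP FOREIGN TABLE IF EXISTS calendar.days;
IMPORT FOREIGN SCHEMA product
    LIMIT TO (days)
    FROM SERVER foreign_calendar
    INTO calendar;
```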

**Describe alternatives you've considered**

It is workable to write the above code by hand for every model, but not ideal.

**Additional context**

I would like feedback on whether this makes sense as a feature of dbt-loom or whether it should be implemented as a separate plugin. I am new to dbt, but not new to SQL and programming; I may be able to attempt this, but any starting pointers or code reviews would be most welcome if it's a wanted feature.

nicholasyager commented 5 months ago

Hi @minelytica! Thank you for taking the time to put together this request. It sounds like you're looking to automate the creation of foreign data wrappers to connect different postgres databases. Given that this is specific to postgres, and would require extra SQL compilation and execution, I believe this request would be better suited to a dbt adapter.

Here's a useful dbt-core issue thread that discusses approaches further: https://github.com/dbt-labs/dbt-core/issues/2182