toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.15k stars 180 forks source link

Is it possible to create relationships with tables from different databases? #334

Open cassis163 opened 2 years ago

cassis163 commented 2 years ago

PGSync version: 2.3.2

Postgres version: 12

Elasticsearch version: 7.13.4

Redis version: 7.0.4

Python version: 3.10

Problem Description:

I want to import several tables from two databases and link them together with primary and foreign keys (mostly one-to-one relationships). The problem is that I do not really know how to do that. It is straightforward to create a relationship for 2 tables within the same database, but not so much for 2 tables that come from different databases.

cassis163 commented 2 years ago

Maybe this is better suited for 'discussions', but I am not sure.

toluaina commented 2 years ago
eichmann commented 2 years ago

You can use PostgreSQL's foreign schema mechanism: https://www.postgresql.org/docs/current/sql-importforeignschema.html This allows you to treat a schema even on a different machine as if it were part of your local database. Note that I've not (yet) tried this with pgsync... At worst you could wrap each foreign table with a local select * view.

cassis163 commented 2 years ago

I tried @eichmann's approach, but it seems that foreign tables are not supported.

Traceback (most recent call last):
  File "/home/casperaangeenbrug/.local/bin/bootstrap", line 70, in <module>
    main()
  File "/home/casperaangeenbrug/.local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/casperaangeenbrug/.local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/casperaangeenbrug/.local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/casperaangeenbrug/.local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/casperaangeenbrug/.local/bin/bootstrap", line 59, in main
    sync: Sync = Sync(
  File "/home/casperaangeenbrug/.local/lib/python3.10/site-packages/pgsync/sync.py", line 107, in __init__
    self.validate(repl_slots=repl_slots)
  File "/home/casperaangeenbrug/.local/lib/python3.10/site-packages/pgsync/sync.py", line 186, in validate
    self.root: Node = self.tree.build(self.nodes)
  File "/home/casperaangeenbrug/.local/lib/python3.10/site-packages/pgsync/node.py", line 278, in build
    node = Node(
  File "<string>", line 13, in __init__
  File "/home/casperaangeenbrug/.local/lib/python3.10/site-packages/pgsync/node.py", line 119, in __post_init__
    self.model: sa.sql.Alias = self.models(self.table, self.schema)
  File "/home/casperaangeenbrug/.local/lib/python3.10/site-packages/pgsync/base.py", line 174, in models
    raise TableNotFoundError(
pgsync.exc.TableNotFoundError: 'Table "stock.option" not found in registry'

Is it possible to build support for these @toluaina ?

My use-case is to decouple search functionality for microservices. I would like to use PGSync for streaming combined relational data from multiple microservices into an ElasticSearch index. In my case, there are two microservices that share a Product model for both stock and information management.

For now I can give each microservice a schema, so that I can use PGSync. This should be fine, because they would be decoupled while offering the ability to search on the data with ElasticSearch.

toluaina commented 1 year ago

I can add this to the TODO list. Although, I can't offer a time for completing this yet. This will also require a bit of investigation. For example I don't know how/if SQLAlchemy will reflect those tables.