toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.14k stars 180 forks source link

parent/child and self-referenced structure #89

Open DanielZavacky opened 3 years ago

DanielZavacky commented 3 years ago

PGSync version: 1.1.28

Postgres version: 13.1

Elasticsearch version: 7.10.1

Redis version: 6.2.1

Python version: 3.7

Problem Description:

Hi. I'm currently trying to work with pgsync and I'm facing two problems and I haven't found a way to solve them. Could you, please, tell me if it is even possible to do the following?

  1. I'm trying to define a schema that would map the parent/child association, where parent can have multiple children, child can have multiple children, and so on ... I have database table like this: parent_child Is there a way to create schema, which would generate something like this: parent_child_result_exp

  2. I'm trying to define a schema that would map self-referenced association, where item can have array of alternative items I have a database like this: self_ref Is there a way to create schema, which would generate something like this: self_ref_data

toluaina commented 3 years ago

Yes you can do 1 and 2.

For 1) you can have a look at the books example here

For 2) you just need to define a relationship of type one_to_many

DanielZavacky commented 3 years ago

I can't get 1. scenario (parent/child) working. I have this schema and this DB table: schema_parent_child, but it will create this documents in elasticsearch: elastic_parent_child

  1. scenario is resolved by foreign_key property
toluaina commented 3 years ago

I'll be honest self-referential relationships have not been fully explored. Let alone self-referential user-defined foreign key relationships.

Are you sure this shouldn't be

"foreign_key": {
      "child": ["id"],
      "parent": ["parent_id"]
 }

In any case, I feel a self-referencing relationship should be handled as a special case. The complexity for unbounded queries will need to be resolved with CTE's

DanielZavacky commented 3 years ago

Yes, I tried both

"foreign_key": {
      "child": ["id"],
      "parent": ["parent_id"]
 }

and

"foreign_key": {
      "child": ["parent_id"],
      "parent": ["id"]
 }

and neither worked as expected.

Yes, recursive CTE is probably the only way to do it. If I could be helpful in any way, just let me know

shambarick commented 3 years ago

Any workaround? I have the same issue.

mmbfreitas commented 1 year ago

Same problem here

asturm-fe commented 8 months ago

The problem is still there... I have tried to sync a test-database schema with books {id, title} as the root object and a relation relatedbooks {parentid, childid} to connect book entries (e.g. from a book series). The foreign keys for both parentid and childid are set to the books(id) field in the relation relatedbooks.

The pgsync-schema looks like this:

[
    {
        "database": "postgres",
        "index": "test_index",
        "nodes": {
            "table": "books",
            "schema": "test",
            "columns": [
                "title",
                "id"
            ],
            "children": [
                {
                    "table": "books",
                    "label": "related_books",
                    "schema": "test",
                    "columns": [
                        "title",
                        "id"
                    ],
                    "relationship": {
                        "type": "one_to_many",
                        "variant": "object",
                        "through_tables": [
                            "relatedbooks"
                        ]
                    }
                 }
            ]
    }
]

In the resulting documents in Opensearch, the field related_books is always empty . PGSync does'nt seem to be able to connect the books table with itself through the relatedbooks table.

When I copy the books table into a second table book_1 and update the foreign key in related_books accordingly, it works (but copying is no solution for the real data).

A related problem is that I cannot model the relation in both directions this way, because PGSync handles the connection only from schema-parent to -child but what I would like is to have all directions indicated by the foreign keys to be handled.