toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.16k stars 180 forks source link

Sync issue on child tables #569

Open gitbute opened 1 month ago

gitbute commented 1 month ago

PGSync version: 3.2.0

Postgres version: 16.4.0

Elasticsearch/OpenSearch version: 7.16.3

Redis version: 7.4.0

Python version: 3.9

Problem Description:

Possibly related to #568, maybe also #552

Problem Description:

Take this schema:

[
  {
    "database": "db",
    "index": "document",
    "nodes": {
      "table": "document",
      "columns": [
        "title"
      ],
      "children": [
        {
          "table": "content",
          "columns": ["content"],
          "relationship": {
            "variant": "object",
            "type": "one_to_one",
            "foreign_key": {
              "child": ["document_id"],
              "parent": ["uuid"]
            }
          }
        }
      ]
    }
  },
  {
    "database": "db",
    "index": "content",
    "nodes": {
      "table": "content",
      "columns": [
        "content"
      ]
    }
  }
]

Lets prepopulate the document table with 20 documents and no content, and let pgsync index. While pgsync is down, i populate the content table with 20 rows. After pgsync starts, only one document in index "document" has content, while in index "content" i have 20 objects. So pgsync in my opinion seems to have a more general syncing issue related to child tables.

Error Message (if any):

toluaina commented 1 month ago
toluaina commented 1 month ago
gitbute commented 1 month ago
  • I also see you are manually specifying the foreign_key between the document and content.

I tried manually specifying foreign keys but also automatically, the issue stays the same

  • I'm guessing this means Content.document_id is of type UUID

Correct

  • Document.uuid is the primary key and of type UUID?

Yes, also correct

  • There is no foreign key defined on content relation for document_id?

Foreign key is defined in database relation, (content.document_id) -> (document.uuid)

  • Also, does the same happen if you exclude the 2nd node in the schema with index content

Yes, it happens aswell

toluaina commented 4 weeks ago

This has now been resolved on the main branch. Can you please try it out now?

gitbute commented 4 weeks ago

Thanks so much for working on this issue. Sadly, in my testing, the change seemed to make no difference. Im currently trying to debug three issues with pgsync all relating to child relations:

We probably will be looking for another indexing solution for now, but will reconsider if those issues are fixed. I dont understand the pgsync code enough to help with it.