toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.2k stars 181 forks source link

Index Children only with based Condition #514

Open kundankumarcasio opened 11 months ago

kundankumarcasio commented 11 months ago

PGSync version: latest

Postgres version: 15.2

Elasticsearch/OpenSearch version: 7.17.13

Redis version: 7.2

Python version: 3.8

Problem Description: I want to index children with given condition [ { "database": "databaseName", "index": "indexName", "nodes": { "table": "parentTable", "columns": [], "children": [ { "table": "childTable", "columns": [], "relationship": { "variant": "object", "type": "one_to_many", "foreign_key": { "child": [ "parent_id" ], "parent": [ "id" ], "condition": { "child_column1": "some Value" //need help here } } } } ] } } ]

Equivalent SQL Query: select * from parentTable join childTable where parentTable.id=childTable.parent_id where child_column1="some Value"

Error Message (if any):

jvanderen1 commented 11 months ago

I wonder if this could also be achieved within PGSync's plugin API? That way, conditions can be more expressive. Something like:

from pgsync import plugin

class FooPlugin(plugin.Plugin):
    name = 'FooPlugin'

    def transform(self, doc, **kwargs):
       ...

    def should_index(self, doc, **kwargs):
        return doc['childTable']['parent_id'] == 'someValue'
kundankumarcasio commented 10 months ago

thank you very much. i will try this.

jvanderen1 commented 10 months ago

I don't think an API exists today for something like this. However, it could be an enhancement.

kundankumarcasio commented 10 months ago

yes. Any work around for this ?

jvanderen1 commented 10 months ago

You can probably write a plugin like the following:

from pgsync import plugin

class FooPlugin(plugin.Plugin):
    name = 'FooPlugin'

    def transform(self, doc, **kwargs):
      return doc if doc['childTable']['parent_id'] == 'someValue' else None

But I am concerned that depending on the condition, this won't necessarily insert/delete documents as expected. It may work for your use case though.

kundankumarcasio commented 10 months ago

I tried this and it worked, but this creates some sync inconsistencies.

jvanderen1 commented 9 months ago

This would be a great enhancement to add for things like soft-deleted or archived records.

toluaina commented 5 months ago

I think Plugins are the right/only way to do this. Can you elaborate on what you mean by sync inconsistencies? I can assist with a plugin if you need.

jvanderen1 commented 5 months ago

@toluaina I believe if you were to have indexed a PostgreSQL record at one point, but then a plugin used were to return None based on some condition, the correlating document would not necessarily get removed from the index.