toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.16k stars 181 forks source link

Plugin transform does not remove documents #359

Closed jeremykhon closed 1 year ago

jeremykhon commented 1 year ago

PGSync version: Master

Postgres version: 14

Elasticsearch version: 7.11.0

Redis version: 7.0.4

Python version: 3.10.4

Problem Description:

I noticed transform using the plugin is only used during the initial sync of the data when it pulls from the DB, and it doesn't get called during subsequent syncs when running it as a daemon and pulling payloads from Redis

Is this normal? Are there any plans to develop this?

I would like to do something like SELECT * FROM users WHERE role = 'admin' and if a user changes from 'admin' to something else I would like for it to be removed from the index. Currently it doesn't do this

Or more simply, using the example from your docs, I would like for fullname to be added when the payload is received from Redis, not only when it pulls from the DB

toluaina commented 1 year ago

Plugin is called for each doc. Can you share more details on your config? Also make sure the plugin PATH is setup correctly. Some details on how to do this are here

jeremykhon commented 1 year ago

Hi, thanks for coming back to me. Yes you're right I just tested it again and it does get called for each doc, however it is unable to handle deletions from the index it seems

i.e I have a simple schema like this

[
  {
    "database": "db",
    "index": "users",
    "plugins": ["filter_users"],
    "nodes": {
      "table": "users",
      "schema": "public",
      "primary_key": ["id"],
      "columns": [
        "id",
        "role"
      ],
      "children": []
    }
  }
]

and a plugin like this

class FilterUsers(plugin.Plugin):
    name = "filter_users"

    # Filters out admins
    def transform(self, doc, **kwargs):
        if doc.get("role") != "user":
            return None
        return doc

If i run an update query that changes a user.role from user to admin then the document will not be removed from the index and be stale forever. Is there any plans to handle deletions using plugins?

manishasodekar commented 1 year ago
class FilterUsers(plugin.Plugin):
    name = "filter_users"

    # Filters out admins
    def transform(self, doc, **kwargs):
        if doc.get("role") == "user":
            return doc

with above approach you should be able to index document role = 'users'...other documents won't get indexed.

toluaina commented 1 year ago

Did this resolve the issue?

toluaina commented 1 year ago
jeremykhon commented 1 year ago

Hi sorry for the late response, the solution above doesn't remove users if they later become admins. it doesn't resolve the issue but its helpful to know thats the expected behaviour. It's an easy workaround to just query for non admins! Thanks all!