toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.14k stars 180 forks source link

Support COUNT for children #311

Open rutkowskib opened 2 years ago

rutkowskib commented 2 years ago

PGSync version: 2.3.1

Problem Description:

I need to have a column created based on COUNT query of related table. I have 2 tables - users and users_followers. In users table there is info about users, and users_followers table stores who is followed by whom. They look something like this:

Users 
userId: uuid 
username: varchar
Users_followers
userId: uuid(users FK)
followerId: uuid(users FK)

I need to have information in ES about number of followers(I don't need information who it is or anything like that). It would look something like this:

userId: string
username: string
followers: integer 

I could achieve this probably using children in schema and then transforming data in plugin, but support out of the box would also be nice.

Thank you for your work on this project, and all help will be much appreciated.

bitofbreeze commented 2 years ago

I could also use this. I was thinking of just going ahead and indexing the actual followers and querying by the count in elastic search but that would be wasting some elastic search storage and probably has some overhead. Also I don't know if elastic search supports aggregate logic in queries

Looking at the source for how concat is implemented, I wonder if it might be not such a stretch to implement count https://github.com/toluaina/pgsync/blob/209181eb19ad9c99adf807907d8842d9e06cbb4c/pgsync/transform.py#L57

rutkowskib commented 2 years ago

I don't think this is correct place to implement what I want. If implemented as a transform it would still need fetching joined table to pgsync. I have already done that using a plugin(code below).

from pgsync import plugin

class FollowersPlugin(plugin.Plugin):
    name = 'Followers'

    def transform(self, doc, **kwargs):
        print(doc)
        if hasattr(doc['users_followers'], "__len__"):
            doc['followers'] = len(doc['users_followers'])
        else:
            doc['followers'] = 0
        del doc['users_followers']

        return doc

I would rather to do it on db side, because number of followers can grow very large, and then it would be very expensive to do join and fetch data to pgsync.

toluaina commented 2 years ago