toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License

Multiple queue redis used instead of one #384

Closed mathilde-aubignac closed 1 year ago

mathilde-aubignac commented 1 year ago

PGSync version: 2.1.10

Postgres version: 13.5

Elasticsearch version: 7.13.2

Redis version: 7.0.4

Python version: 3.10.7

All our environments run in distinct containers (es, redis, postgres and pgsync).

Schema: We have multiple schemas in a single config because we need 4 different ES indices:

[
    {
        "database": "test",
        "index": "asset",

        "setting": {
            "mapping.total_fields.limit": 2000,
            "mapping.ignore_malformed": false
        },
        "nodes": {
            "table": "asset",
            [...]
        }
    },
    {
        "database": "test",
        "index": "search_history",

        "setting": {
            "mapping.total_fields.limit": 2000
        },

        "nodes": {
            "table": "search_history",
            "label": "search",
            [...]
        }
    },
    {
        "database": "test",
        "index": "post",

        "setting": {
            "mapping.total_fields.limit": 2000
        },

        "nodes": {
            "table": "post",
            [...]
        }
    },
    {
        "database": "test",
        "index": "room",

        "setting": {
            "mapping.total_fields.limit": 2000
        },

        "nodes": {
            "table": "room",
            [...]
        }
    }
]

Problem Description:

When we update the "asset" table in our database, and only the "asset" table, the updated data lands in the right index on ES.

But when we take a closer look at the Redis queues, we can see that most of the changes are in a single queue, while a few also end up in other ones.

When we look directly at Redis, we confirm that data that should only be in the test_asset queue also appears in other queues (in small amounts, but still).
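For anyone wanting to reproduce this kind of check, here is a minimal Python sketch that reads the length of each queue. It rests on assumptions not confirmed in this thread: that each queue is a Redis list, and that its key is built as a prefix plus `<database>_<index>` (for example `queue:test_asset`). The helpers `expected_queue_name`, `queue_lengths` and `report` are hypothetical names for illustration, not part of PGSync. Adjust the prefix to whatever keys you actually see in Redis.

```python
from typing import Dict, Iterable


def expected_queue_name(database: str, index: str, prefix: str = "queue:") -> str:
    """Hypothetical helper: ASSUMES keys look like '<prefix><database>_<index>'."""
    return f"{prefix}{database}_{index}"


def queue_lengths(client, names: Iterable[str]) -> Dict[str, int]:
    """Return the number of pending items per queue.

    `client` is any object exposing `llen(name)`, such as a redis-py
    `redis.Redis` instance. ASSUMES every queue is stored as a Redis list.
    """
    return {name: int(client.llen(name)) for name in names}


def report(host: str = "localhost", port: int = 6379) -> Dict[str, int]:
    """Example wiring against a real server (requires the redis-py package)."""
    import redis  # imported here so the helpers above work without redis-py

    client = redis.Redis(host=host, port=port)
    indices = ("asset", "search_history", "post", "room")
    names = [expected_queue_name("test", index) for index in indices]
    return queue_lengths(client, names)
```

Calling `report()` before and after an update to the "asset" table should show which queues grow, provided the key naming assumption holds for your deployment.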

We are concerned about this behavior, especially because we want to see which ES indices are updated, and with this issue we cannot provide reliable information.

Do you have any clue?

Thank you for your help

toluaina commented 1 year ago

Actually, the expected behaviour is to have a different Redis queue for each node. I have added precise logging to show which queue would be updated. This is currently being tested in this branch.