long2ice / meilisync

Realtime sync data from MySQL/PostgreSQL/MongoDB to Meilisearch
https://github.com/long2ice/meilisync
Apache License 2.0
285 stars 43 forks source link

RDS DB replication slot not released by meilisync #84

Closed iurie-bogdanovici closed 6 months ago

iurie-bogdanovici commented 9 months ago

Hello, I configured meilisync with meilisearch and Postgres RDS DB. Everything works ok, except that the replication slot on RDS is growing continuosely. Each 5 minutes it will increase 64MB, which is about 18GB per day. If I understand correctly, meilisync should consume data from DB replication slot, and empty it, but this doesn't happen. I configured a maximum size value for the replication slot of 3GB (max_slot_wal_keep_size). When the replication slot reached this size it changed its status from active=true to active=false and meilisync app stopped syncing. Could you please tell if there are some additional configurations that should be made in order to solve this issue?

ifeltsweet commented 9 months ago

@long2ice Could it be because meilisync maintains its own progress instead of advancing progress position with pg_replication_slot_advance? I could not understand the reason for progress file/redis if Postgres provides this already.

iurie-bogdanovici commented 8 months ago

@long2ice Could you please at least confirm if meilisync is compatible/has been tested with RDS Postgres?

long2ice commented 8 months ago

Sorry I don't make fully test on postgres or rds postgres, I only use it in MySQL myself.

TEAMMEDIO commented 8 months ago

@Iurie-Bogdanovici,

while attempting to configure MeiliSync with MeiliSearch and a PostgreSQL RDS DB. I'm stuck on the step of enabling logical replication.

I've come to understand that to proceed, I need to enable the wal2json extension and change the wal_level parameter to logical. However, I've encountered a roadblock as RDS does not permit direct modification of _the wallevel parameter within the Parameter groups.

I'm aware that the wal2json extension is supported on platforms where PostgreSQL is installed, but I'm uncertain about the process of installing this extension on RDS! and making the necessary configuration changes.

Could you please provide guidance or share your experience on how to enable the wal2json extension and adjust the wal_level parameter to logical in a PostgreSQL RDS environment?

Your insights and assistance would be greatly appreciated.

iurie-bogdanovici commented 8 months ago

@TEAMMEDIO You should set rds.logical_replication parameter to 1.

These are the RDS parameters that I configured for the dev environment using Terraform: parameters = [ { name = "rds.force_ssl" value = "0" apply_method = "pending-reboot" }, { name = "rds.logical_replication" value = "1" apply_method = "pending-reboot" }, { name = "max_replication_slots" value = "20" apply_method = "pending-reboot" }, { name = "max_wal_senders" value = "20" apply_method = "pending-reboot" }, { name = "max_slot_wal_keep_size" value = "3000" apply_method = "immediate" } ]

TEAMMEDIO commented 8 months ago

Hi,

I wanted to express my gratitude for your previous response regarding my deployment of Meilisearch and Meilisync on EC2.

However, I must admit that I am quite disappointed as I've encountered a significant obstacle. While attempting to deploy Meilisearch and Meilisync, I encountered the following error: "TypeError: 'async for' requires an object with aiter method, got coroutine". This error persisted despite my attempts to deploy Meilisearch and Meilisync through two different methods:

  1. I created an EC2 instance using a Meilisearch image and attempted to run Meilisync via Docker.

  2. I created two Docker containers for Meilisearch and Meilisync within an Ubuntu EC2 instance.

Regrettably, both attempts yielded the same error. After a thorough investigation, I suspect that the issue may lie within the Postgres configuration prerequisites.

If you have successfully managed to deploy Meilisync, I would greatly appreciate it if you could provide me with detailed steps on how you achieved this. Any insights or guidance you could offer would be immensely helpful in resolving this issue.

Sincerely,

On Wed, Mar 6, 2024 at 8:44 AM Iurie Bogdanovici @.***> wrote:

@TEAMMEDIO https://github.com/TEAMMEDIO You should set rds.logical_replication parameter to 1.

These are the RDS parameters that I configured for the dev environment using Terraform: parameters = [ { name = "rds.force_ssl" value = "0" apply_method = "pending-reboot" }, { name = "rds.logical_replication" value = "1" apply_method = "pending-reboot" }, { name = "max_replication_slots" value = "20" apply_method = "pending-reboot" }, { name = "max_wal_senders" value = "20" apply_method = "pending-reboot" }, { name = "max_slot_wal_keep_size" value = "3000" apply_method = "immediate" } ]

— Reply to this email directly, view it on GitHub https://github.com/long2ice/meilisync/issues/84#issuecomment-1980187706, or unsubscribe https://github.com/notifications/unsubscribe-auth/BBWLXJ5DB7XXGTPCLNBQ2L3YW23M7AVCNFSM6AAAAABC6SPBDOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBQGE4DONZQGY . You are receiving this because you were mentioned.Message ID: @.***>

TEAMMEDIO commented 8 months ago

@iurie-bogdanovici, how did you overcome the installation of wal2json extension in RDS?

iurie-bogdanovici commented 8 months ago

@TEAMMEDIO I used only those parameters specified in my previous message, and then I configured meilisearch and meilisync using default settings from meilisearch website. When I run inside the db "SELECT * FROM pg_replication_slots" command I get in the column plugin, value "wal2json". So I suppose wal2json plugin is enabled by default. Regarding the error that you received on the mylisync side, try to use another meilisync docker image. Which image do you use? I also got some errors, but later I tried docker image "long2ice/meilisync:dev" and haven't got any errors since.

TEAMMEDIO commented 8 months ago

@ Iurie Bogdanovici, yes it is done for me I use " long2ice/meilisync:sha-5f6c2e0"

On Mon, Mar 18, 2024 at 6:05 PM Iurie Bogdanovici @.***> wrote:

@TEAMMEDIO https://github.com/TEAMMEDIO I used only those parameters specified in my previous message, and then I configured meilisearch and meilisync using default settings from meilisearch website. When I run inside the db "SELECT * FROM pg_replication_slots" command I get in the column plugin, value "wal2json". So I suppose wal2json plugin is enabled by default. Regarding the error that you received on the mylisync side, try to use another meilisync docker image. Which image do you use? I also got some errors, but later I tried docker image "long2ice/meilisync:dev" and haven't got any errors since.

— Reply to this email directly, view it on GitHub https://github.com/long2ice/meilisync/issues/84#issuecomment-2004330966, or unsubscribe https://github.com/notifications/unsubscribe-auth/BBWLXJZB2PISANNSU5RPOWDYY4GE7AVCNFSM6AAAAABC6SPBDOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBUGMZTAOJWGY . You are receiving this because you were mentioned.Message ID: @.***>

iurie-bogdanovici commented 8 months ago

@TEAMMEDIO and you don't have issues with the RDS database replication slot growing continuousely?

TEAMMEDIO commented 8 months ago

@iurie-bogdanovici, sure, I got an error indicating that the slot has exceeded its maximum reserved size, rendering it unable to receive changes. While we're actively monitoring the slot's size for anomalies, we're considering implementing a temporary solution. This involves creating a cron job on the EC2 instance to shut down Meilisync, delete the replication slot from the database, and then restart Meilisync!!!

dzcpy commented 7 months ago

Anyone has got it fully working yet?

MattExact commented 6 months ago

I believe this is because meilisync is not confirming every processed message by calling send_feedback() method on the corresponding replication cursor. See the docs for consume_stream here where they warn:

Warning: When using replication with slots, failure to constantly consume and report success to the server appropriately can eventually lead to “disk full” condition on the server, because the server retains all the WAL segments that might be needed to stream the changes via all of the currently open replication slots.

iurie-bogdanovici commented 6 months ago

@MattExact Thanks. Now the issue is partially solved. Each time a message is processed the replication slot is consumed. The only issue is that if for example there is no activity for the RDS db (in our case is like this for dev env), the replication slot will grow. Is it possible to consume replication slot periodically (once an hour) if there is no activity? For example for another similar application called "pgsync" this is implemented as an environment variable REPLICATION_SLOT_CLEANUP_INTERVAL (controls the interval in which the replication slot is cleaned up. The default value is every hour).

MattExact commented 6 months ago

@MattExact Thanks. Now the issue is partially solved. Each time a message is processed the replication slot is consumed. The only issue is that if for example there is no activity for the RDS db (in our case is like this for dev env), the replication slot will grow.

@iurie-bogdanovici Ah that's interesting, from my understanding of the psycopg2 documentation, clearing at an interval is not necessary. But I wonder if the problem is the early return if there are no changes, meaning there could be many replication messages that are not being acknowledged on an idle database.

If we change https://github.com/long2ice/meilisync/blob/f837e60164db3ade3f7ff9e41db643ba47e5adda/meilisync/source/postgres.py#L91-L93

to something like the following (and fix the silly indentation error I made for msg.cursor.send_feedback) then I think it might fully solve the problem.

        changes = payload.get("change", [])
iurie-bogdanovici commented 6 months ago

@MattExact Big thanks to you. Now is working as expected, the replication slot on RDS is not increasing in size even if there is no activity on the db.