shayonj / pg_easy_replicate

Easily setup logical replication and switchover to new database with minimal downtime
MIT License
948 stars 18 forks source link

Potential bug? Replication between AWS and Azure, timeout on creating subscription #180

Closed joetynan closed 2 weeks ago

joetynan commented 2 weeks ago

OK, not sure how to resolve this, I think it may be due to the app timing out before there's enough time for the subscription to connect to the publication.

Source server in AWS, target server in Azure. Both servers have the IP of their opposite party whitelisted. Config_check comes back as green. Bootstrap appears to be successful, and I see that schema appears to be copying over OK.

I verified that the special user role being used for both servers is 'rds_superuser' - I created that role on the Azure server and made it a member of azure_pg_admin, which is the equivalent for rds_superuser in AWS. It has the ability to create/modify roles and tables, etc.

When I go to start sync, the publication appears to be created normally on the AWS side. However, when it completes that and goes to create the subscription on the Azure target, I get this error:

Subscription creation failed, please ensure both databases are in the same network region: PG::QueryCanceled: ERROR: canceling statement due to statement timeout

I tried using the env var PG_EASY_REPLICATE_STATEMENT_TIMEOUT="10s" in the CLI (I'm using the docker image), but that doesn't appear to resolve the timeout issue.

When I manually create the subscription and put in the connection string information via pgAdmin, it is able to see the publication, and after about 20seconds successfully creates the subscription and I can see that the LSN is changing when I refresh.

shayonj commented 2 weeks ago

Thats very interesting, does bumping PG_EASY_REPLICATE_STATEMENT_TIMEOUT to say 30s or above help ? I have seen some setups needing longer, I am not sure why.

Also, whats the user when you are creating the subscription from pgAdmin, is it the same user that is part of the connection string for pg_easy_replicate?

joetynan commented 2 weeks ago

Ah! Ok, slight update - when I remove quotes from the value that appears to populate, but the final step in the subscription creation still shows only the default statement_timeout of 5s. The user I'm using to create the subscription manually is the same one in the SOURCE_DB_URL that was used to create the bootstrap schema.

joetynan commented 2 weeks ago

{"name":"pg_easy_replicate","hostname":"8191145780bd","pid":1,"level":30,"time":"2024-11-12T23:19:30.421+00:00","v":0,"msg":"(0.066451s) ALTER PUBLICATION \"pger_publication_test\"\n ADD TABLE \"versions\"","version":"0.3.3"} {"name":"pg_easy_replicate","hostname":"8191145780bd","pid":1,"level":30,"time":"2024-11-12T23:19:30.489+00:00","v":0,"msg":"(0.068101s) COMMIT","version":"0.3.3"} {"name":"pg_easy_replicate","hostname":"8191145780bd","pid":1,"level":30,"time":"2024-11-12T23:19:30.489+00:00","v":0,"msg":"Setting up subscription","publication_name":"pger_publication_test","subscription_name":"pger_subscription_test","version":"0.3.3"} {"name":"pg_easy_replicate","hostname":"8191145780bd","pid":1,"level":30,"time":"2024-11-12T23:19:31.020+00:00","v":0,"msg":"(0.062950s) SET standard_conforming_strings = ON","version":"0.3.3"} {"name":"pg_easy_replicate","hostname":"8191145780bd","pid":1,"level":30,"time":"2024-11-12T23:19:31.083+00:00","v":0,"msg":"(0.062579s) SET client_min_messages = 'WARNING'","version":"0.3.3"} {"name":"pg_easy_replicate","hostname":"8191145780bd","pid":1,"level":30,"time":"2024-11-12T23:19:31.146+00:00","v":0,"msg":"(0.062613s) SET DateStyle = 'ISO'","version":"0.3.3"} {"name":"pg_easy_replicate","hostname":"8191145780bd","pid":1,"level":20,"time":"2024-11-12T23:19:31.146+00:00","v":0,"msg":"Connection established","version":"0.3.3"} {"name":"pg_easy_replicate","hostname":"8191145780bd","pid":1,"level":30,"time":"2024-11-12T23:19:31.209+00:00","v":0,"msg":"(0.062708s) SET statement_timeout to '5s'","version":"0.3.3"} {"name":"pg_easy_replicate","hostname":"8191145780bd","pid":1,"level":50,"time":"2024-11-12T23:19:36.272+00:00","v":0,"msg":"PG::QueryCanceled: ERROR: canceling statement due to statement timeout: CREATE SUBSCRIPTION \"pger_subscription_test\"\n CONNECTION 'postgres://user:password@server:5432/database'\n PUBLICATION \"pger_publication_test\"","version":"0.3.3"} Subscription creation failed, please ensure both databases are in the same network region: PG::QueryCanceled: ERROR: canceling statement due to statement timeout

shayonj commented 2 weeks ago

Thanks for the find! Looks like the create subscription was not using the config variable. I pushed a fix, curious if that helped?

https://github.com/shayonj/pg_easy_replicate/releases/tag/v0.3.4

joetynan commented 2 weeks ago

slight update - definitely now is updating that timeout value! Still timing out for me, that may be due to just not having extended out the timeout value enough. I'll be experimenting to see if I set it for some arbitrarily large time, if that allows it to connect. I did notice that even when manually creating the subscription, it can take some time for it to finally complete creating it in pgAdmin.

shayonj commented 2 weeks ago

Thats so odd, usually its pretty snappy. Did you try the string w/o quotes, did that help?

joetynan commented 2 weeks ago

yup, did it without quotes, that seems to successfully change the value, which is good. this could be due to it being cross-cloud, I'll try extending the timeout to something silly like 120s, see if it still fails.

joetynan commented 2 weeks ago

ah! OK, yeah - just needs a longer timeout value since it's cross-cloud connection. I was able to successfully get it to

shayonj commented 2 weeks ago

ah! perfect. Glad you got it sorted. Do you think, this is good to close then?

I am a bit unclear if the quote is something that is coming from this tool or azure/pgadmin

joetynan commented 2 weeks ago

Yup! Closing this out. pgAdmin is just a GUI postgres tool, for folks like me still getting familiar with the query commands of postgres, it's a bit faster sometimes to navigate the structure via that then through psql.