toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.11k stars, 174 forks

redis always shows pending #376

Open mingchungchan opened 1 year ago

mingchungchan commented 1 year ago

PGSync version: 2.3.3

Postgres version: PostgreSQL 13.2

Elasticsearch version: 7.17.0

Redis version: 5.0.6

Python version: 3.7

Problem Description: I have two nodes (A and B) in my PGSync configuration. When I first started PGSync, the data synchronized correctly. During subsequent monitoring, when a data change is detected, the first node (A) always stays stuck with items pending in Redis, and its data is not synchronized to ES. The second node (B) does not have this problem. I then added the index name to the debug log output and found that after node A was initialized and fetched 1,000 records, it stopped logging entirely, while node B kept logging.

2022-11-03 02:39:17.205:DEBUG:pgsync.redisqueue: queue:A: bulk_pop size: 1000
............
2022-11-03 03:25:44.183:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:45.185:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:46.187:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:47.189:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:48.191:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:49.193:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
A: Sync postgres Xlog: [668] => Db: [1,155] => Redis: [total = 2,000 pending = 5,107] => Elastic: [226] ...
2022-11-03 03:25:50.196:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:51.198:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:52.200:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:53.203:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:54.205:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:55.207:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:56.209:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:57.212:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:58.214:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:59.216:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:00.218:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:01.221:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:02.223:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:03.225:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:04.228:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:04.228:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
B: Sync postgres Xlog: [987] => Db: [1,150] => Redis: [total = 1,150 pending = 0] => Elastic: [25] ...
2022-11-03 03:26:05.230:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:06.232:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:07.235:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:08.237:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:09.239:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:10.241:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:11.244:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:12.246:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:13.248:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:14.252:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:15.254:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:16.256:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:17.258:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:18.260:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:19.262:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
A: Sync postgres Xlog: [668] => Db: [1,155] => Redis: [total = 2,000 pending = 5,107] => Elastic: [226] ...
2022-11-03 03:26:20.264:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
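
The pattern in the log above (queue A goes silent after one bulk_pop while its pending count stays above zero) can be checked mechanically. The following is a minimal sketch, not part of PGSync: the regular expressions are inferred only from the log lines shown in this issue, and the function names and the 60-second silence threshold are hypothetical choices for illustration.

```python
import re
from datetime import datetime

# Assumed format, taken from the debug lines in this issue, e.g.:
# 2022-11-03 03:25:44.183:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
POP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):DEBUG:"
    r"pgsync\.redisqueue: queue:(?P<queue>[^:]+): bulk_pop size: (?P<size>\d+)$"
)
# Assumed format of the status lines in this issue, e.g.:
# A: Sync postgres Xlog: [668] => ... Redis: [total = 2,000 pending = 5,107] => ...
STATUS_RE = re.compile(
    r"^(?P<name>[^:]+): Sync .*Redis: "
    r"\[total = (?P<total>[\d,]+) pending = (?P<pending>[\d,]+)\]"
)


def summarize(lines):
    """Return (last bulk_pop time per queue, latest pending count per node)."""
    last_pop = {}
    pending = {}
    for raw in lines:
        line = raw.strip()
        match = POP_RE.match(line)
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
            last_pop[match["queue"]] = ts
            continue
        match = STATUS_RE.match(line)
        if match:
            pending[match["name"]] = int(match["pending"].replace(",", ""))
    return last_pop, pending


def stalled_queues(lines, max_silence_seconds=60):
    """Queues that stopped logging bulk_pop but still report pending items."""
    last_pop, pending = summarize(lines)
    if not last_pop:
        return []
    newest = max(last_pop.values())  # most recent bulk_pop seen in the log
    return sorted(
        name
        for name, ts in last_pop.items()
        if (newest - ts).total_seconds() > max_silence_seconds
        and pending.get(name, 0) > 0
    )


if __name__ == "__main__":
    sample = [
        "2022-11-03 02:39:17.205:DEBUG:pgsync.redisqueue: queue:A: bulk_pop size: 1000",
        "2022-11-03 03:25:49.193:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0",
        "A: Sync postgres Xlog: [668] => Db: [1,155] => Redis: [total = 2,000 pending = 5,107] => Elastic: [226] ...",
        "B: Sync postgres Xlog: [987] => Db: [1,150] => Redis: [total = 1,150 pending = 0] => Elastic: [25] ...",
    ]
    print(stalled_queues(sample))  # prints ['A']
```

Run against the excerpt above, this flags only queue A, which matches what the log shows by eye: B keeps polling with nothing pending, while A has not popped anything since 02:39:17.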

In addition, I suspected that querying the data from Postgres was too slow, so I ran pgsync -c schema.json -a to analyze the indexes and then added all of the indexes it reported. This still did not help.

Thanks.

toluaina commented 1 year ago
mingchungchan commented 1 year ago

I split it into two nodes, and the delay is better than before, but the problem still happens occasionally. I don't know whether joining too many tables in one node matters.

toluaina commented 1 year ago

Very hard to tell without more details. Can you try with USE_ASYNC=True and see if that makes any difference?