redis always shows pending

mingchungchan commented 1 year ago

PGSync version: 2.3.3

Postgres version: PostgreSQL 13.2

Elasticsearch version: 7.17.0

Redis version: 5.0.6

Python version: 3.7

Problem Description: I have two nodes (A and B) in my PGSync configuration. When I first started PGSync, the data synchronized correctly. Afterwards, when data changes are picked up, node A always stays in the Redis pending state and its data is never synchronized to ES. Node B does not have this problem. I then added the index name to the debug log output and found that, after node A was initialized and fetched 1,000 items, it stopped printing log lines, while node B kept printing them.

2022-11-03 02:39:17.205:DEBUG:pgsync.redisqueue: queue:A: bulk_pop size: 1000
............
2022-11-03 03:25:44.183:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:45.185:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:46.187:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:47.189:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:48.191:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:49.193:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
A: Sync postgres Xlog: [668] => Db: [1,155] => Redis: [total = 2,000 pending = 5,107] => Elastic: [226] ...
2022-11-03 03:25:50.196:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:51.198:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:52.200:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:53.203:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:54.205:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:55.207:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:56.209:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:57.212:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:58.214:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:25:59.216:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:00.218:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:01.221:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:02.223:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:03.225:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:04.228:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:04.228:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
B: Sync postgres Xlog: [987] => Db: [1,150] => Redis: [total = 1,150 pending = 0] => Elastic: [25] ...
2022-11-03 03:26:05.230:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:06.232:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:07.235:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:08.237:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:09.239:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:10.241:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:11.244:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:12.246:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:13.248:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:14.252:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:15.254:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:16.256:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:17.258:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:18.260:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
2022-11-03 03:26:19.262:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
A: Sync postgres Xlog: [668] => Db: [1,155] => Redis: [total = 2,000 pending = 5,107] => Elastic: [226] ...
2022-11-03 03:26:20.264:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0
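
To make a stall like this easier to spot in a long log, the output can be summarized per queue. The following is a minimal sketch, assuming the two line formats shown above (the `bulk_pop` debug line and the `Sync ... Redis: [total = ... pending = ...]` status line). The function name `summarize` and the regular expressions are illustrative and not part of PGSync.

```python
import re
from typing import Dict, Iterable

# Status line, e.g.:
#   A: Sync postgres Xlog: [668] => Db: [1,155] => Redis: [total = 2,000 pending = 5,107] => Elastic: [226] ...
STATUS_RE = re.compile(
    r"^(?P<name>\S+): Sync .*Redis: "
    r"\[total = (?P<total>[\d,]+) pending = (?P<pending>[\d,]+)\]"
)

# Debug line, e.g.:
#   2022-11-03 02:39:17.205:DEBUG:pgsync.redisqueue: queue:A: bulk_pop size: 1000
POP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+):DEBUG:pgsync\.redisqueue: "
    r"queue:(?P<name>\S+): bulk_pop size: (?P<size>\d+)"
)


def summarize(lines: Iterable[str]) -> Dict[str, dict]:
    """Return, per queue name, the last bulk_pop seen and the last reported counters."""
    summary: Dict[str, dict] = {}
    for raw in lines:
        line = raw.strip()
        pop = POP_RE.match(line)
        if pop:
            entry = summary.setdefault(pop["name"], {})
            entry["last_pop_ts"] = pop["ts"]
            entry["last_pop_size"] = int(pop["size"])
            continue
        status = STATUS_RE.match(line)
        if status:
            entry = summary.setdefault(status["name"], {})
            # Counters are printed with thousands separators, e.g. "5,107".
            entry["total"] = int(status["total"].replace(",", ""))
            entry["pending"] = int(status["pending"].replace(",", ""))
    return summary


sample = [
    "2022-11-03 02:39:17.205:DEBUG:pgsync.redisqueue: queue:A: bulk_pop size: 1000",
    "2022-11-03 03:25:44.183:DEBUG:pgsync.redisqueue: queue:B: bulk_pop size: 0",
    "A: Sync postgres Xlog: [668] => Db: [1,155] => Redis: [total = 2,000 pending = 5,107] => Elastic: [226] ...",
    "B: Sync postgres Xlog: [987] => Db: [1,150] => Redis: [total = 1,150 pending = 0] => Elastic: [25] ...",
]

for name, info in sorted(summarize(sample).items()):
    print(name, info)
```

A queue whose `pending` stays above zero while its `last_pop_ts` stops advancing, as with A here, is the one that has stopped consuming.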

In addition, I suspected that querying the data from Postgres was too slow, so I ran the analyze command (pgsync -c schema.json -a) to check the indexes and then added all of them. This still did not help.

Thanks.

toluaina commented 1 year ago

First, it is more efficient to run a separate pgsync process per node, so maybe separate each node out into its own schema.
Does this still happen when you separate the nodes out into their own schemas?
It might also make it easier to find the root issue causing the pending items.
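
One way to do the split is sketched below. It assumes that schema.json is a JSON array with one entry per index and that each entry may carry an "index" key. The helper name `split_schema` and the output file names are hypothetical.

```python
import json
from pathlib import Path
from typing import List


def split_schema(schema_path: str, out_dir: str) -> List[Path]:
    """Write each top-level entry of a schema file to its own single-entry file."""
    entries = json.loads(Path(schema_path).read_text())
    if not isinstance(entries, list):
        raise ValueError("expected the schema file to contain a JSON array")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written: List[Path] = []
    for position, entry in enumerate(entries):
        # Assumption: entries are named by their "index" key; fall back to the position.
        name = entry.get("index", f"entry{position}")
        path = out / f"schema_{position}_{name}.json"
        # Each output file is still a JSON array, now with exactly one entry.
        path.write_text(json.dumps([entry], indent=2))
        written.append(path)
    return written
```

Each resulting file can then be passed to its own process with `pgsync -c <file>`, so that a stall in one node no longer shares a process with the other.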

mingchungchan commented 1 year ago

I separated the two nodes, and the delay is better than before, but the problem still happens occasionally. I don't know whether it matters that I join too many tables in one node.

toluaina commented 1 year ago

It is very hard to tell without more details. Can you try with USE_ASYNC=True and see if that makes any difference?
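
For reference, a minimal sketch of launching pgsync with that setting from Python, assuming USE_ASYNC is read from the process environment. The helper names are hypothetical, and only the `-c` flag used earlier in this thread is passed; any other flags your setup needs are left out.

```python
import os
import subprocess
from typing import Dict, List, Optional


def build_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the given environment (default: the current one) and set USE_ASYNC."""
    env = dict(os.environ if base is None else base)
    env["USE_ASYNC"] = "True"
    return env


def build_command(schema_path: str) -> List[str]:
    """Command line for one pgsync process reading one schema file."""
    return ["pgsync", "-c", schema_path]


def run_with_async(schema_path: str) -> subprocess.CompletedProcess:
    """Run pgsync with USE_ASYNC=True; requires pgsync to be on PATH."""
    return subprocess.run(build_command(schema_path), env=build_env(), check=False)
```

Setting the variable in the shell or in a .env file before starting pgsync would have the same effect; the wrapper only keeps the setting next to the command.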

toluaina / pgsync

redis always shows pending #376