Open digoal opened 4 years ago
[root@iZbp135pwcjjoxqgfpw9k1Z ~]# pstack 41065
#0 0x00007f5441b9ae43 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x000000000075888e in WaitEventSetWait ()
#2 0x0000000000758ce9 in WaitLatchOrSocket ()
#3 0x00000000007701df in ProcSleep ()
#4 0x0000000000764cef in WaitOnLock ()
#5 0x0000000000766076 in LockAcquireExtended ()
#6 0x0000000000763f46 in LockPage ()
#7 0x000000000052d0e3 in zsundo_trim ()
#8 0x000000000052e32e in zsundo_get_oldest_undo_ptr ()
#9 0x0000000000520b95 in zsbt_tid_begin_scan ()
#10 0x000000000052a54f in zedstoream_fetch_row ()
#11 0x000000000052a9c1 in zedstoream_index_fetch_tuple ()
#12 0x00000000004d29b2 in index_fetch_heap ()
#13 0x00000000004d2a1b in index_getnext_slot ()
#14 0x000000000062cd2b in check_exclusion_or_unique_constraint ()
#15 0x000000000062d73e in ExecCheckIndexConstraints ()
#16 0x0000000000653de6 in ExecInsert ()
#17 0x0000000000655219 in ExecModifyTable ()
#18 0x000000000062e092 in standard_ExecutorRun ()
#19 0x000000000077c02a in ProcessQuery ()
#20 0x000000000077c258 in PortalRunMulti ()
#21 0x000000000077cc6d in PortalRun ()
#22 0x000000000077a902 in PostgresMain ()
#23 0x0000000000482278 in ServerLoop ()
#24 0x0000000000709d63 in PostmasterMain ()
#25 0x0000000000482ebe in main ()
[root@iZbp135pwcjjoxqgfpw9k1Z ~]# pstack 41063
#0 0x00007f544279aafb in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1 0x00007f544279ab8f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x00007f544279ac2b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3 0x00000000006f7f52 in PGSemaphoreLock ()
#4 0x0000000000769a5c in LWLockAcquire ()
#5 0x000000000051c8c3 in zsbt_descend ()
#6 0x0000000000521143 in zsbt_tid_multi_insert ()
#7 0x0000000000528010 in zedstoream_insert_speculative ()
#8 0x000000000065417f in ExecInsert ()
#9 0x0000000000655219 in ExecModifyTable ()
#10 0x000000000062e092 in standard_ExecutorRun ()
#11 0x000000000077c02a in ProcessQuery ()
#12 0x000000000077c258 in PortalRunMulti ()
#13 0x000000000077cc6d in PortalRun ()
#14 0x000000000077a902 in PostgresMain ()
#15 0x0000000000482278 in ServerLoop ()
#16 0x0000000000709d63 in PostmasterMain ()
#17 0x0000000000482ebe in main ()
Thank you for the report and for trying out Zedstore! I haven't had the chance to try out your reproduction yet. Do you see process 41065 or 41063 hanging indefinitely?
Also, what commit of Zedstore did you try this on?
From the stack traces, it seems that both backends are waiting on the meta page. The second stack trace (41063) seems to be missing a frame, perhaps due to compiler optimizations: there is no direct call to LWLockAcquire() inside zsbt_descend(). I suspect the missing frame is zsmeta_get_root_for_attribute(), which attempts to lock the meta page that the other process (41065) is also trying to lock.
We have previously hit code paths that try to acquire a lock the backend already holds, so it is quite possible that one of the two backends is trying to acquire a lock on the meta page that it already has. A LWLockHeldByMe() check can help confirm that.
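To compare the two traces side by side, a small script can pull out the innermost lock call and its immediate caller from each backtrace. This is only a sketch: the helper names (`parse_frames`, `lock_wait_site`) and the `LOCK_FUNCS` set are made up for illustration, and the sample input is a shortened copy of the frames posted above.

```python
import re

# Frame lines look like: "#4 0x0000000000769a5c in LWLockAcquire ()"
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+\s+in\s+(\S+)\s+\(")

# Lock entry points that appear in the two traces above (assumed list,
# extend it if other lock functions show up in your own dumps).
LOCK_FUNCS = {"LWLockAcquire", "LockPage"}


def parse_frames(pstack_text):
    """Return the function names of a pstack dump, innermost frame first."""
    frames = []
    for line in pstack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(2))
    return frames


def lock_wait_site(frames):
    """Return (lock_function, caller) for the innermost lock call, or None."""
    for i, name in enumerate(frames):
        if name in LOCK_FUNCS and i + 1 < len(frames):
            return name, frames[i + 1]
    return None


# Shortened copies of the frames from the report.
trace_41065 = """\
#4 0x0000000000764cef in WaitOnLock ()
#5 0x0000000000766076 in LockAcquireExtended ()
#6 0x0000000000763f46 in LockPage ()
#7 0x000000000052d0e3 in zsundo_trim ()
#8 0x000000000052e32e in zsundo_get_oldest_undo_ptr ()
"""

trace_41063 = """\
#3 0x00000000006f7f52 in PGSemaphoreLock ()
#4 0x0000000000769a5c in LWLockAcquire ()
#5 0x000000000051c8c3 in zsbt_descend ()
#6 0x0000000000521143 in zsbt_tid_multi_insert ()
"""

for pid, trace in (("41065", trace_41065), ("41063", trace_41063)):
    print(pid, lock_wait_site(parse_frames(trace)))
```

On the posted traces this reports LockPage() called from zsundo_trim() for 41065 and LWLockAcquire() called from zsbt_descend() for 41063. It only reads the symbol names pstack printed, so a frame that was inlined or optimized away will not show up here either.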
Thank you. This is the commit I used: https://github.com/greenplum-db/postgres/commit/7b24beac75de36c0968459764fc0558e60304811. The environment is an Aliyun ECS instance running CentOS 7.7 x64.
Hi, I have been testing Zedstore. When I run a concurrent INSERT ... ON CONFLICT test, the backends end up waiting on buffer_content locks and QPS drops to 0.
br, digoal