Open praveen-kd-23 opened 2 years ago
If there's a deadlock, some other thread holds the lock... what's the backtrace for that?
Thread 48 (Thread 0x7fe004c22700 (LWP 23820)):
#0 0x00007fe0a256429c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007fe0a255d714 in __GI___pthread_mutex_lock (mutex=0x7fe0636347c8) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fe06375e251 in lws_mutex_refcount_lock () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#3 0x00007fe063796542 in lws_adopt_descriptor_vhost_via_info () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#4 0x00007fe06379650a in lws_adopt_descriptor_vhost () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#5 0x00007fe0637c34aa in rops_handle_POLLIN_listen () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#6 0x00007fe063790757 in lws_service_fd_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#7 0x00007fe06374ca85 in _lws_plat_service_forced_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#8 0x00007fe06374cedf in _lws_plat_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#9 0x00007fe063790a17 in lws_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#10 0x00007fe08c03758f in lws_thread_service (id=<optimized out>) at mod_lws.c:333
#11 0x00007fe0a255afa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#12 0x00007fe0a248beff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 47 (Thread 0x7fe005423700 (LWP 23819)):
#0 0x00007fe0a256429c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007fe0a255d714 in __GI___pthread_mutex_lock (mutex=0x7fe0636347c8) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fe06375e251 in lws_mutex_refcount_lock () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#3 0x00007fe0637880d4 in lws_close_free_wsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#4 0x00007fe0637907e2 in lws_service_fd_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#5 0x00007fe06374ca85 in _lws_plat_service_forced_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#6 0x00007fe06374cedf in _lws_plat_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#7 0x00007fe063790a17 in lws_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#8 0x00007fe08c03758f in lws_thread_service (id=<optimized out>) at mod_lws.c:333
#9 0x00007fe0a255afa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007fe0a248beff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 46 (Thread 0x7fe00741f700 (LWP 23818)):
#0 0x00007fe0a256429c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007fe0a255d714 in __GI___pthread_mutex_lock (mutex=0x7fe0636347c8) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fe06375e251 in lws_mutex_refcount_lock () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#3 0x00007fe063796542 in lws_adopt_descriptor_vhost_via_info () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#4 0x00007fe06379650a in lws_adopt_descriptor_vhost () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#5 0x00007fe0637c34aa in rops_handle_POLLIN_listen () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#6 0x00007fe063790757 in lws_service_fd_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#7 0x00007fe06374ca85 in _lws_plat_service_forced_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#8 0x00007fe06374cedf in _lws_plat_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#9 0x00007fe063790a17 in lws_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#10 0x00007fe08c03758f in lws_thread_service (id=<optimized out>) at mod_lws.c:333
#11 0x00007fe0a255afa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#12 0x00007fe0a248beff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 45 (Thread 0x7fe007c20700 (LWP 23817)):
#0 0x00007fe0a256429c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007fe0a255d714 in __GI___pthread_mutex_lock (mutex=0x7fe0636347c8) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fe06375e251 in lws_mutex_refcount_lock () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#3 0x00007fe0637880d4 in lws_close_free_wsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#4 0x00007fe0637907e2 in lws_service_fd_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#5 0x00007fe06374ca85 in _lws_plat_service_forced_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#6 0x00007fe06374cedf in _lws_plat_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#7 0x00007fe063790a17 in lws_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#8 0x00007fe08c03758f in lws_thread_service (id=<optimized out>) at mod_lws.c:333
#9 0x00007fe0a255afa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007fe0a248beff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 44 (Thread 0x7fe008421700 (LWP 23816)):
#0 0x00007fe0a256429c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007fe0a255d714 in __GI___pthread_mutex_lock (mutex=0x7fe063630370) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fe06375e251 in lws_mutex_refcount_lock () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#3 0x00007fe06378ebe6 in lws_change_pollfd () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#4 0x00007fe063754460 in lws_tls_server_accept () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#5 0x00007fe063753079 in lws_server_socket_service_ssl () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#6 0x00007fe063796317 in lws_adopt_descriptor_vhost2 () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#7 0x00007fe063796616 in lws_adopt_descriptor_vhost_via_info () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#8 0x00007fe06379650a in lws_adopt_descriptor_vhost () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#9 0x00007fe0637c34aa in rops_handle_POLLIN_listen () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#10 0x00007fe063790757 in lws_service_fd_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#11 0x00007fe06374ca85 in _lws_plat_service_forced_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#12 0x00007fe06374cedf in _lws_plat_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#13 0x00007fe063790a17 in lws_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#14 0x00007fe08c03758f in lws_thread_service (id=<optimized out>) at mod_lws.c:333
#15 0x00007fe0a255afa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#16 0x00007fe0a248beff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 43 (Thread 0x7fe008c22700 (LWP 23815)):
#0 0x00007fe0a256429c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007fe0a255d714 in __GI___pthread_mutex_lock (mutex=0x7fe0636347c8) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fe06375e251 in lws_mutex_refcount_lock () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#3 0x00007fe063794d0f in lws_set_timeout () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#4 0x00007fe0637531d0 in lws_server_socket_service_ssl () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#5 0x00007fe0637ae89d in rops_handle_POLLIN_h1 () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#6 0x00007fe063790757 in lws_service_fd_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#7 0x00007fe06374ca85 in _lws_plat_service_forced_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#8 0x00007fe06374cedf in _lws_plat_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#9 0x00007fe063790a17 in lws_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#10 0x00007fe08c03758f in lws_thread_service (id=<optimized out>) at mod_lws.c:333
#11 0x00007fe0a255afa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#12 0x00007fe0a248beff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 42 (Thread 0x7fe009423700 (LWP 23814)):
#0 0x00007fe0a256429c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007fe0a255d714 in __GI___pthread_mutex_lock (mutex=0x7fe0636347c8) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fe06375e251 in lws_mutex_refcount_lock () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#3 0x00007fe063794d0f in lws_set_timeout () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#4 0x00007fe0637531d0 in lws_server_socket_service_ssl () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#5 0x00007fe0637ae89d in rops_handle_POLLIN_h1 () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#6 0x00007fe063790757 in lws_service_fd_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#7 0x00007fe06374ca85 in _lws_plat_service_forced_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#8 0x00007fe06374cedf in _lws_plat_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#9 0x00007fe063790a17 in lws_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#10 0x00007fe08c03758f in lws_thread_service (id=<optimized out>) at mod_lws.c:333
#11 0x00007fe0a255afa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#12 0x00007fe0a248beff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 41 (Thread 0x7fe009c24700 (LWP 23813)):
#0 0x00007fe0a256429c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007fe0a255d714 in __GI___pthread_mutex_lock (mutex=0x7fe0636347c8) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fe06375e251 in lws_mutex_refcount_lock () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#3 0x00007fe06374bf1e in lws_sul_plat_unix () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#4 0x00007fe063790d8f in __lws_sul_service_ripe () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#5 0x00007fe06374cc93 in _lws_plat_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#6 0x00007fe063790a17 in lws_service_tsi () at /usr/local/websocket/libwebsockets/lib/libwebsockets.so.19
#7 0x00007fe08c03758f in lws_thread_service (id=<optimized out>) at mod_lws.c:333
#8 0x00007fe0a255afa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9 0x00007fe0a248beff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
I run 8 threads; the dump of all 8 threads is above.
Hi, any update on this issue?
Here is what I managed to get on the main branch (e8eb7d6bd66dde2156dbcd37b0e9048ed4006320; line numbers might differ slightly because I was adding some logs first):
// pid=2038
#0 0x00007f05e6e7a1fd in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f05e6e73025 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f05cc54e4a2 in lws_mutex_refcount_lock (mr=0x7f05e9fdc370, reason=0x7f05cc5d9e40 <__func__.35035> "__lws_adopt_descriptor_vhost1") at /var/tmp/build/libwebsockets/lib/core/libwebsockets.c:1471
#3 0x00007f05cc588245 in __lws_adopt_descriptor_vhost1 (vh=0x55baa9ef21c0, type=7, vh_prot_name=0x0, parent=0x0, opaque=0x0, fi_wsi_name=0x0) at /var/tmp/build/libwebsockets/lib/core-net/adopt.c:161
-> lws_pt_lock(pt, __func__);
// pt = 0x7f05e9fdc348
// $9 = {lock = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 2025, Robust = No, Shared = No, Protocol = None}, lock_owner = 139645258422016, last_lock_reason = 0x7f05cc5cf610 <__func__.35974> "_lws_plat_service_tsi", lock_depth = 1 '\001', metadata = 0 '\000'}
#4 0x00007f05cc588d75 in lws_adopt_descriptor_vhost_via_info (info=0x7f01ab84fa70) at /var/tmp/build/libwebsockets/lib/core-net/adopt.c:525
-> lws_context_lock(info->vh->context, __func__);
// $10 = {lock = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 2038, Robust = No, Shared = No, Protocol = None}, lock_owner = 139645149316864, last_lock_reason = 0x7f05cc5d9ea0 <__func__.35069> "lws_adopt_descriptor_vhost_via_info", lock_depth = 1 '\001', metadata = 0 '\000'}
#5 0x00007f05cc588cde in lws_adopt_descriptor_vhost (vh=0x55baa9ef21c0, type=7, fd=..., vh_prot_name=0x0, parent=0x0) at /var/tmp/build/libwebsockets/lib/core-net/adopt.c:493
#6 0x00007f05cc5bd02c in rops_handle_POLLIN_listen (pt=0x7f05e9fddee8, wsi=0x55baa9efa4e0, pollfd=0x7f03a6d513a0) at /var/tmp/build/libwebsockets/lib/roles/listen/ops-listen.c:147
#7 0x00007f05cc582f21 in lws_service_fd_tsi (context=0x7f05e9fdc010, pollfd=0x7f03a6d513a0, tsi=13) at /var/tmp/build/libwebsockets/lib/core-net/service.c:771
#8 0x00007f05cc53bfdc in _lws_plat_service_forced_tsi (context=0x7f05e9fdc010, tsi=13) at /var/tmp/build/libwebsockets/lib/plat/unix/unix-service.c:51
#9 0x00007f05cc53c44d in _lws_plat_service_tsi (context=0x7f05e9fdc010, timeout_ms=2000000000, tsi=13) at /var/tmp/build/libwebsockets/lib/plat/unix/unix-service.c:216
#10 0x00007f05cc5831e1 in lws_service_tsi (context=0x7f05e9fdc010, timeout_ms=50, tsi=13) at /var/tmp/build/libwebsockets/lib/core-net/service.c:875
#11 <...>
#12 0x00007f05e91d942d in g_thread_proxy (data=0x55baa9f9f240) at ../glib/gthread.c:827
#13 0x00007f05e6e706db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00007f05e6b9961f in clone () from /lib/x86_64-linux-gnu/libc.so.6
// pid=2025
#0 0x00007f05e6e7a1fd in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f05e6e73025 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f05cc54e4a2 in lws_mutex_refcount_lock (mr=0x7f05ea0612c8, reason=0x7f05cc5cf3ff "periodic checks") at /var/tmp/build/libwebsockets/lib/core/libwebsockets.c:1471
#3 0x00007f05cc53b424 in lws_sul_plat_unix (sul=0x7f05e9fdc490) at /var/tmp/build/libwebsockets/lib/plat/unix/unix-init.c:64
-> lws_context_lock(context, "periodic checks");
// $11 = {lock = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 2038, Robust = No, Shared = No, Protocol = None}, lock_owner = 139645149316864, last_lock_reason = 0x7f05cc5d9ea0 <__func__.35069> "lws_adopt_descriptor_vhost_via_info", lock_depth = 1 '\001', metadata = 0 '\000'}
#4 0x00007f05cc583559 in __lws_sul_service_ripe (own=0x7f05e9fdc400, own_len=2, usnow=13330620154025) at /var/tmp/build/libwebsockets/lib/core-net/sorted-usec-list.c:161
#5 0x00007f05cc53c1ea in _lws_plat_service_tsi (context=0x7f05e9fdc010, timeout_ms=2000000000, tsi=0) at /var/tmp/build/libwebsockets/lib/plat/unix/unix-service.c:125
-> lws_pt_lock(pt, __func__);
// pt = 0x7f05e9fdc348
// $12 = {lock = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 2025, Robust = No, Shared = No, Protocol = None}, lock_owner = 139645258422016, last_lock_reason = 0x7f05cc5cf610 <__func__.35974> "_lws_plat_service_tsi", lock_depth = 1 '\001', metadata = 0 '\000'}
#6 0x00007f05cc5831e1 in lws_service_tsi (context=0x7f05e9fdc010, timeout_ms=50, tsi=0) at /var/tmp/build/libwebsockets/lib/core-net/service.c:875
#7 <...>
#8 0x00007f05e91d942d in g_thread_proxy (data=0x55baa9f9cd20) at ../glib/gthread.c:827
#9 0x00007f05e6e706db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f05e6b9961f in clone () from /lib/x86_64-linux-gnu/libc.so.6
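Read together, the two backtraces show an inverted lock order: thread 2038 owns the context lock (taken in lws_adopt_descriptor_vhost_via_info) and waits for the pt lock at 0x7f05e9fdc348, while thread 2025 owns that pt lock (taken in _lws_plat_service_tsi) and waits for the context lock in lws_sul_plat_unix. The sketch below reproduces only that pattern with two plain pthread mutexes. It is not libwebsockets code: context_lock, pt_lock, run_path and demo_abba are hypothetical names, and timed locks plus barriers are used so the demonstration finishes instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the two locks seen in the backtraces. */
static pthread_mutex_t context_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pt_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first, both_tried_second;

struct path {
    pthread_mutex_t *first;   /* lock taken first on this code path */
    pthread_mutex_t *second;  /* lock taken while still holding the first */
    int result;               /* return value of the timed second lock */
};

static void *run_path(void *arg)
{
    struct path *p = arg;
    struct timespec deadline;

    pthread_mutex_lock(p->first);
    /* Wait until the other thread also holds its first lock. */
    pthread_barrier_wait(&both_hold_first);

    /* Give up after 100 ms rather than blocking forever. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 100L * 1000L * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    p->result = pthread_mutex_timedlock(p->second, &deadline);

    /* Release nothing until both threads have finished their attempt. */
    pthread_barrier_wait(&both_tried_second);
    if (p->result == 0)
        pthread_mutex_unlock(p->second);
    pthread_mutex_unlock(p->first);
    return NULL;
}

/* Returns how many of the two paths timed out on their second lock. */
int demo_abba(void)
{
    struct path adopt = { &context_lock, &pt_lock, 0 };    /* like pid 2038 */
    struct path service = { &pt_lock, &context_lock, 0 };  /* like pid 2025 */
    pthread_t a, b;
    int rc;

    pthread_barrier_init(&both_hold_first, NULL, 2);
    pthread_barrier_init(&both_tried_second, NULL, 2);
    rc = pthread_create(&a, NULL, run_path, &adopt);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, run_path, &service);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_barrier_destroy(&both_hold_first);
    pthread_barrier_destroy(&both_tried_second);

    return (adopt.result == ETIMEDOUT) + (service.result == ETIMEDOUT);
}
```

Calling demo_abba() from a small main (built with -pthread) shows both paths timing out. With untimed pthread_mutex_lock calls, as in the traces, both threads would block forever.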
I also removed the assertion assert(lws_socket_is_valid(pollfd->fd)); from service.c and moved the fd check to wsi_from_fd. Otherwise the assertion fired: it looks like there is a race between lws_plat_insert_socket_into_fds + lws_plat_delete_socket_from_fds and lws_service_fd_tsi - the fds are modified while holding pt's lock but accessed without any locking (I'm not sure whether that is intentional).
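The idea of checking the fd at lookup time instead of asserting in the service loop can be sketched roughly as follows. This is an illustration only: struct conn, fd_table, MAX_FDS and conn_from_fd are hypothetical names, not the real wsi_from_fd or the library's fd tables, and the sketch only tolerates a stale or invalid fd. It does not remove the underlying race.

```c
#include <stddef.h>

/* Hypothetical connection object and fd-indexed lookup table. */
struct conn {
    int fd;
};

#define MAX_FDS 16
static struct conn *fd_table[MAX_FDS];

/* Range check standing in for a "is this socket valid" test. */
static int fd_is_valid(int fd)
{
    return fd >= 0 && fd < MAX_FDS;
}

/*
 * Look up the connection for an fd. An invalid fd, or a slot that was
 * already cleared by a concurrent delete, yields NULL so the caller can
 * skip the event instead of tripping an assertion.
 */
struct conn *conn_from_fd(int fd)
{
    if (!fd_is_valid(fd))
        return NULL;
    return fd_table[fd];
}
```

A caller in the service path would then treat a NULL result as "nothing to service" for that pollfd.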
I have the same issue. Any update on this issue?
@lws-team any update on this deadlock? It would be much appreciated.
Well, it looks to me - 2y later - like it isn't going to get solved with the information provided. If there's a modified minimal app that can demonstrate it, that'd help. Deadlocks have usually been caused by ordering problems: a lock that is usually taken before another is, in one case, taken after it.
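One generic way to catch that kind of ordering problem early is to give every lock a rank and refuse (or log) any acquisition that goes against the ranking. The following is a minimal sketch of that technique, not something libwebsockets provides: struct ranked_lock, ranked_lock_acquire and ranked_lock_release are hypothetical names, the ranks are arbitrary, and unlike lws_mutex_refcount it does not allow the owner to re-take a lock it already holds.

```c
#include <pthread.h>
#include <stdbool.h>

/* A mutex tagged with its position in the global lock order. */
struct ranked_lock {
    pthread_mutex_t mutex;
    int rank; /* lower rank must be taken before higher rank */
};

#define MAX_HELD 8
/* Per-thread stack of the ranks currently held by this thread. */
static _Thread_local int held_ranks[MAX_HELD];
static _Thread_local int held_count;

/*
 * Take the lock only if doing so respects the order. Returns false,
 * without locking, when this thread already holds a lock of equal or
 * higher rank (an inversion) or the tracking stack is full.
 */
bool ranked_lock_acquire(struct ranked_lock *l)
{
    if (held_count == MAX_HELD)
        return false;
    if (held_count > 0 && held_ranks[held_count - 1] >= l->rank)
        return false;
    pthread_mutex_lock(&l->mutex);
    held_ranks[held_count++] = l->rank;
    return true;
}

/* Sketch assumes locks are released in reverse order of acquisition. */
void ranked_lock_release(struct ranked_lock *l)
{
    if (held_count > 0)
        held_count--;
    pthread_mutex_unlock(&l->mutex);
}
```

With a context-level lock at rank 1 and a per-thread lock at rank 2, taking context then per-thread succeeds, while taking per-thread then context is rejected at the second acquire, which is where a debug build could log the call site.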
https://github.com/warmcat/libwebsockets/issues/2261#issue-849757598
I triggered 100 parallel connections and reconnected all 100 connections every 5 seconds.
The same deadlock occurs again. I tried both the main branch and v4.3.2; the issue occurs in both builds.
Any help would be appreciated, as libwebsockets was working extremely well until this one.