yetyongjin closed this 5 months ago.
What I don't understand is that when lws_service_tsi was originally called, the tsi parameter passed in was 3. Why did it later become 0? Isn't tsi a thread index?
SMP mode has only been tested with the server side... It also requires internal locking to be added inside lws at every point where the service threads might contend, such as when managing lists.
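The kind of internal locking meant here can be sketched with the convention lws appears to follow, where a `__`-prefixed function expects the caller to already hold the relevant lock and the unprefixed wrapper takes it. This is a minimal sketch, assuming a single mutex guards a list shared by all service threads; `struct shared_list`, `shared_list_add()`, `shared_list_remove()` and the `_nolock` helpers are hypothetical names, not lws code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list shared by all service threads; not lws code. */
struct node {
	struct node *next;
};

struct shared_list {
	pthread_mutex_t lock; /* guards head and count */
	struct node *head;
	int count;
};

void shared_list_init(struct shared_list *l)
{
	pthread_mutex_init(&l->lock, NULL);
	l->head = NULL;
	l->count = 0;
}

/* Caller MUST already hold l->lock (lws marks such variants with "__"). */
static void shared_list_add_nolock(struct shared_list *l, struct node *n)
{
	n->next = l->head;
	l->head = n;
	l->count++;
}

/* Caller MUST already hold l->lock. Does nothing if n is not listed. */
static void shared_list_remove_nolock(struct shared_list *l, struct node *n)
{
	struct node **pp = &l->head;

	while (*pp && *pp != n)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = n->next;
		l->count--;
	}
}

/* Locking wrappers: safe to call from any service thread. */
void shared_list_add(struct shared_list *l, struct node *n)
{
	pthread_mutex_lock(&l->lock);
	shared_list_add_nolock(l, n);
	pthread_mutex_unlock(&l->lock);
}

void shared_list_remove(struct shared_list *l, struct node *n)
{
	pthread_mutex_lock(&l->lock);
	shared_list_remove_nolock(l, n);
	pthread_mutex_unlock(&l->lock);
}
```

The logs.c hunks in the patch posted later in this thread follow the same shape, taking the context lock around the `lws_dll2` list operation, and the pollfd.c hunk swaps `__lws_change_pollfd()` for what appears to be its locking wrapper `lws_change_pollfd()`.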
It doesn't suddenly make lws threadsafe; it still uses single-threaded event loops, just n of them. APIs can only be called from the thread that created the wsi. For a server, lws can manage this internally, since it controls which service thread an accepted wsi is created on, but for a client the user code needs to manage it.
I don't think it will work well for client, sorry.
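On the client side, "the user code needs to manage it" usually means never calling lws APIs on a wsi from a foreign thread, and handing the work to the service thread that owns it instead. A minimal sketch of that hand-off, assuming a hypothetical per-service-thread mailbox (`struct mailbox`, `mailbox_post()`, `mailbox_drain()` and `count_call()` are illustrative, not lws API): any thread may post, only the owning thread drains. With real lws the owner would typically be woken with `lws_cancel_service_pt()` and drain the queue on `LWS_CALLBACK_EVENT_WAIT_CANCELLED`; that part is only described in comments here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-service-thread mailbox; not an lws API. */
typedef void (*req_fn)(void *arg);

struct req {
	struct req *next;
	req_fn fn;
	void *arg;
};

struct mailbox {
	pthread_mutex_t lock; /* guards head and tail */
	struct req *head, *tail;
};

void mailbox_init(struct mailbox *mb)
{
	pthread_mutex_init(&mb->lock, NULL);
	mb->head = mb->tail = NULL;
}

/* Any thread may post. Returns 0 on success, -1 on allocation failure. */
int mailbox_post(struct mailbox *mb, req_fn fn, void *arg)
{
	struct req *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->next = NULL;
	r->fn = fn;
	r->arg = arg;

	pthread_mutex_lock(&mb->lock);
	if (mb->tail)
		mb->tail->next = r;
	else
		mb->head = r;
	mb->tail = r;
	pthread_mutex_unlock(&mb->lock);

	/* With real lws, the poster would now wake the owning service thread,
	 * e.g. via lws_cancel_service_pt(wsi), so that it drains the mailbox. */
	return 0;
}

/* Only the owning service thread calls this, from its own loop (with real
 * lws, e.g. on LWS_CALLBACK_EVENT_WAIT_CANCELLED). Returns requests run. */
int mailbox_drain(struct mailbox *mb)
{
	struct req *list;
	int n = 0;

	/* Detach the whole queue under the lock, then run it outside the lock. */
	pthread_mutex_lock(&mb->lock);
	list = mb->head;
	mb->head = mb->tail = NULL;
	pthread_mutex_unlock(&mb->lock);

	while (list) {
		struct req *next = list->next;

		list->fn(list->arg);
		free(list);
		list = next;
		n++;
	}
	return n;
}

/* Example request: increments an int owned by the service thread. */
void count_call(void *arg)
{
	(*(int *)arg)++;
}
```

Running the queued requests outside the mutex keeps the lock hold time short and lets a request post follow-up work without deadlocking.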
I changed how it is compiled and got a more complete report. According to the report, the problem occurs when the connection times out: at that point the wsi is freed, but its sul does not seem to have been cancelled. @lws-team
=================================================================
==7==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a001e230a0 at pc 0x7f5833c77c5f bp 0x7f580c8ada60 sp 0x7f580c8ada50
READ of size 8 at 0x61a001e230a0 thread T102 (ZCC_LWS_0)
#1 0x7f5833bdf882 in _lws_plat_service_tsi /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/plat/unix/unix-service.c:125
#2 0x7f5833c77491 in lws_service_tsi /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/service.c:870
#3 0x7f5832fa55ef in zcc_lws_client::lws_thread_fun(int) /tmp/GIT/rtts/freeswitch-1.6.19/libs/zcc_lws/zcc_lws_client.cpp:302
#4 0x7f5832fa58e4 in void std::__invoke_impl<void, void (zcc_lws_client::*)(int), zcc_lws_client*, int>(std::__invoke_memfun_deref, void (zcc_lws_client::*&&)(int), zcc_lws_client*&&, int&&) /usr/include/c++/8/bits/invoke.h:73
#5 0x7f5832fa58e4 in std::__invoke_result<void (zcc_lws_client::*)(int), zcc_lws_client*, int>::type std::__invoke<void (zcc_lws_client::*)(int), zcc_lws_client*, int>(void (zcc_lws_client::*&&)(int), zcc_lws_client*&&, int&&) /usr/include/c++/8/bits/invoke.h:95
#6 0x7f5832fa58e4 in decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)(), (_S_declval<2ul>)())) std::thread::_Invoker<std::tuple<void (zcc_lws_client::*)(int), zcc_lws_client*, int> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/include/c++/8/thread:244
#7 0x7f5832fa58e4 in std::thread::_Invoker<std::tuple<void (zcc_lws_client::*)(int), zcc_lws_client*, int> >::operator()() /usr/include/c++/8/thread:253
#8 0x7f5832fa58e4 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (zcc_lws_client::*)(int), zcc_lws_client*, int> > >::_M_run() /usr/include/c++/8/thread:196
#9 0x7f5832cccb22 (/lib64/libstdc++.so.6+0xc2b22)
#10 0x7f58344521c9 in start_thread (/lib64/libpthread.so.0+0x81c9)
#11 0x7f58322e4e72 in __clone (/lib64/libc.so.6+0x39e72)
0x61a001e230a0 is located 544 bytes inside of 1192-byte region [0x61a001e22e80,0x61a001e23328) freed by thread T102 (ZCC_LWS_0) here:
#1 0x7f5833bff27d in _realloc /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core/alloc.c:144
#2 0x7f5833bff2b0 in lws_realloc /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core/alloc.c:154
#3 0x7f5833c51008 in __lws_free_wsi /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/close.c:267
#4 0x7f5833c5708e in __lws_close_free_wsi_final /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/close.c:1003
#5 0x7f5833c55e54 in __lws_close_free_wsi /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/close.c:870
#6 0x7f5833c5726b in lws_close_free_wsi /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/close.c:1017
#7 0x7f5833c9d56d in lws_client_connect_3_connect /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/client/connect3.c:731
#8 0x7f5833c9915c in lws_client_conn_wait_timeout /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/client/connect3.c:39
#9 0x7f5833c77e66 in __lws_sul_service_ripe /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/sorted-usec-list.c:161
#10 0x7f5833bdf882 in _lws_plat_service_tsi /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/plat/unix/unix-service.c:125
#11 0x7f5833c77491 in lws_service_tsi /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/service.c:870
#12 0x7f5832fa55ef in zcc_lws_client::lws_thread_fun(int) /tmp/GIT/rtts/freeswitch-1.6.19/libs/zcc_lws/zcc_lws_client.cpp:302
previously allocated by thread T108 (ZCC_LWS_6) here:
#1 0x7f5833bff220 in _realloc /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core/alloc.c:133
#2 0x7f5833bff2db in lws_zalloc /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core/alloc.c:159
#3 0x7f5833c7aa0f in __lws_wsi_create_with_role /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/wsi.c:304
#4 0x7f5833c93864 in lws_client_connect_via_info /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/client/connect.c:161
#5 0x7f5832fa3a03 in zcc_lws_connection::try_connect(lws_context*) /tmp/GIT/rtts/freeswitch-1.6.19/libs/zcc_lws/zcc_lws_connection.cpp:120
Thread T102 (ZCC_LWS_0) created by T98 (rtts_lws_thread) here:
#1 0x7f5832ccce08 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib64/libstdc++.so.6+0xc2e08)
Thread T98 (rtts_lws_thread) created by T16 (fs_timer_worker) here:
Thread T16 (fs_timer_worker) created by T14 here:
#1 0x55f8234286ff in apr_thread_create (/opt/ssb/bin/freeswitch+0x286ff)
Thread T14 created by T0 here:
#1 0x55f8234286ff in apr_thread_create (/opt/ssb/bin/freeswitch+0x286ff)
Thread T108 (ZCC_LWS_6) created by T98 (rtts_lws_thread) here:
#1 0x7f5832ccce08 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib64/libstdc++.so.6+0xc2e08)
SUMMARY: AddressSanitizer: heap-use-after-free /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/sorted-usec-list.c:137 in __lws_sul_service_ripe
Shadow bytes around the buggy address:
  0x0c34803bc5c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c34803bc5d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c34803bc5e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c34803bc5f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c34803bc600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c34803bc610: fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd
  0x0c34803bc620: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c34803bc630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c34803bc640: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c34803bc650: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c34803bc660: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable: 00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone: fa
  Freed heap region: fd
  Stack left redzone: f1
  Stack mid redzone: f2
  Stack right redzone: f3
  Stack after return: f5
  Stack use after scope: f8
  Global redzone: f9
  Global init order: f6
  Poisoned by user: f7
  Container overflow: fc
  Array cookie: ac
  Intra object redzone: bb
  ASan internal: fe
  Left alloca redzone: ca
  Right alloca redzone: cb
==7==ABORTING
(gdb) f 6
137 in /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/sorted-usec-list.c
(gdb) p sul
$8 = (lws_sorted_usec_list_t *) 0x61a001e23088
(gdb) p *sul
$9 = {list = {prev = 0x0, next = 0x0, owner = 0x0}, us = 0, cb = 0x7f5833c990d7
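The gdb output fits that reading: the `sul` pointer 0x61a001e23088 lies inside the freed 1192-byte region [0x61a001e22e80,0x61a001e23328), because the wsi embeds its `sul_connect_timeout`. If such an object is freed while its embedded timer can still be reached from a timer list, the next walk of that list reads freed memory. The sketch below is a hypothetical, stripped-down intrusive timer list (`struct timer`, `struct conn` and the `timer_*`/`conn_*` functions are illustrative names, not lws internals) showing the ordering that avoids this: unlink the embedded timer, then free the object that contains it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified intrusive timer list; not lws internals. */
struct timer {
	struct timer *next;
	uint64_t due_us;
	void (*cb)(struct timer *t);
};

struct timer_list {
	struct timer *head; /* sorted by due_us, earliest first */
};

void timer_schedule(struct timer_list *tl, struct timer *t, uint64_t due_us,
		    void (*cb)(struct timer *))
{
	struct timer **pp = &tl->head;

	t->due_us = due_us;
	t->cb = cb;
	while (*pp && (*pp)->due_us <= due_us)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
}

/* Unlink t if it is still pending; harmless if it is not. */
void timer_cancel(struct timer_list *tl, struct timer *t)
{
	struct timer **pp = &tl->head;

	while (*pp && *pp != t)
		pp = &(*pp)->next;
	if (*pp)
		*pp = t->next;
	t->next = NULL;
}

/* Fire every timer due at or before now_us; returns how many fired.
 * Reading tl->head is exactly the use-after-free if its owner was freed
 * without cancelling the timer first. */
int timer_service_ripe(struct timer_list *tl, uint64_t now_us)
{
	int n = 0;

	while (tl->head && tl->head->due_us <= now_us) {
		struct timer *t = tl->head;

		tl->head = t->next;
		t->next = NULL;
		t->cb(t);
		n++;
	}
	return n;
}

/* Hypothetical connection embedding its timer, as a wsi embeds its sul. */
struct conn {
	int id;
	struct timer connect_timeout;
};

static int timeouts_fired;

void on_connect_timeout(struct timer *t)
{
	(void)t;
	timeouts_fired++;
}

/* Cancel the embedded timer FIRST, then free the containing object. */
void conn_close_free(struct timer_list *tl, struct conn *c)
{
	timer_cancel(tl, &c->connect_timeout);
	free(c);
}
```

This mirrors the `lws_sul_cancel(&wsi->sul_connect_timeout)` that the patch later in this thread adds before `lws_close_free_wsi()` on the `failed1` path.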
There's a helper for tracking down zombie suls that will find them earlier: building lws with -DLWS_WITH_SUL_DEBUGGING=1 on cmake makes it go through all existing suls at object deletion time, checking whether any of them still belong to the object being deleted.
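The idea behind that option, as described above, can be sketched as an address-range check: just before an object is freed, walk every pending timer and flag any whose storage still lies inside the object. This is a hypothetical illustration of the technique (`struct pending` and `count_zombie_timers()` are made-up names), not the code lws compiles in with LWS_WITH_SUL_DEBUGGING.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical minimal pending-timer record, linked into one list. */
struct pending {
	struct pending *next;
};

/*
 * Count pending timers whose storage lies inside [obj, obj + size).
 * Calling this just before freeing obj catches "zombie" timers that would
 * otherwise be touched after the free.
 */
int count_zombie_timers(const struct pending *head, const void *obj,
			size_t size)
{
	uintptr_t lo = (uintptr_t)obj, hi = lo + size;
	int n = 0;

	for (const struct pending *p = head; p; p = p->next) {
		uintptr_t a = (uintptr_t)p;

		if (a >= lo && a < hi) {
			fprintf(stderr, "zombie timer %p inside object %p\n",
				(const void *)p, obj);
			n++;
		}
	}
	return n;
}
```

Because the check walks every pending timer on each deletion, it costs time proportional to the number of timers, which is one reason to turn it off again for performance builds.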
Great, let me try it.
My gateway can now handle 30 CPS and 3000 concurrent calls, and I have tested a total of 1 million calls. Performance is excellent. I modified some of the client-related code. Can it be submitted? If so, what do I need to do? @lws-team
Great... basically just pasting the output of git diff from your lws dir here is enough; I will study it.
If you haven't already, you probably want to build with -DCMAKE_BUILD_TYPE=RELEASE and -DLWS_WITH_SUL_DEBUGGING=0, since a debug build and sul debugging both slow things down.
It is based on tag v4.3.3. @lws-team
diff --git a/lib/core-net/client/connect3.c b/lib/core-net/client/connect3.c
index a9d2e9e0..1ab5b178 100644
--- a/lib/core-net/client/connect3.c
+++ b/lib/core-net/client/connect3.c
@@ -217,7 +217,7 @@ lws_client_connect_3_connect(struct lws *wsi, const char *ads,
 		 */
 		lwsi_set_state(wsi, LRS_UNCONNECTED);
-		lws_sul_schedule(wsi->a.context, 0, &wsi->sul_connect_timeout,
+		lws_sul_schedule(wsi->a.context, wsi->tsi, &wsi->sul_connect_timeout,
 				 lws_client_dns_retry_timeout,
 				 LWS_USEC_PER_SEC);
 		return wsi;
@@ -601,7 +601,7 @@ ads_known:
-	lws_sul_schedule(wsi->a.context, 0, &wsi->sul_connect_timeout,
+	lws_sul_schedule(wsi->a.context, wsi->tsi, &wsi->sul_connect_timeout,
 			 lws_client_conn_wait_timeout,
 			 wsi->a.context->timeout_secs * LWS_USEC_PER_SEC);
@@ -728,6 +728,7 @@ try_next_dns_result:
 	lws_inform_client_conn_fail(wsi, (void *)cce, strlen(cce));
 failed1:
+	lws_sul_cancel(&wsi->sul_connect_timeout);
 	lws_close_free_wsi(wsi, LWS_CLOSE_STATUS_NOSTATUS, "client_connect3");
 	return NULL;
diff --git a/lib/core-net/client/connect4.c b/lib/core-net/client/connect4.c
index c34d2253..6d6e0588 100644
--- a/lib/core-net/client/connect4.c
+++ b/lib/core-net/client/connect4.c
@@ -279,7 +279,8 @@ send_hs:
 		pfd.revents = LWS_POLLOUT;
 		lwsl_wsi_info(wsi, "going to service fd");
-		n = lws_service_fd(wsi->a.context, &pfd);
+		//n = lws_service_fd(wsi->a.context, &pfd);
+		n = lws_service_fd_tsi(wsi->a.context, &pfd, wsi->tsi);
 		if (n < 0) {
 			cce = "first service failed";
 			goto failed;
@@ -318,7 +319,8 @@ provoke_service:
 		pfd.events = LWS_POLLIN;
 		pfd.revents = LWS_POLLIN;
-		n = lws_service_fd(wsi->a.context, &pfd);
+		//n = lws_service_fd(wsi->a.context, &pfd);
+		n = lws_service_fd_tsi(wsi->a.context, &pfd, wsi->tsi);
 		if (n < 0) {
 			cce = "first service failed";
 			goto failed;
diff --git a/lib/core-net/pollfd.c b/lib/core-net/pollfd.c
index dea85aec..749bf673 100644
--- a/lib/core-net/pollfd.c
+++ b/lib/core-net/pollfd.c
@@ -543,7 +543,7 @@ lws_callback_on_writable(struct lws *wsi)
 		return -1;
 	}
-	if (__lws_change_pollfd(w, 0, LWS_POLLOUT))
+	if (lws_change_pollfd(w, 0, LWS_POLLOUT))
 		return -1;
 	return 1;
diff --git a/lib/core/logs.c b/lib/core/logs.c
index 27b7b9fa..8bb607b8 100644
--- a/lib/core/logs.c
+++ b/lib/core/logs.c
@@ -119,7 +119,9 @@ __lws_lc_tag(struct lws_context *context, lws_lifecycle_group_t *grp,
 	}
 	lc->us_creation = (uint64_t)lws_now_usecs();
+	lws_context_lock(context, __func__);
 	lws_dll2_add_tail(&lc->list, &grp->owner);
+	lws_context_unlock(context);
 	lwsl_refcount_cx(lc->log_cx, 1);
@@ -188,7 +190,9 @@ __lws_lc_untag(struct lws_context *context, lws_lifecycle_t *lc)
 		    (int)lc->list.owner->count - 1, buf);
 	lws_context_unlock(context);
 	lwsl_refcount_cx(lc->log_cx, -1);
 }
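The patch also seems to answer the earlier question about tsi dropping from 3 to 0: `lws_client_connect_3_connect` scheduled `sul_connect_timeout` with a hardcoded tsi of 0, so the timeout for a wsi owned by another service thread was serviced by thread 0. That fits the first ASan report, where the wsi was allocated by ZCC_LWS_6 but freed by ZCC_LWS_0 inside `lws_client_conn_wait_timeout`. The sketch below is a hypothetical model of per-thread timer lists (`struct ctx_model`, `struct conn_model`, `schedule_conn_timeout()` and `cross_thread_timers()` are made-up names, not lws internals) showing why the owner's tsi has to be passed through.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_THREADS 8 /* hypothetical fixed limit for this model */

/* Hypothetical model, not lws internals: each service thread (tsi) only
 * ever walks its own per-thread timer list. */
struct ptimer {
	struct ptimer *next;
	int owner_tsi; /* tsi of the connection this timer belongs to */
};

struct per_thread {
	struct ptimer *timers;
};

struct ctx_model {
	struct per_thread pt[MODEL_MAX_THREADS];
};

struct conn_model {
	int tsi; /* service thread that created and owns this connection */
	struct ptimer connect_timeout;
};

/* Queue c's timeout on the list serviced by thread `tsi`.
 * Returns 0 on success, -1 if tsi is out of range. */
int schedule_conn_timeout(struct ctx_model *cx, int tsi, struct conn_model *c)
{
	struct ptimer *t = &c->connect_timeout;

	if (tsi < 0 || tsi >= MODEL_MAX_THREADS)
		return -1;
	t->owner_tsi = c->tsi;
	t->next = cx->pt[tsi].timers;
	cx->pt[tsi].timers = t;
	return 0;
}

/* Timers on thread tsi's list that belong to another thread: each one is a
 * callback that would run on a thread that does not own the connection. */
int cross_thread_timers(const struct ctx_model *cx, int tsi)
{
	int n = 0;

	for (const struct ptimer *t = cx->pt[tsi].timers; t; t = t->next)
		if (t->owner_tsi != tsi)
			n++;
	return n;
}
```

The connect4.c hunks presumably follow the same reasoning: `lws_service_fd()` services on the default thread, while `lws_service_fd_tsi()` lets the fd be serviced on the wsi's own thread.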
Attached: gitdiff.txt (git diff output).
@lws-team, I have created a pull request for it: https://github.com/warmcat/libwebsockets/pull/3109/commits
I have a WebSocket gateway that uses lws as a client. When I changed the lws service thread model to multi-threading, libasan reported an error during a performance test. Here is the libasan report:
=================================================================
==7==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a001877490 at pc 0x7f282911bfb8 bp 0x7f27ff270d10 sp 0x7f27ff270d00
READ of size 8 at 0x61a001877490 thread T104 (ZCC_LWS_3)
#0 0x7f282911bfb7 in __lws_change_pollfd /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/pollfd.c:479
0x61a001877490 is located 16 bytes inside of 1192-byte region [0x61a001877480,0x61a001877928) freed by thread T101 (ZCC_LWS_0) here:
#0 0x7f282c60f7f0 in __interceptor_free (/opt/ssb/lib/libasan.so.5+0xef7f0)
previously allocated by thread T104 (ZCC_LWS_3) here:
#0 0x7f282c60fff8 in __interceptor_realloc (/opt/ssb/lib/libasan.so.5+0xefff8)
Thread T104 (ZCC_LWS_3) created by T98 (rtts_lws_thread) here:
#0 0x7f282c572eb3 in __interceptor_pthread_create (/opt/ssb/lib/libasan.so.5+0x52eb3)
Thread T98 (rtts_lws_thread) created by T16 (fs_timer_worker) here:
#0 0x7f282c572eb3 in __interceptor_pthread_create (/opt/ssb/lib/libasan.so.5+0x52eb3)
Thread T16 (fs_timer_worker) created by T14 here:
#0 0x7f282c572eb3 in __interceptor_pthread_create (/opt/ssb/lib/libasan.so.5+0x52eb3)
Thread T14 created by T0 here:
#0 0x7f282c572eb3 in __interceptor_pthread_create (/opt/ssb/lib/libasan.so.5+0x52eb3)
Thread T101 (ZCC_LWS_0) created by T98 (rtts_lws_thread) here:
#0 0x7f282c572eb3 in __interceptor_pthread_create (/opt/ssb/lib/libasan.so.5+0x52eb3)
SUMMARY: AddressSanitizer: heap-use-after-free /tmp/GIT/rtts/freeswitch-1.6.19/libs/libwebsockets-4.3.3/lib/core-net/pollfd.c:479 in __lws_change_pollfd
Shadow bytes around the buggy address:
  0x0c3480306e40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c3480306e50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c3480306e60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c3480306e70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c3480306e80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c3480306e90: fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3480306ea0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3480306eb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3480306ec0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3480306ed0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3480306ee0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable: 00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone: fa
  Freed heap region: fd
  Stack left redzone: f1
  Stack mid redzone: f2
  Stack right redzone: f3
  Stack after return: f5
  Stack use after scope: f8
  Global redzone: f9
  Global init order: f6
  Poisoned by user: f7
  Container overflow: fc
  Array cookie: ac
  Intra object redzone: bb
  ASan internal: fe
  Left alloca redzone: ca
  Right alloca redzone: cb
==7==ABORTING