project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0

[TC-CNET-4.21] all-cluster-app --thread ScanNetworks crash #20974

Open samadDotDev opened 2 years ago

samadDotDev commented 2 years ago

When running all-cluster-app from the TH 1.3 image on an RPi with the --thread flag, the ScanNetworks command causes a segmentation fault. The crash does not occur if only --wifi is enabled; in that case the scan result is returned correctly. Is there any Thread configuration that needs to be set on the Test Harness for these manual tests?

sudo ./all-cluster-app --trace_file /tmp/trace_file.log --trace_decode 1 --wifi --thread
[1658277109.452261][2502:2502] CHIP:DMG: << from UDP:[fd08:a83b:271a:cffd:5067:db6e:8af3:eb3e%eth0]:5540 | 92453569 | [Secure Channel  (0) / Standalone Ack (0x10) / Session = 10358 / Exchange = 36812]
[1658277109.452384][2502:2502] CHIP:DMG: Header Flags =
[1658277109.452444][2502:2502] CHIP:DMG: {
[1658277109.452533][2502:2502] CHIP:DMG:     Exchange (0x03) =
[1658277109.452588][2502:2502] CHIP:DMG:     {
[1658277109.452636][2502:2502] CHIP:DMG:         Initiator = true
[1658277109.452693][2502:2502] CHIP:DMG:         AckMsg = 224955736
[1658277109.452747][2502:2502] CHIP:DMG:     }
[1658277109.452817][2502:2502] CHIP:DMG: }
[1658277109.452871][2502:2502] CHIP:DMG:
[1658277109.452938][2502:2502] CHIP:DMG: Encrypted Payload (59 bytes) =
[1658277109.452993][2502:2502] CHIP:DMG: {
[1658277109.453036][2502:2502] CHIP:DMG:     data = 00762800c2ba8205b604c28a811c8c80ed7459c51a4b1507e40f57f1f689f21b107651fba8eebb440db4cab42e414726d70ed21e802946550d2e0e
[1658277109.453081][2502:2502] CHIP:DMG:     buffer_ptr = 187650583722896
[1658277109.453122][2502:2502] CHIP:DMG: }
[1658277109.453162][2502:2502] CHIP:DMG:
[1658277109.453250][2502:2502] CHIP:DMG: Additional Fields =
[1658277109.453306][2502:2502] CHIP:DMG: {
[1658277109.453380][2502:2502] CHIP:DMG:     peer_address = UDP:[fd08:a83b:271a:cffd:5067:db6e:8af3:eb3e%eth0]:5540
[1658277109.453438][2502:2502] CHIP:DMG: }
[1658277109.453488][2502:2502] CHIP:DMG:
[1658277109.454214][2502:2502] CHIP:EM: Received message of type 0x8 with protocolId (0, 1) and MessageCounter:92453570 on exchange 36813r
[1658277109.454379][2502:2502] CHIP:EM: Handling via exchange: 36813r, Delegate: 0xaaaac47d3ed0
[1658277109.454517][2502:2502] CHIP:DMG: InvokeRequestMessage =
[1658277109.454580][2502:2502] CHIP:DMG: {
[1658277109.454637][2502:2502] CHIP:DMG:    suppressResponse = false,
[1658277109.454701][2502:2502] CHIP:DMG:    timedRequest = false,
[1658277109.454762][2502:2502] CHIP:DMG:    InvokeRequests =
[1658277109.454835][2502:2502] CHIP:DMG:    [
[1658277109.454895][2502:2502] CHIP:DMG:        CommandDataIB =
[1658277109.454962][2502:2502] CHIP:DMG:        {
[1658277109.455025][2502:2502] CHIP:DMG:            CommandPathIB =
[1658277109.455095][2502:2502] CHIP:DMG:            {
[1658277109.455172][2502:2502] CHIP:DMG:                EndpointId = 0x0,
[1658277109.455262][2502:2502] CHIP:DMG:                ClusterId = 0x31,
[1658277109.455342][2502:2502] CHIP:DMG:                CommandId = 0x0,
[1658277109.455411][2502:2502] CHIP:DMG:            },
[1658277109.455488][2502:2502] CHIP:DMG:
[1658277109.455551][2502:2502] CHIP:DMG:            CommandFields =
[1658277109.455624][2502:2502] CHIP:DMG:            {
[1658277109.455698][2502:2502] CHIP:DMG:            },
[1658277109.455768][2502:2502] CHIP:DMG:        },
[1658277109.455839][2502:2502] CHIP:DMG:
[1658277109.455897][2502:2502] CHIP:DMG:    ],
[1658277109.455966][2502:2502] CHIP:DMG:
[1658277109.456025][2502:2502] CHIP:DMG:    InteractionModelRevision = 1
[1658277109.456082][2502:2502] CHIP:DMG: },
[1658277109.456225][2502:2502] CHIP:DMG: AccessControl: checking f=1 a=c s=0x983CEF80E47639BB t=0x00000001 c=0x0000_0031 e=0 p=a
[1658277109.456304][2502:2502] CHIP:DMG: AccessControl: allowed
[1658277109.456370][2502:2502] CHIP:DMG: Received command for Endpoint=0 Cluster=0x0000_0031 Command=0x0000_0000
[1658277109.457304][2502:2502] CHIP:DMG: Decreasing reference count for CommandHandler, remaining 1
[1658277109.459819][2502:2505] CHIP:DL: Failed to perform finish Thread network scan: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name io.openthread.BorderRouter.wpan0 was not provided by any .service files
Segmentation fault

bzbarsky-apple commented 2 years ago

@samadDotDev If you run under a debugger, where is the crash happening?

samadDotDev commented 2 years ago

@bzbarsky-apple, here is the backtrace:

Thread 4 "all-cluster-app" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xfffff62bf040 (LWP 3221)]
0x0000fffff7a42a5c in ?? () from /lib/aarch64-linux-gnu/libglib-2.0.so.0
(gdb) bt
#0  0x0000fffff7a42a5c in  () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
#1  0x0000fffff79b3d60 in g_bit_lock () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
#2  0x0000fffff7a31290 in g_variant_n_children () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
#3  0x0000aaaaaac90db8 in chip::DeviceLayer::ThreadStackManagerImpl::_OnNetworkScanFinished(_GAsyncResult*) (this=0xaaaaaadc69e0 <chip::DeviceLayer::ThreadStackManagerImpl::sInstance>, res=0xffffe8003e20)
    at ../../third_party/connectedhomeip/src/platform/Linux/ThreadStackManagerImpl.cpp:609
#4  0x0000aaaaaac90a4c in chip::DeviceLayer::ThreadStackManagerImpl::_OnNetworkScanFinished(_GObject*, _GAsyncResult*, void*)
    (source_object=0xaaaaaae12130, res=0xffffe8003e20, user_data=0xaaaaaadc69e0 <chip::DeviceLayer::ThreadStackManagerImpl::sInstance>) at ../../third_party/connectedhomeip/src/platform/Linux/ThreadStackManagerImpl.cpp:581
#5  0x0000fffff7c04ad8 in  () at /lib/aarch64-linux-gnu/libgio-2.0.so.0
#6  0x0000fffff7c04d10 in  () at /lib/aarch64-linux-gnu/libgio-2.0.so.0
#7  0x0000fffff7c72b48 in  () at /lib/aarch64-linux-gnu/libgio-2.0.so.0
#8  0x0000fffff7c04ad8 in  () at /lib/aarch64-linux-gnu/libgio-2.0.so.0
#9  0x0000fffff7c04d10 in  () at /lib/aarch64-linux-gnu/libgio-2.0.so.0
#10 0x0000fffff7c61ef8 in  () at /lib/aarch64-linux-gnu/libgio-2.0.so.0
#11 0x0000fffff7c04ad8 in  () at /lib/aarch64-linux-gnu/libgio-2.0.so.0
#12 0x0000fffff7c04b24 in  () at /lib/aarch64-linux-gnu/libgio-2.0.so.0
#13 0x0000fffff79ed19c in g_main_context_dispatch () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
#14 0x0000fffff7a41cdc in  () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
#15 0x0000fffff79ec87c in g_main_loop_run () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
#16 0x0000aaaaaac81a7c in chip::DeviceLayer::(anonymous namespace)::GDBus_Thread() () at ../../third_party/connectedhomeip/src/platform/Linux/PlatformManagerImpl.cpp:66
#17 0x0000aaaaaac83a04 in std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) (__f=@0xffffe8006cd8: 0xaaaaaac81a5c <chip::DeviceLayer::(anonymous namespace)::GDBus_Thread()>) at /usr/include/c++/11/bits/invoke.h:61
#18 0x0000aaaaaac839a4 in std::__invoke<void (*)()>(void (*&&)()) (__fn=@0xffffe8006cd8: 0xaaaaaac81a5c <chip::DeviceLayer::(anonymous namespace)::GDBus_Thread()>) at /usr/include/c++/11/bits/invoke.h:96
#19 0x0000aaaaaac83940 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0xffffe8006cd8) at /usr/include/c++/11/bits/std_thread.h:253
#20 0x0000aaaaaac83914 in std::thread::_Invoker<std::tuple<void (*)()> >::operator()() (this=0xffffe8006cd8) at /usr/include/c++/11/bits/std_thread.h:260
#21 0x0000aaaaaac838f4 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() (this=0xffffe8006cd0) at /usr/include/c++/11/bits/std_thread.h:211
#22 0x0000fffff784888c in  () at /lib/aarch64-linux-gnu/libstdc++.so.6
#23 0x0000fffff75b669c in start_thread (arg=<optimized out>) at pthread_create.c:435
#24 0x0000fffff761ed1c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

bzbarsky-apple commented 2 years ago

@samadDotDev thank you, that is very helpful!

That's crashing under this line of code:

    if (g_variant_n_children(scan_result.get()) > 0)

in glib. @erjiaqing, could scan_result be null here? For example, if openthread_io_openthread_border_router_call_scan_finish returned false? In that case it looks like we schedule an OnFinished callback but then still go on to use the scan result.
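
For illustration, the suspected pattern and one possible guard can be sketched as below. This is a minimal, self-contained model and not the actual ThreadStackManagerImpl code: FakeVariant, FakeScanFinish, and OnNetworkScanFinished are hypothetical stand-ins for the GVariant result, the generated scan_finish call, and the callback, and the -1 return value is an assumption made only so the sketch can be exercised. The idea is simply to return right after reporting the failure instead of falling through to code that dereferences the result.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the GVariant that would hold the scan results.
struct FakeVariant
{
    std::vector<std::string> children;
};

// Hypothetical stand-in for the generated *_call_scan_finish function.
// On failure it returns false and leaves *out null, which models the
// situation suspected in this issue.
bool FakeScanFinish(bool serviceAvailable, std::unique_ptr<FakeVariant> * out)
{
    if (!serviceAvailable)
    {
        return false; // e.g. the D-Bus service name is not provided
    }
    *out = std::make_unique<FakeVariant>();
    (*out)->children = { "network-a", "network-b" };
    return true;
}

// Returns the number of networks found, or -1 if the scan failed
// (the -1 convention is an assumption for this sketch).
int OnNetworkScanFinished(bool serviceAvailable)
{
    std::unique_ptr<FakeVariant> scanResult;

    if (!FakeScanFinish(serviceAvailable, &scanResult) || scanResult == nullptr)
    {
        // Report the failure (in the real code: schedule the OnFinished
        // callback with an error) and return early, so the null result
        // is never touched.
        return -1;
    }

    // From here on the result is guaranteed to be non-null.
    assert(scanResult != nullptr);
    return static_cast<int>(scanResult->children.size());
}
```

With this shape, OnNetworkScanFinished(false) reports the failure without dereferencing anything, while the success path counts the children as before.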

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] commented 1 year ago

This stale issue has been automatically closed. Thank you for your contributions.