project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.32k stars 1.96k forks source link

[BUG] thread network provisioning crash in cirque test #28352

Open yunhanw-google opened 1 year ago

yunhanw-google commented 1 year ago

Reproduction steps

Recently we disable the cirque test when upgrading docker image location in https://github.com/project-chip/connectedhomeip/pull/27879, cirque test is used to gate the quality for network provisioning cluster, when re-enabling, we observe thread border router crash in network scanning.

Controller side DEBUG:chip.EM:Flushed pending ack for MessageCounter:97987663 on exchange 34853i INFO:NetworkCommissioning:Received response: NetworkCommissioning.Commands.ScanNetworksResponse(networkingStatus=<NetworkCommissioningStatusEnum.kUnknownError: 12>, debugText=None, wiFiScanResults=None, threadScanResults=[]) 2023-07-28 03:49:57 5edf5df264cd PythonMatterControllerTEST[4359] CRITICAL Testfail: Failed to finish network commissioning CRITICAL:PythonMatterControllerTEST:Testfail: Failed to finish network commissioning.

device side:

otbr-agent[55]: 00:03:19.760 [I] Mac-----------: Sending Beacon Request

otbr-agent[55]: 00:03:20.062 [I] Mac-----------: Sending Beacon Request

[1690516197.466393][60:64] CHIP:DL: Failed to perform finish Thread network scan: Method “io.openthread.BorderRouter.Scan” returned type “(a(tstayqqynyybb))”, but expected “(a(tstayqqyyyybb))” [1690516197.466535][60:60] CHIP:DMG: Command handler moving to [ Preparing] [1690516197.466618][60:60] CHIP:DMG: Command handler moving to [AddingComm] [1690516197.466665][60:60] CHIP:DMG: Command handler moving to [AddedComma] [1690516197.466674][60:60] CHIP:DMG: Decreasing reference count for CommandHandler, remaining 0 [1690516197.466752][60:60] CHIP:EM: <<< [E:34853r S:45418 M:97987663 (Ack:221481980)] (S) Msg TX to 1:000000000001B669 [18AD] --- Type 0001:09 (IM:InvokeCommandResponse) [1690516197.466788][60:60] CHIP:IN: (S) Sending msg 97987663 on secure session with LSID: 45418 [1690516197.466865][60:60] CHIP:DMG: Command handler moving to [CommandSen] [1690516197.466879][60:60] CHIP:DMG: Command handler moving to [AwaitingDe]

Thread 2 "gmain-matter" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7ffff71dd700 (LWP 64)] 0x00007ffff79813f0 in g_bit_lock () from /lib/x86_64-linux-gnu/libglib-2.0.so.0

Thread 5 (Thread 0x7fffe7fff700 (LWP 67)):

0 0x00007ffff7fb1b34 in open64 () at /lib/x86_64-linux-gnu/libpthread.so.0

1 0x00005555555a777e in NamedPipeCommands::EventCommandListenerTask(void*) (arg=0x555555897260 <(anonymous namespace)::sChipNamedPipeCommands>) at ../../examples/platform/linux/NamedPipeCommands.cpp:75

2 0x00007ffff7fa6609 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0

3 0x00007ffff7557133 in clone () at /lib/x86_64-linux-gnu/libc.so.6

Thread 4 (Thread 0x7ffff51db700 (LWP 66)):

0 0x00007ffff754a99f in poll () at /lib/x86_64-linux-gnu/libc.so.6

1 0x00007ffff79af36e in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

2 0x00007ffff79af6f3 in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

3 0x00007ffff7c05f8a in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0

4 0x00007ffff79d8ae1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

5 0x00007ffff7fa6609 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0

6 0x00007ffff7557133 in clone () at /lib/x86_64-linux-gnu/libc.so.6

Thread 3 (Thread 0x7ffff61dc700 (LWP 65)):

0 0x00007ffff754a99f in poll () at /lib/x86_64-linux-gnu/libc.so.6

1 0x00007ffff79af36e in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

2 0x00007ffff79af4a3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

3 0x00007ffff79af4f1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

4 0x00007ffff79d8ae1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

5 0x00007ffff7fa6609 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0

6 0x00007ffff7557133 in clone () at /lib/x86_64-linux-gnu/libc.so.6

Thread 2 (Thread 0x7ffff71dd700 (LWP 64)):

0 0x00007ffff79813f0 in g_bit_lock () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

1 0x00007ffff79ef2dd in g_variant_n_children () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

2 0x000055555577a01d in chip::DeviceLayer::ThreadStackManagerImpl::_OnNetworkScanFinished(_GAsyncResult*) (this=0x55555589e660 , res=0x7ffff0022830) at ../../src/platform/Linux/ThreadStackManagerImpl.cpp:672

3 0x0000555555779c45 in chip::DeviceLayer::ThreadStackManagerImpl::_OnNetworkScanFinished(_GObject, _GAsyncResult, void*) (source_object=0x7ffff000e8b0, res=0x7ffff0022830, user_data=0x55555589e660 ) at ../../src/platform/Linux/ThreadStackManagerImpl.cpp:644

4 0x00007ffff7b9bf49 in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0

5 0x00007ffff7b9cb3d in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0

6 0x00007ffff7c0239d in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0

7 0x00007ffff7b9bf49 in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0

8 0x00007ffff7b9cb3d in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0

9 0x00007ffff7bf73a4 in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0

10 0x00007ffff7b9bf49 in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0

11 0x00007ffff7b9bf8d in () at /lib/x86_64-linux-gnu/libgio-2.0.so.0

12 0x00007ffff79af04e in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

13 0x00007ffff79af400 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

14 0x00007ffff79af6f3 in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

15 0x0000555555769e16 in chip::DeviceLayer::(anonymous namespace)::GLibMainLoopThread(void*) (userData=0x5555558c2de0) at ../../src/platform/Linux/PlatformManagerImpl.cpp:65

16 0x00007ffff79d8ae1 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0

17 0x00007ffff7fa6609 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0

18 0x00007ffff7557133 in clone () at /lib/x86_64-linux-gnu/libc.so.6

Bug prevalence

always

GitHub hash of the SDK that was being used

none

Platform

other

Platform Version(s)

No response

Anything else?

No response

JuliaPageMorseMicro commented 7 months ago

I saw a crash in a similar location due to read() returning zero into readBytes and so readBuf[-1] being written. Ensure nothing is done if readBytes is 0.

$ git diff examples/platform/linux/NamedPipeCommands.cpp
diff --git a/examples/platform/linux/NamedPipeCommands.cpp b/examples/platform/linux/NamedPipeCommands.cpp
index 9ec28b0121..ac2bf31aa8 100644
--- a/examples/platform/linux/NamedPipeCommands.cpp
+++ b/examples/platform/linux/NamedPipeCommands.cpp
@@ -80,11 +80,14 @@ void * NamedPipeCommands::EventCommandListenerTask(void * arg)
         }

         ssize_t readBytes      = read(fd, readbuf, kChipEventCmdBufSize);
-        readbuf[readBytes - 1] = '\0';
-        ChipLogProgress(NotSpecified, "Received payload: \"%s\"", readbuf);
+        if (readBytes > 0)
+        {
+            readbuf[readBytes - 1] = '\0';
+            ChipLogProgress(NotSpecified, "Received payload: \"%s\"", readbuf);

-        // Process the received command request from event fifo
-        self->mDelegate->OnEventCommandReceived(readbuf);
+            // Process the received command request from event fifo
+            self->mDelegate->OnEventCommandReceived(readbuf);
+        }

         close(fd);
     }