jinpeng1989 closed this issue 1 week ago.
The crash occurred here; please see the attached log: SyslogCatchAll-2024-08-28-1-and-2.zip
Are you able to reference a specific GitHub commit in an OpenThread repo? I did look at the Simplicity SDK link you provided above, but it wasn't obvious which OpenThread repo commit it was using.
Can you provide more details on the specific test scenario so that others can reproduce this issue?
Thanks for reporting this.
~@jwhui and I investigated this and found a potential cause for this situation.~
~This scenario can occur when IPv6 fragmentation is enabled and used. Could you confirm whether you have `OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE` enabled in your project?~
~Brief description of the issue:~
~- A message using IPv6 fragmentation can be placed in `Ip6::mReassemblyList` even if it's also marked for transmission to the Thread mesh.~
~- This can lead to the message being included in two separate queues, which is not allowed and triggers the assert.~
~- I'll submit a PR later to address this.~
Please ignore my earlier comment. After investigating further, there is no issue here: a clone of the message, rather than the original, is allocated and added to `Ip6::mReassemblyList`.
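A minimal sketch of why cloning avoids the double-queue problem. The `Message`, `Queue`, and `CloneForReassembly` names are hypothetical and only model the idea; they are not OpenThread's actual classes. The reassembly list receives a separate copy, so the original message stays free to be queued for transmission.

```cpp
#include <cassert>
#include <list>
#include <memory>

// Hypothetical model (not OpenThread code): a message tracks whether it is
// currently linked into a queue.
struct Message {
    int  id      = 0;      // stand-in for the payload/metadata
    bool inQueue = false;  // true while the message is in some queue
};

struct Queue {
    std::list<Message*> items;  // non-owning pointers

    void Enqueue(Message& msg) {
        assert(!msg.inQueue);  // a message may be in at most one queue
        msg.inQueue = true;
        items.push_back(&msg);
    }
};

// The reassembly list gets a *clone*; the original is left untouched.
std::unique_ptr<Message> CloneForReassembly(const Message& original, Queue& reassemblyList) {
    auto clone = std::make_unique<Message>();
    clone->id = original.id;  // copy content only, not queue membership
    reassemblyList.Enqueue(*clone);
    return clone;  // caller owns the clone; the queue holds a raw pointer
}
```

The clone deliberately copies only the content and not the `inQueue` state; copying that state too would make the fresh clone look as if it were already queued.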
The simplicity_sdk release notes describe the upstream code used: "The Silicon Labs OpenThread SDK includes all changes from the OpenThread GitHub repo (https://github.com/openthread/openthread) up to and including commit 1fceb225b. The Silicon Labs OpenThread SDK includes all changes from the OpenThread border router GitHub repo (https://github.com/openthread/ot-br-posix) up to and including commit e56c02006." Source: https://www.silabs.com/documents/public/release-notes/open-thread-release-notes-2.5.1.0.pdf
Can you provide more information about your HW setup? Are you running this on a Raspberry Pi?
This is the first time we've seen this bug reported, so we're just trying to understand whether there's an issue related to your specific setup.
We discovered the issue during a system test involving five device models: three are SEDs, one is a TBR, and one is a REED. The full system consists of 1 TBR + 16 TME + 84 SED. However, network size does not appear to be a necessary condition for this issue: the otbr-agent crash has also been observed in small systems. One notable aspect of our setup is that both the diagnostics and mesh diagnostics interfaces are accessed.
otbr-posix runs on an OpenWrt system, and this solution has been in use for two or three years. The otbr-posix code was recently updated to introduce the mesh diagnostics feature, and many issues have occurred frequently since moving to this version.
From the stack trace in https://github.com/openthread/ot-br-posix/issues/2475#issue-2507427967, it appears that this assert is getting triggered:
However, the first thing that `HandleSendQueue()` does is call `Dequeue()`, which does this:
So it's not clear yet why the asserts are failing.
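To make the reasoning above concrete, here is a simplified, hypothetical model. It is not OpenThread's `PriorityQueue` or `MessageQueue` implementation; `SimpleQueue`, `Message`, and `HandleSendQueueSketch` are illustrative names. Each message records the queue it belongs to, `Enqueue()` asserts that the message is not already in a queue, and `Dequeue()` clears that membership, so a dequeue-then-enqueue sequence like the one in `HandleSendQueue()` should not trip the assert.

```cpp
#include <cassert>

// Hypothetical, simplified model of the "one queue at a time" invariant.
struct SimpleQueue;

struct Message {
    Message*     next  = nullptr;  // singly linked list link
    SimpleQueue* owner = nullptr;  // queue this message is currently in, if any
};

struct SimpleQueue {
    Message* head = nullptr;
    Message* tail = nullptr;

    void Enqueue(Message& msg) {
        // Same kind of check as the failing assert: not already queued.
        assert(msg.owner == nullptr && msg.next == nullptr);
        msg.owner = this;
        if (tail == nullptr) {
            head = tail = &msg;
        } else {
            tail->next = &msg;
            tail = &msg;
        }
    }

    void Dequeue(Message& msg) {
        assert(msg.owner == this);
        Message* prev = nullptr;
        Message* cur  = head;
        while (cur != nullptr && cur != &msg) {  // find the predecessor
            prev = cur;
            cur  = cur->next;
        }
        assert(cur == &msg);
        if (prev == nullptr) {
            head = msg.next;
        } else {
            prev->next = msg.next;
        }
        if (tail == &msg) {
            tail = prev;
        }
        // Clearing the links is what makes a later Enqueue() legal.
        msg.next  = nullptr;
        msg.owner = nullptr;
    }
};

// Stand-in for HandleSendQueue(): each message leaves the send queue
// before it is handed to another queue.
void HandleSendQueueSketch(SimpleQueue& sendQueue, SimpleQueue& nextQueue) {
    while (Message* msg = sendQueue.head) {
        sendQueue.Dequeue(*msg);
        nextQueue.Enqueue(*msg);
    }
}
```

Under this model the assert can only fire if some other code path enqueues a message that is still linked into a queue, which is the kind of path worth looking for in the real code.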
This issue occurs regularly in our test environment and has been observed at least once every five days. What can we do to analyze it further?
If possible, you can help analyze the code path identified in https://github.com/openthread/ot-br-posix/issues/2475#issuecomment-2333217331 and determine where the assert conditions are no longer true.
I would suggest checking whether `OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE` is enabled in your build.
If it is enabled, it would be good to disable it and test again; this would give a clue as to whether the fragmentation logic is involved.
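One way to confirm the setting is to compile a small check with the same configuration as the otbr-agent build. The sketch below assumes the OpenThread config headers used by your build are included before it (they are not shown here) and treats an undefined macro as disabled; `kIp6FragmentationEnabled` and `FragmentationStatus()` are hypothetical names.

```cpp
#include <cassert>

// Assumption: the build's OpenThread config headers are included above this
// point. If the macro is undefined, fragmentation is treated as disabled.
#if defined(OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE) && OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE
constexpr bool kIp6FragmentationEnabled = true;
#else
constexpr bool kIp6FragmentationEnabled = false;
#endif

// Human-readable status for logging during the test run.
const char* FragmentationStatus() {
    return kIp6FragmentationEnabled ? "IPv6 fragmentation: enabled"
                                    : "IPv6 fragmentation: disabled";
}
```

If it turns out to be enabled, one option for the retest, assuming your build allows overriding OpenThread config macros, is to define `OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE` to `0` in the project config or compiler flags; exactly where depends on the SDK's build setup.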
Describe the bug: The otbr-agent process crashed, and debugging with GDB showed that the error occurred near the PriorityQueue code.
The ot-br-posix code used is: https://github.com/SiliconLabs/simplicity_sdk/tree/v2024.6.1-0/util/third_party/ot-br-posix
Release note: https://github.com/SiliconLabs/simplicity_sdk/releases/tag/v2024.6.1-0