openthread / ot-br-posix

OpenThread Border Router, a Thread border router for POSIX-based platforms.
https://openthread.io/
BSD 3-Clause "New" or "Revised" License
418 stars 232 forks source link

Ping response get dropped #559

Closed L-Belluomini closed 3 years ago

L-Belluomini commented 4 years ago

Describe the bug A clear and concise description of what the bug is. Sending a ping request from the OT-BR, the message gets delivered, the replay is sent but the BR drop it.

To Reproduce Information to reproduce the behavior, including:

  1. Git commit id ea9cd69aa2eed3f320f6a80a4d0b966959c19ff0
  2. IEEE 802.15.4 hardware platform efr32mg12 BRD4166A
  3. Build steps flowed guide on ( https://openthread.io/guides/border-router)
  4. Network topology 1 x ot-br, 2 x ot ot-nodes ( cli-ftd ) all IEEE 802.15.4 hardware platform are efr32mg12 BRD4166A

Expected behavior A clear and concise description of what you expected to happen.

Console/log output If applicable, add console/log output to help explain your problem. log, console and pcap are attached. the log says the br drops the message, I have noted that the message in question has sec set to no, while the request has sec set to yes. Additional context Add any other context about the problem here. ( master-key is 00112233445566778899aabbccddeeff ) I have created the network from the two cli device and the joined the br using the master-key, and I have another board as a sniffer. I have tried using ping from ot-ctl and using the ping6 of the raspberry with no difference.

ping_issue_otbr_log.log ping_issue_shell_log.log capture-ping-2020-9-10.zip

jwhui commented 4 years ago

@L-Belluomini , from the packet capture, it seems that the ICMPv6 Echo Reply messages do not have IEEE 802.15.4 Frame Security enabled. OTBR is dropping the ICMPv6 Echo Reply message because frame security is not enabled.

Can you provide more detail on how you setup your ot-cli-ftd devices (e.g. git commit id, build commands, etc.)?

L-Belluomini commented 4 years ago

@jwhui , Hi thanks for the replay.

commit id of ot-cli-ftd for all devices is 21b49e2e2b9be05b996ad83fa96e4100746269d0 build command is make -f examples/Makefile-efr32mg12 BOARD=BRD4166A COAP=1 JOINER=1 DHCP6_CLIENT=1 to create the network i use the following step using the cli interface i setup the thread network, folowing the EFR32MG12 Example. then I join the border router using the master-key.

jwhui commented 4 years ago

@L-Belluomini , unfortunately, I was unable to reproduce this issue.

Can you point to the exact steps you used to setup your network?

@JaneFromSilabs, have you experienced this issue before?

L-Belluomini commented 4 years ago

Hi thanks for the support. I have to find possible causes for this issue, but I don't have a lot of experience so it wasn't very fruitful. I noticed that when the ot-br pings the cli device, as previously noted, the reply is not encrypted at the MAC level but when the cli makes the request it is encrypted. so even if I don't know exactly how, I don't think it is a problem with commisiong. I have alse tried joining the cli devices from the otbr commissioner, the procedure was successful but the ping reply from cli devices had the same problem. I also tried using udp and coap messages and those where always encrypted.

capture-ping-2020-9-28.zip

JaneFromSilabs commented 4 years ago

@L-Belluomini @jwhui Are either of these the same issue?

https://github.com/openthread/ot-br-posix/issues/471 https://github.com/openthread/ot-br-posix/issues/530

L-Belluomini commented 4 years ago

Hi, I don't think so. The ot-br log the message says the message gets dropped, and so i would not by a problem of the interface. The problem seems caused by the fact that the icmp replay message at the mac level is not encrypted. while the request is.

jwhui commented 4 years ago

@L-Belluomini @jwhui Are either of these the same issue?

471

530

@JaneFromSilabs , I do not think these issues are related. In this issue, the message is actually dropped by the MAC because the ICMPv6 Echo Reply is being sent with MAC security disabled. Unfortunately, I do not have a EFR32MG12 BRD4166A on hand to try this out.

JaneFromSilabs commented 4 years ago

I'll try this with the EFR32MG12 I have (brd4161a) with me, but I might need a couple of days to procure this exact board.

JaneFromSilabs commented 4 years ago

This was successful on my current set up with brd4161a. I'll try this again once I have a pair of brd4166a at home. I'll place an order which should get filled in a day or 2.

PING fd00:df4:6000:0:b0cf:452e:cf7a:c842(fd00:df4:6000:0:b0cf:452e:cf7a:c842) from fd00:df4:6000::ff:fe00:fc00 wpan0: 56 data bytes 64 bytes from fd00:df4:6000:0:b0cf:452e:cf7a:c842: icmp_seq=1 ttl=64 time=101 ms 64 bytes from fd00:df4:6000:0:b0cf:452e:cf7a:c842: icmp_seq=2 ttl=64 time=45.8 ms

jwhui commented 4 years ago

@JaneFromSilabs , thanks for testing and reporting back.

@L-Belluomini , can you list the exact steps you used to setup your devices to help use reproduce your issue?

JaneFromSilabs commented 4 years ago

@L-Belluomini If you could, please provide a list of the CLI commands you are using, that would be most helpful. It might make this easier to reproduce.

L-Belluomini commented 4 years ago

thanks @jwhui and @JaneFromSilabs for the support, I set up the CLI device like this:

dataset init new dataset masterkey 00112233445566778899aabbccddeeff dataset channel 15 dataset commit active ifconfig up thread start

then from the web interface of the ot-br I select the Join tab, I click on the join button relative to the CLI device. the default masterkey is the same for the CLI device I leave the default prefix I leave the default checked press join the procedure is successful

the I open the open a shell on the ot-br device sudo ot-ctl ping ( ip from the CLI device)

JaneFromSilabs commented 4 years ago

@L-Belluomini I just wanted to let you know that I did receive these boards and I was able to reproduce this behavior. I'll get back to you with more details as soon as I have more information.

jwhui commented 4 years ago

@L-Belluomini I just wanted to let you know that I did receive these boards and I was able to reproduce this behavior. I'll get back to you with more details as soon as I have more information.

@JaneFromSilabs , thanks for digging into this!

JaneFromSilabs commented 3 years ago

Hi @L-Belluomini,

After changing a few things, this is now working correctly. Most of the changes are documented in this issue: https://github.com/openthread/ot-br-posix/issues/561. These changes boil down to:

  1. Changing the baud rate to 460800 in /etc/default/otbr-agent as follows: OTBR_AGENT_OPTS="-I wpan0 spinel+hdlc+uart:///dev/ttyACM0?uart-baudrate=460800" (which can be verified using the command : stty -F /dev/ttyACM0)
  2. Change kMaxSize to 4096 in third_party/openthread/repo/src/lib/spinel/spinel_interface.hpp
  3. Make sure your ot-br-posix repo includes the pull request: openthread/openthread#5687 (or just get the latest)
  4. Verify that you have installed the latest STK adapter firmware version on your brd4166a (use the launcher in Simplicity Studio to get the latest)
  5. Enable hardware flow control using the following command on the STK admin CLI interface (use the admin console in Simplicity Studio): serial vcom config handshake rtscts (note: if you don't have this command, you likely haven't got the latest adapter firmware)

Please let me know if you need any further information.

Cheers, Jane

L-Belluomini commented 3 years ago

Hi @JaneFromSilabs thank a lot I relay appreciate your help. I will do so testing with this changes, and lat you know the result. thanks again

dougleaf commented 3 years ago

We had a similar issue in our setup, though in ours we would only see large packets being dropped. It seems the RCP baud rate is the final culprit, but you may also find it helpful to change the RCP communication buffers. See patch below. It worked for us.

index cc584c189..8fb883fef 100644
--- a/src/ncp/ncp_config.h
+++ b/src/ncp/ncp_config.h
@@ -66,7 +66,7 @@
  *
  */
 #ifndef OPENTHREAD_CONFIG_NCP_TX_BUFFER_SIZE
-#define OPENTHREAD_CONFIG_NCP_TX_BUFFER_SIZE 512
+#define OPENTHREAD_CONFIG_NCP_TX_BUFFER_SIZE 2048
 #endif

 /**
@@ -76,7 +76,7 @@
  *
  */
 #ifndef OPENTHREAD_CONFIG_NCP_UART_TX_CHUNK_SIZE
-#define OPENTHREAD_CONFIG_NCP_UART_TX_CHUNK_SIZE 128
+#define OPENTHREAD_CONFIG_NCP_UART_TX_CHUNK_SIZE 2048
 #endif

 /**
@@ -87,7 +87,7 @@
  */
 #ifndef OPENTHREAD_CONFIG_NCP_UART_RX_BUFFER_SIZE
 #if OPENTHREAD_RADIO
-#define OPENTHREAD_CONFIG_NCP_UART_RX_BUFFER_SIZE 512
+#define OPENTHREAD_CONFIG_NCP_UART_RX_BUFFER_SIZE 2048
 #else
 #define OPENTHREAD_CONFIG_NCP_UART_RX_BUFFER_SIZE 1300
 #endif
L-Belluomini commented 3 years ago

Hi @JaneFromSilabs, unfortunately even after the upgrade you suggested the bug persist. For ease of view I post the screenshot from the sniffer, from this you can se the ICMP reply is not secure, while the request is. The bug is present when the reply is sent by the CLI device. (other type of message, eg coap and udp, do not have this kind of anomaly and all messages are secure)

thanks for your help, hope it will be posible to find a solution Screenshot from 2020-11-23 17-04-08 Screenshot from 2020-11-23 17-04-12

dougleaf commented 3 years ago

Hi @JaneFromSilabs @L-Belluomini I am seeing a similar issue of ping packets being dropped on standard FTD devices. I wonder whether this happens on other RCP vendors? I want to rule out this is a PAL issue

Moreover, we don't face the same issue with our essentials bulbs, which is odd. We don't diverge too much from the FTD on the OpenThread layer

dougleaf commented 3 years ago

I just tested with Nordic RCP and it shows the same problem

jwhui commented 3 years ago

I just tested with Nordic RCP and it shows the same problem

@dougleaf , thanks for looking into this issue. Are you seeing the same problem as above, where the ICMPv6 Echo Reply is being sent with MAC frame security disabled? Can you provide more detail on your setup so I can reproduce?

dougleaf commented 3 years ago

thanks @jwhui

I have ot-daemon running at 460800bps at head on a RPi. RCP is also built at head on Silabs EFR32MG21 (also tried Nordic). I commission a sample FTD device running on EFR32MG21 built from OpenThread repo w/o modifications. It joins the network and DEVICE->BR pinging works. BR->DEVICE pinging does not work

Ping BR -> Device
-----------------

Nov 22 12:18:54 ot-daemon[2297]: [INFO]-PLAT----: processTransmit: OK
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]-MAC-----: Request to start operation "TransmitDataDirect"
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]-MAC-----: Starting operation "TransmitDataDirect"
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]-MAC-----: RadioState: Receive -> CsmaBackoff
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]-MAC-----: RadioState: CsmaBackoff -> Transmit
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]-MAC-----: RadioState: Transmit -> Receive
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]==============================[TX len=108]===============================
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| 69 98 62 CD AB 00 B8 00 | 8C 0D 85 08 00 00 05 F2 | i.bM+.8........r
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| 12 33 42 68 1B 29 E1 AF | 62 A3 BD D8 85 D0 2C 5A | .3Bh.)a/b#=X.P,Z
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| 01 2C 9D 19 48 08 CE D7 | 23 84 C3 AF E2 B8 13 57 | .,..H.NW#.C/b8.W
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| 75 C7 49 32 46 29 10 32 | DE 84 0E A2 26 F9 ED 7C | uGI2F).2^.."&ym|
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| EA 70 59 C4 46 81 FC 8C | AA 5E 75 0E C3 EC 9F D5 | jpYDF.|.*^u.Cl.U
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| DB 52 78 B7 41 B9 75 51 | 26 7B 0F AB 68 27 96 BB | [Rx7A9uQ&{.+h'.;
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| E7 95 2F A1 BD B5 43 E4 | AE 19 DD 4B .. .. .. .. | g./!=5Cd..]K....
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]------------------------------------------------------------------------
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]-MAC-----: Finishing operation "TransmitDataDirect"
Nov 22 12:18:54 ot-daemon[2297]: [INFO]-MAC-----: Sent IPv6 ICMP6 msg, len:104, chksum:0000, to:0xb800, sec:yes, prio:normal
Nov 22 12:18:54 ot-daemon[2297]: [INFO]-MAC-----:     src:[fd7b:4815:c237:0:87f0:8e82:9932:6b77]
Nov 22 12:18:54 ot-daemon[2297]: [INFO]-MAC-----:     dst:[fd7b:4815:c237:0:b15f:b3de:ea14:53a7]
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]-MAC-----: Idle mode: Radio receiving on channel 11
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]-MAC-----: Received frame from short address 0xb800
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]==============================[RX len=096]===============================
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| 61 98 82 CD AB 00 8C 00 | B8 72 D5 11 08 3A B1 5F | a..M+...8rU..:1_
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| B3 DE EA 14 53 A7 87 F0 | 8E 82 99 32 6B 77 81 00 | 3^j.S'.p...2kw..
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| 83 02 0A FC 00 01 7E 9D | BA 5F F9 DB 05 00 08 09 | ...|..~.:_y[....
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| 0A 0B 0C 0D 0E 0F 10 11 | 12 13 14 15 16 17 18 19 | ................
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| 1A 1B 1C 1D 1E 1F 20 21 | 22 23 24 25 26 27 28 29 | ...... !"#$%&'()
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]| 2A 2B 2C 2D 2E 2F 30 31 | 32 33 34 35 36 37 1A 31 | *+,-./01234567.1
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]------------------------------------------------------------------------
Nov 22 12:18:54 ot-daemon[2297]: [NOTE]-MAC-----: Dropping rx lowpan HC frame, error:Drop, len:85, src:0xb800, dst:0x8c00, sec:no
Nov 22 12:18:54 ot-daemon[2297]: [DEBG]-MAC-----: Idle mode: Radio receiving on channel 11
Nov 22 12:18:55 ot-daemon[2297]: [INFO]-PLAT----: processTransmit: OK
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]-MAC-----: Request to start operation "TransmitDataDirect"
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]-MAC-----: Starting operation "TransmitDataDirect"
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]-MAC-----: RadioState: Receive -> CsmaBackoff
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]-MAC-----: RadioState: CsmaBackoff -> Transmit
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]-MAC-----: RadioState: Transmit -> Receive
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]==============================[TX len=108]===============================
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| 69 98 63 CD AB 00 B8 00 | 8C 0D 86 08 00 00 05 A5 | i.cM+.8........%
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| 79 86 8C 88 1F 0A 97 04 | 42 67 BE 9C 01 ED D5 EE | y.......Bg>..mUn
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| 29 01 BA 9D 56 19 5E FB | 33 FE 6E 56 D6 20 30 5F | ).:.V.^{3~nVV 0_
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| ED 32 6B C5 34 B2 35 00 | 44 2A C1 68 B2 E3 76 03 | m2kE425.D*Ah2cv.
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| 47 87 41 12 77 AB 23 01 | 79 B7 B7 DB 5A 5C 37 87 | G.A.w+#.y77[Z\7.
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| 5F 26 40 04 A1 F2 89 B5 | CB 20 7D 58 A1 A9 E3 70 | _&@.!r.5K }X!)cp
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| 0E 0C CE A1 F1 6E 0B 1B | CD 1E DD 4B .. .. .. .. | ..N!qn..M.]K....
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]------------------------------------------------------------------------
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]-MAC-----: Finishing operation "TransmitDataDirect"
Nov 22 12:18:55 ot-daemon[2297]: [INFO]-MAC-----: Sent IPv6 ICMP6 msg, len:104, chksum:0000, to:0xb800, sec:yes, prio:normal
Nov 22 12:18:55 ot-daemon[2297]: [INFO]-MAC-----:     src:[fd7b:4815:c237:0:87f0:8e82:9932:6b77]
Nov 22 12:18:55 ot-daemon[2297]: [INFO]-MAC-----:     dst:[fd7b:4815:c237:0:b15f:b3de:ea14:53a7]
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]-MAC-----: Idle mode: Radio receiving on channel 11
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]-MAC-----: Received frame from short address 0xb800
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]==============================[RX len=096]===============================
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| 61 98 83 CD AB 00 8C 00 | B8 72 D5 11 08 3A B1 5F | a..M+...8rU..:1_
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| B3 DE EA 14 53 A7 87 F0 | 8E 82 99 32 6B 77 81 00 | 3^j.S'.p...2kw..
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| EC 59 0A FC 00 02 7F 9D | BA 5F 8E 83 06 00 08 09 | lY.|....:_......
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| 0A 0B 0C 0D 0E 0F 10 11 | 12 13 14 15 16 17 18 19 | ................
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| 1A 1B 1C 1D 1E 1F 20 21 | 22 23 24 25 26 27 28 29 | ...... !"#$%&'()
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]| 2A 2B 2C 2D 2E 2F 30 31 | 32 33 34 35 36 37 1A 31 | *+,-./01234567.1
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]------------------------------------------------------------------------
Nov 22 12:18:55 ot-daemon[2297]: [NOTE]-MAC-----: Dropping rx lowpan HC frame, error:Drop, len:85, src:0xb800, dst:0x8c00, sec:no
Nov 22 12:18:55 ot-daemon[2297]: [DEBG]-MAC-----: Idle mode: Radio receiving on channel 11

And in the opposite direction

Ping Device -> BR (works)
-----------------

Nov 22 12:15:44 ot-daemon[2297]: [DEBG]-MAC-----: Received frame from short address 0xb800
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]-MAC-----: Rx security - frame counter 8
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]==============================[RX len=057]===============================
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]| 69 98 77 CD AB 00 8C 00 | B8 0D 08 00 00 00 05 7A | i.wM+...8......z
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]| D5 11 3A B1 5F B3 DE EA | 14 53 A7 87 F0 8E 82 99 | U.:1_3^j.S'.p...
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]| 32 6B 77 80 00 91 52 00 | 02 00 02 00 0B 20 AA 00 | 2kw...R...... *.
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]| 00 00 00 C8 18 6D 4A 17 | 32 .. .. .. .. .. .. .. | ...H.mJ.2.......
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]------------------------------------------------------------------------
Nov 22 12:15:44 ot-daemon[2297]: [INFO]-MAC-----: Received IPv6 ICMP6 msg, len:56, chksum:0000, from:0xb800, sec:yes, prio:normal, rss:-77.0
Nov 22 12:15:44 ot-daemon[2297]: [INFO]-MAC-----:     src:[fd7b:4815:c237:0:b15f:b3de:ea14:53a7]
Nov 22 12:15:44 ot-daemon[2297]: [INFO]-MAC-----:     dst:[fd7b:4815:c237:0:87f0:8e82:9932:6b77]
Nov 22 12:15:44 ot-daemon[2297]: [INFO]-PLAT----: processReceive: OK
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]-MAC-----: Idle mode: Radio receiving on channel 11
Nov 22 12:15:44 ot-daemon[2297]: [INFO]-PLAT----: processTransmit: OK
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]-MAC-----: Request to start operation "TransmitDataDirect"
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]-MAC-----: Starting operation "TransmitDataDirect"
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]-MAC-----: RadioState: Receive -> CsmaBackoff
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]-MAC-----: RadioState: CsmaBackoff -> Transmit
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]-MAC-----: RadioState: Transmit -> Receive
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]==============================[TX len=060]===============================
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]| 69 98 58 CD AB 00 B8 00 | 8C 0D 81 08 00 00 05 DD | i.XM+.8........]
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]| B3 03 37 E1 C2 16 1B 45 | D4 1C BC 58 58 30 67 BE | 3.7aB..ET.<XX0g>
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]| 6A A7 FF E6 FB E5 7B 3F | 92 3D 46 E3 25 F5 3B D8 | j'.f{e{?.=Fc%u;X
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]| C6 63 C3 D7 4C D1 F5 C6 | A3 2D 9B 19 .. .. .. .. | FcCWLQuF#-......
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]------------------------------------------------------------------------
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]-MAC-----: Finishing operation "TransmitDataDirect"
Nov 22 12:15:44 ot-daemon[2297]: [INFO]-MAC-----: Sent IPv6 ICMP6 msg, len:56, chksum:0000, to:0xb800, sec:yes, prio:normal
Nov 22 12:15:44 ot-daemon[2297]: [INFO]-MAC-----:     src:[fd7b:4815:c237:0:87f0:8e82:9932:6b77]
Nov 22 12:15:44 ot-daemon[2297]: [INFO]-MAC-----:     dst:[fd7b:4815:c237:0:b15f:b3de:ea14:53a7]
Nov 22 12:15:44 ot-daemon[2297]: [DEBG]-MAC-----: Idle mode: Radio receiving on channel 11
Nov 22 12:15:46 ot-daemon[2297]: [INFO]-MESH-CP-: sent keep alive
Nov 22 12:15:46 ot-daemon[2297]: [INFO]-MESH-CP-: received keep alive
Nov 22 12:15:46 ot-daemon[2297]: [INFO]-MESH-CP-: sent keep alive response
Nov 22 12:15:46 ot-daemon[2297]: [INFO]-MESH-CP-: received Leader keep-alive response
dougleaf commented 3 years ago

I tried to pinpoint the function that was failing on host. I got to the line

VerifyOrExit(ip6.GetDestination().IsLinkLocal() || ip6.GetDestination().IsLinkLocalMulticast());

on ip6_filter.cpp. However, even commenting that line, it would fail on something else later. That led me to believe this isn’t a problem with the packet decoding logic, but rather we still have corruption of data between RCP and Host. I also noticed that successful pings starting from the device would have the same first two bytes 69 98 on the ip header

        69              98
0 1 1 0 1 0 0 1 | 1 0 0 1 1 0 0 0
 Ip 6   | AF43        | …

But unsuccessful pings have an unregistered differentiated service field codepoint. Data corruption on the RCP side or on comms?

        61              98
0 1 1 0 0 0 0 1 | 1 0 0 1 1 0 0 0
Ip 6   | ?? no DSCP! | https://www.iana.org/assignments/dscp-registry/dscp-registry.xhtml
dougleaf commented 3 years ago

RCP SIDE

Ping Device -> BR (works)
-----------------

00> [0005549499] [INFO]-PLAT----: Received data:65
00> [0005549499] [INFO]-PLAT----: Received 65 bytes
00> [0005549499] [DEBG]-MAC-----: LinkRaw::ReceiveDone(65 bytes), error:OK
00> [0005549541] [DEBG]-MAC-----: RadioState: Receive -> CsmaBackoff
00> [0005549541] [INFO]-PLAT----: State=OT_RADIO_STATE_RECEIVE
00> [0005549541] [DEBG]-MAC-----: RadioState: CsmaBackoff -> Transmit
00> [0005549545] [INFO]-PLAT----: Received data:5
00> [0005549545] [DEBG]-MAC-----: RadioState: Transmit -> Receive
00> [0005549546] [DEBG]-MAC-----: LinkRaw::TransmitDone(66 bytes), error:OK
BR -> Ping Device
-----------------

00> [0005651315] [DEBG]-MAC-----: RadioState: Receive -> CsmaBackoff
00> [0005651315] [INFO]-PLAT----: State=OT_RADIO_STATE_RECEIVE
00> [0005651316] [DEBG]-MAC-----: RadioState: CsmaBackoff -> Transmit
00> [0005651321] [INFO]-PLAT----: Received data:5
00> [0005651321] [DEBG]-MAC-----: RadioState: Transmit -> Receive
00> [0005651322] [DEBG]-MAC-----: LinkRaw::TransmitDone(69 bytes), error:OK
00> [0005651327] [INFO]-PLAT----: Received data:56
00> [0005651327] [INFO]-PLAT----: Received 56 bytes
00> [0005651327] [DEBG]-MAC-----: LinkRaw::ReceiveDone(56 bytes), error:OK
00> [0005651422] [INFO]-PLAT----: State=OT_RADIO_STATE_RECEIVE
00> [0005651453] [INFO]-PLAT----: State=OT_RADIO_STATE_RECEIVE
00> [0005651483] [INFO]-PLAT----: State=OT_RADIO_STATE_RECEIVE
00> [0005651515] [INFO]-PLAT----: State=OT_RADIO_STATE_RECEIVE
00> [0005652323] [DEBG]-MAC-----: RadioState: Receive -> CsmaBackoff
00> [0005652324] [INFO]-PLAT----: State=OT_RADIO_STATE_RECEIVE
00> [0005652324] [DEBG]-MAC-----: RadioState: CsmaBackoff -> Transmit
00> [0005652330] [INFO]-PLAT----: Received data:5
00> [0005652330] [DEBG]-MAC-----: RadioState: Transmit -> Receive
00> [0005652330] [DEBG]-MAC-----: LinkRaw::TransmitDone(69 bytes), error:OK
00> [0005652335] [INFO]-PLAT----: Received data:56
00> [0005652336] [INFO]-PLAT----: Received 56 bytes
00> [0005652336] [DEBG]-MAC-----: LinkRaw::ReceiveDone(56 bytes), error:OK
00> [0005654684] [INFO]-PLAT----: Received data:70
00> [0005654684] [INFO]-PLAT----: Received 70 bytes
00> [0005654684] [DEBG]-MAC-----: LinkRaw::ReceiveDone(70 bytes), error:OK
00> [0005655873] [DEBG]-MAC-----: RadioState: Receive -> CsmaBackoff
00> [0005655873] [INFO]-PLAT----: State=OT_RADIO_STATE_RECEIVE
00> [0005655874] [DEBG]-MAC-----: RadioState: CsmaBackoff -> Transmit
00> [0005655879] [DEBG]-MAC-----: RadioState: Transmit -> Receive
00> [0005655879] [DEBG]-MAC-----: LinkRaw::TransmitDone(70 bytes), error:OK

I could not enable it to show the data in/out on this side, is that possible to log it through JLink RTT?

dougleaf commented 3 years ago

In the screenshots below, I sniffed ICMP packets and it shows that 802.15.4 security is in fact dropped on the ICMP Reply only on the case BR->DEVICE. It doesn't happen in DEVICE->BR

Request Screen Shot 2020-11-24 at 1 50 06 PM

Reply Screen Shot 2020-11-24 at 1 50 18 PM

jwhui commented 3 years ago

I also noticed that successful pings starting from the device would have the same first two bytes 69 98 on the ip header

       69              98
0 1 1 0 1 0 0 1 | 1 0 0 1 1 0 0 0
Ip 6   | AF43        | …

But unsuccessful pings have an unregistered differentiated service field codepoint. Data corruption on the RCP side or on comms?

       61              98
0 1 1 0 0 0 0 1 | 1 0 0 1 1 0 0 0
Ip 6   | ?? no DSCP! | https://www.iana.org/assignments/dscp-registry/dscp-registry.xhtml

@dougleaf , the first two bytes form the IEEE 802.15.4 MAC Header FCF field - they differ in the FCF Security Enabled bit.

jwhui commented 3 years ago

I have ot-daemon running at 460800bps at head on a RPi. RCP is also built at head on Silabs EFR32MG21 (also tried Nordic). I commission a sample FTD device running on EFR32MG21 built from OpenThread repo w/o modifications. It joins the network and DEVICE->BR pinging works. BR->DEVICE pinging does not work

@dougleaf , your findings seems to indicate that this is an issue on the FTD device - it is the response from the FTD device that has the MAC header Security Enabled field set to zero. Are you able to test with the Nordic device configured as the FTD?

dougleaf commented 3 years ago

Ah, ok. I wasn't sure whether those printed bytes were the IP header or IEE802.15.4. Coincidence one was parsable...

Yes. I'll try with the Nordic FTD

dougleaf commented 3 years ago

Nordic FTD works fine. Security is not dropped

@JaneFromSilabs @rerasool maybe this is a regression on GSDK 3.0?

Nov 25 10:47:44 [DEBG]-MAC-----: Idle mode: Radio receiving on channel 11
Nov 25 10:47:45 [INFO]-PLAT----: processTransmit: OK
Nov 25 10:47:45 [DEBG]-MAC-----: Request to start operation "TransmitDataDirect"
Nov 25 10:47:45 [DEBG]-MAC-----: Starting operation "TransmitDataDirect"
Nov 25 10:47:45 [DEBG]-MAC-----: RadioState: Receive -> CsmaBackoff
Nov 25 10:47:45 [DEBG]-MAC-----: RadioState: CsmaBackoff -> Transmit
Nov 25 10:47:45 [DEBG]-MAC-----: RadioState: Transmit -> Receive
Nov 25 10:47:45 [DEBG]==============================[TX len=108]===============================
Nov 25 10:47:45 [DEBG]| 69 98 E2 CD AB 0B 00 00 | 00 0D FE 30 00 00 05 C0 | i.bM+.....~0...@
Nov 25 10:47:45 [DEBG]| DA F2 57 FB D3 E7 49 C1 | C9 E1 65 B1 A7 E2 9D 2B | ZrW{SgIAIae1'b.+
Nov 25 10:47:45 [DEBG]| AA A9 C9 1B C3 C4 71 D6 | 6C 12 8E A3 E2 1F 71 AC | *)I.CDqVl..#b.q,
Nov 25 10:47:45 [DEBG]| 71 B4 ED D0 7C F4 5F 09 | 17 1B 56 50 F1 1B E5 1E | q4mP|t_...VPq.e.
Nov 25 10:47:45 [DEBG]| 43 A6 8D 21 77 EF 62 E5 | 02 3A BE D6 FE 33 6B 93 | C&.!wobe.:>V~3k.
Nov 25 10:47:45 [DEBG]| 27 0E 6F 6E A8 78 65 23 | 40 C9 36 6F B2 82 C7 DF | '.on(xe#@I6o2.G_
Nov 25 10:47:45 [DEBG]| FD 5F 3D 50 0F B1 8B 67 | 94 43 75 64 .. .. .. .. | }_=P.1.g.Cud....
Nov 25 10:47:45 [DEBG]------------------------------------------------------------------------
Nov 25 10:47:45 [DEBG]-MAC-----: Finishing operation "TransmitDataDirect"
Nov 25 10:47:45 [INFO]-MAC-----: Sent IPv6 ICMP6 msg, len:104, chksum:0000, to:0x000b, sec:yes, prio:normal
Nov 25 10:47:45 [INFO]-MAC-----:     src:[fd7b:4815:c237:0:87f0:8e82:9932:6b77]
Nov 25 10:47:45 [INFO]-MAC-----:     dst:[fd7b:4815:c237:0:40ba:1084:c9bc:4f85]
Nov 25 10:47:45 [DEBG]-MAC-----: Idle mode: Radio receiving on channel 11
Nov 25 10:47:45 [DEBG]-MAC-----: Received frame from short address 0x000b
Nov 25 10:47:45 [DEBG]-MAC-----: Rx security - frame counter 55
Nov 25 10:47:45 [DEBG]==============================[RX len=105]===============================
Nov 25 10:47:45 [DEBG]| 69 98 61 CD AB 00 00 0B | 00 0D 37 00 00 00 05 7A | i.aM+.....7....z
Nov 25 10:47:45 [DEBG]| D5 11 3A 40 BA 10 84 C9 | BC 4F 85 87 F0 8E 82 99 | U.:@:..I<O..p...
Nov 25 10:47:45 [DEBG]| 32 6B 77 81 00 A5 BB 05 | A4 00 35 A1 7C BE 5F EC | 2kw..%;.$.5!|>_l
Nov 25 10:47:45 [DEBG]| E1 06 00 08 09 0A 0B 0C | 0D 0E 0F 10 11 12 13 14 | a...............
Nov 25 10:47:45 [DEBG]| 15 16 17 18 19 1A 1B 1C | 1D 1E 1F 20 21 22 23 24 | ........... !"#$
Nov 25 10:47:45 [DEBG]| 25 26 27 28 29 2A 2B 2C | 2D 2E 2F 30 31 32 33 34 | %&'()*+,-./01234
Nov 25 10:47:45 [DEBG]| 35 36 37 69 01 72 FB 4A | 7B .. .. .. .. .. .. .. | 567i.r{J{.......
Nov 25 10:47:45 [DEBG]------------------------------------------------------------------------
Nov 25 10:47:45 [INFO]-MAC-----: IP6 filter
Nov 25 10:47:45 [INFO]-MAC-----: Received IPv6 ICMP6 msg, len:104, chksum:0000, from:0x000b, sec:yes, prio:normal, rss:-51.0
Nov 25 10:47:45 [INFO]-MAC-----:     src:[fd7b:4815:c237:0:40ba:1084:c9bc:4f85]
Nov 25 10:47:45 [INFO]-MAC-----:     dst:[fd7b:4815:c237:0:87f0:8e82:9932:6b77]
Nov 25 10:47:45 [INFO]-PLAT----: processReceive: OK
Nov 25 10:47:45 [DEBG]-MAC-----: Idle mode: Radio receiving on channel 11
eujean commented 3 years ago

I can confirm I see a similar thing with the TI CC1352 Platform.

Ping from DEVICE -> BR works fine, but ping from BR -> DEVICE fails with dropped frame.

Pretty sure this used to work fine at some point in the past though.

JaneFromSilabs commented 3 years ago

Hi @L-Belluomini and @dougleaf,

We have the issue reproduced internally and are close to understanding the source of the problem. When I have more information, I'll update.

Cheers, Jane

lmnotran commented 3 years ago

Hi everyone,

I think I've found the problem. Icmp::HandleEchoRequest() uses the default Message settings when making a new message. https://github.com/openthread/openthread/pull/4977 changed Ip6::NewMessage() to make the Message::Settings argument optional with a default value that is supposed to enable link security. (link to specific line). The default value for this arg is Message::Settings::kDefault.

At first glance, the way kDefault is defined seems fine. Upon further inspection though, it looks like kDefault is being zero-initialized instead of being initialized with link-security enabled. Here's a screenshot from Ozone image

I don't really have a concrete explanation for why it's not being initialized properly, but the fix I have proposed in https://github.com/openthread/openthread/pull/5900 seems to fix the issue, at least on the EFR32 platform.

@eujean Can you test my changes to see if it fixes the issue for the CC1352 platform?

CC: @abtink @dougleaf @L-Belluomini @JaneFromSilabs @romacdon @jwhui

abtink commented 3 years ago

Thanks @lmnotran for investigating this.

Seems like somehow the constructor is not being called to init the kDefault static member variable (and it is just zero initialized). It is interesting when it is scoped in a method in your fix, it seems to work fine.

Can you please share which toolchain you are using? Thanks.

I cannot explain this, but if there happens to be a limitation on some tool-chains it would be good to understand and be aware of it and ensure to not use the same pattern in future.

jwhui commented 3 years ago

@lmnotran , thanks for looking into this issue!

Does your platform perform C++ runtime initialization before calling main()? For example: https://github.com/openthread/openthread/blob/8630547d106daeaff64c6c960ff316036222e4db/examples/platforms/cc2538/startup-gcc.c#L201-L210

dougleaf commented 3 years ago

@jwhui I believe in case of SiLabs this is handled inside newlib init. @lmnotran I’m also interested on the specifics of C++ runtime init on SiLabs platform

lmnotran commented 3 years ago

@jwhui We don't do any C++ specific init like calling global constructors in our startup code. I can't link to it because it's not in git, but for the efr32mg12 platform, you can find the startup file at third_party/silabs/gecko_sdk_suite/v3.0/platform/Device/SiliconLabs/EFR32MG12P/Source/GCC/startup_efr32mg12p.c

We believe the issue is that the our Makefile is setup to not link the standard initialization code for the C++ runtime while also not having a replacement. See Lines 117 and 159. https://github.com/openthread/openthread/blob/0b0d6eef1d569c77a6cc08c3d4a39b76ea3e33c0/examples/Makefile-efr32mg12#L117,L159 Line 159 seems to have been there since the EFR32MG12 platform was first added.

It looks like the TI ccXXXX platforms do the same thing and are having the same issue according to @eujean so I opened https://github.com/openthread/openthread/pull/5905 which should fix this

lmnotran commented 3 years ago

CC: @srickardti

lmnotran commented 3 years ago

@srickardti Here's a patch with the same changes applied for TI's ccXXXX platform. Note that the changes are untested as we don't have any TI hardware to test with.

ti-ccXXXX-remove-nostartfiles.diff.zip

srickardti commented 3 years ago

@lmnotran Thanks for the patch, but this issue looks a bit more complicated than just removing those flags. The startfiles being included isn't real useful if the code is not called.

In the past we've gotten around these issues by forbidding globally initialized C++ objects. IMO, that's a good way to do it, but it does produce more cluttered code.

We have two options to initialize the global C++ objects; call the constructors directly like the cc2538 startup, or call the newlib __start function. I lean towards the former because my intuition is that it will be smaller code size. But I'll test out both and report back.

srickardti commented 3 years ago

I was able to get the object initialized by copying the constructor loop from the CC2538 startup code. I'll ping one of our compiler people to see if there is a better way with a built-in c library.

jwhui commented 3 years ago

Resolved by:

Thanks @lmnotran @srickardti @dougleaf @JaneFromSilabs @L-Belluomini @eujean !