zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.86k stars 6.62k forks source link

Bluetooth: Mesh: Devices relay their own messages (even with relay disabled!) #27716

Closed MarcianoPreciado closed 4 years ago

MarcianoPreciado commented 4 years ago

Describe the bug Bluetooth Mesh devices relay their own messages even when the relay feature is disabled/not supported. From what I've seen on one of my projects, device relay's its own message, then upon receiving its own relayed message, it repeats and gets stuck in a relay loop.

To Reproduce Steps to reproduce the behavior:

  1. Go to sample project samples/bluetooth/mesh_demo
  2. Make some changes to prj.conf
    
    CONFIG_BT_MESH_DEBUG=y
    CONFIG_BT_DEBUG_LOG=y
    CONFIG_BT_MESH_DEBUG_NET=y
    CONFIG_LOG_BUFFER_SIZE=32768
    CONFIG_LOG_STRDUP_BUF_COUNT=128
    CONFIG_BT_MESH_RELAY=n
3. In main.c on line 51 and 54 change to `  .relay = BT_MESH_RELAY_NOT_SUPPORTED,`
4. Build and upload `west build -p -b nrf52840dk_nrf52840 -d ./build` (I used nRF52840 and manually called `board_button_1_pressed()` manually right before `while(1)`)
5. See evidence of relaying in logs

**Expected behavior**
The device should encrypt and send the packet a few times (depending on model retransmit count) and be done.

**Impact**
This is a show stopper for my product. Not only are my low-power-nodes (LPNs) wasting tons of energy retransmitting their own messages, it clutters the network with tons of unnecessary traffic.

**Logs and console output**

Booting Zephyr OS build v2.3.0-rc1-ncs1-4-ga6a1fb4642fa Initializing... Unicast address: 0x0b0c Bluetooth initialized Mesh initialized Loading stored settings Using stored settings Subscribing to heartbeat messages Button message sent with OpCode 0x00c005f1 [00:00:00.011,108]  fs_nvs: 8 Sectors of 4096 bytes [00:00:00.011,108]  fs_nvs: alloc wra: 0, de8 [00:00:00.011,108]  fs_nvs: data wra: 0, 2b8 [00:00:00.012,298]  bt_hci_core: HW Platform: Nordic Semiconductor (0x0002) [00:00:00.012,298]  bt_hci_core: HW Variant: nRF52x (0x0002) [00:00:00.012,329]  bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 2.3 Build 0 [00:00:00.012,512]  bt_hci_core: No ID address. App must call settings_load() [00:00:00.023,895]  settings: set-value failure. key: bt/hash error(-2) [00:00:00.024,200]  bt_hci_core: Identity: e0:a0:25:e4:c8:f9 (random) [00:00:00.024,23src 0x0b0c 0]  bt_hci_core: HCI: version 5.2 (0x0b) revision 0x0000, manufacturer 0x05f1 [00:00:00.024,230]  bt_hci_core: LMP: version 5.2 (0x0b) subver 0xffff [00:00:00.026,641]  bt_mesh_net.bt_mesh_net_keys_create: NID 0x37 EncKey ceac76a8c2b228a93eecc8d30738636~ [00:00:00.026,641]  bt_mesh_net.bt_mesh_net_keys_create: PrivacyKey 22bbd0b215bc5c7f0e6ff56d0088b93~ [00:00:00.027,862]  bt_mesh_net.bt_mesh_net_keys_create: NetID c7005d732329eadf [00:00:00.029,113]  bt_mesh_net.bt_mesh_net_keys_create: BeaconKey f2059fe0a06e8ffa0a97bf5070ee6ac~ [00:00:00.029,113]  bt_mesh_net.bt_mesh_net_beacon_update: NetIndex 0 Using current key [00:00:00.029,113]  bt_mesh_net.bt_mesh_net_beacon_update: flags 0x00, IVI 0x00000000 [00:00:00.029,541]  bt_mesh_net.bt_mesh_net_beacon_update: NetIndex 0 Using current key [00:00:00.029,541]  bt_mesh_net.bt_mesh_net_beacon_update: flags 0x00, IVI 0x00000000 [00:00:00.030,426]  bt_mesh_main: Primary Element: 0x0b0c [00:00:00.030,426]  bt_mesh_main.bt_mesh_provision: net_idx 0x0000 flags 0x00 iv_index 0x0000 [00:00:00.032,501]  bt_mesh_net.bt_mesh_net_send: src 0x0b0c dst 0x0b0c len 12 headroom 9 tailroom 8 [00:00:00.032,531]  bt_mesh_net.bt_mesh_net_send: Payload len 12: 00880a61ea6968f6901112f3 [00:00:00.032,531]  bt_mesh_net.bt_mesh_net_send: Seq 0x0008ff [00:00:00.032,531]  bt_mesh_net.bt_mesh_net_encode: src 0x0b0c dst 0x0b0c ctl 0 seq 0x0008ff [00:00:00.036,437]  bt_mesh_net.bt_mesh_net_send: src 0x0b0c dst 0xc000 len 8 headroom 9 tailroom 12 [00:00:00.036,468]  bt_mesh_net.bt_mesh_net_send: Payload len 8: 4b903ba15029f24a [00:00:00.036,468]  bt_mesh_net.bt_mesh_net_send: Seq 0x000900 [00:00:00.036,468]  bt_mesh_net.bt_mesh_net_encode: src 0x0b0c dst 0xc000 ctl 0 seq 0x000900 [00:00:00.069,885]  bt_mesh_net.bt_mesh_net_local: len 25: 37085d6a01b89ca5ad98ea6859019c1~ [00:00:00.069,915]  bt_mesh_net.bt_mesh_net_recv: rssi 0 net_if 1 [00:00:00.069,946]  bt_mesh_net.bt_mesh_net_decode: 25 bytes: 37085d6a01b89ca5ad98ea6859019c1~ [00:00:00.069,946]  bt_mesh_net.net_find_and_decrypt:  [00:00:00.069,946]  bt_mesh_net.net_decrypt: NID 0x37 net_idx 0x0000 [00:00:00.069,946]  bt_mesh_net.net_decrypt: IVI 0 net->iv_index 0x00000000 [00:00:00.069,976]  bt_mesh_net.net_decrypt: src 0x0b0c [00:00:00.070,037]  bt_mesh_net.bt_mesh_net_decode: Decryption successful. Payload len 21 [00:00:00.070,068]  bt_mesh_net.bt_mesh_net_decode: src 0x0b0c dst 0x0b0c ttl 7 [00:00:00.070,068]  bt_mesh_net.bt_mesh_net_decode: PDU: 37070008ff0b0c0b0c00880a61ea696~ [00:00:00.070,281]  bt_mesh_net.bt_mesh_net_send: src 0x0b0c dst 0x0b0c len 16 headroom 9 tailroom 4 [00:00:00.070,312]  bt_mesh_net.bt_mesh_net_send: Payload len 16: 006f2a263934a06ef0446aaf3473384~ [00:00:00.070,312]  bt_mesh_net.bt_mesh_net_send: Seq 0x000901 [00:00:00.070,312]  bt_mesh_net.bt_mesh_net_encode: src 0x0b0c dst 0x0b0c ctl 0 seq 0x000901 [00:00:00.070,495]  bt_mesh_net.bt_mesh_net_local: len 21: 370f80c2988841372aedf42fbc32107~ [00:00:00.070,495]  bt_mesh_net.bt_mesh_net_recv: rssi 0 net_if 1 [00:00:00.070,526]  bt_mesh_net.bt_mesh_net_decode: 21 bytes: 370f80c2988841372aedf42fbc32107~ [00:00:00.070,526]  bt_mesh_net.net_find_and_decrypt:  [00:00:00.070,526]  bt_mesh_net.net_decrypt: NID 0x37 net_idx 0x0000 [00:00:00.070,526]  bt_mesh_net.net_decrypt: IVI 0 net->iv_index 0x00000000 [00:00:00.070,556]  bt_mesh_net.net_decrypt: src 0x0b0c [00:00:00.070,617]  bt_mesh_net.bt_mesh_net_decode: Decryption successful. Payload len 17 [00:00:00.070,648]  bt_mesh_net.bt_mesh_net_decode: src 0x0b0c dst 0xc000 ttl 7 [00:00:00.070,648]  bt_mesh_net.bt_mesh_net_decode: PDU: 37070009000b0cc0004b903ba15029f~ [00:00:00.110,229]  bt_mesh_net.bt_mesh_net_relay: TTL 7 CTL 0 dst 0xc000 [00:00:00.110,260]  bt_mesh_net.bt_mesh_net_relay: Relaying packet. TTL is now 7 [00:00:00.110,382]  bt_mesh_net.bt_mesh_net_local: len 29: 3762a280d163a7e17b4a42c8c53bd9b~ [00:00:00.110,412]  bt_mesh_net.bt_mesh_net_recv: rssi 0 net_if 1 [00:00:00.110,443]  bt_mesh_net.bt_mesh_net_decode: 29 bytes: 3762a280d163a7e17b4a42c8c53bd9b~ [00:00:00.110,443]  bt_mesh_net.net_find_and_decrypt:  [00:00:00.110,443]  bt_mesh_net.net_decrypt: NID 0x37 net_idx 0x0000 [00:00:00.110,443]  bt_mesh_net.net_decrypt: IVI 0 net->iv_index 0x00000000 [00:00:00.110,473]  bt_mesh_net.net_decrypt: src 0x0b0c [00:00:00.110,565]  bt_mesh_net.bt_mesh_net_decode: Decryption successful. Payload len 25 [00:00:00.110,595]  bt_mesh_net.bt_mesh_net_decode: src 0x0b0c dst 0x0b0c ttl 7 [00:00:00.110,595]  bt_mesh_net.bt_mesh_net_decode: PDU: 37070009010b0c0b0c006f2a263934a~ [00:00:00.110,717]  bt_mesh_cfg_cli: Unexpected Heartbeat Subscription Status message



**Environment (please complete the following information):**
 - OS: MacOS Catalina
 - Toolchain gnuarmemb
 - commit a6a1fb4642fa9ba5e5dcb5174e127a36a47d9ff6 (HEAD, manifest-rev)
LingaoM commented 4 years ago

No such problem, this demo will send mesh message with own unicast address 0x0b0c and group address 0xc000, if you see log above in details, they are two different mesh message origin from local node, Payload len 12: 00880a61ea6968f6901112f3�[0m Payload len 16: 006f2a263934a06ef0446aaf3473384~�[0m they are not possible relayed from local node, just payload different.

Finally, local node send mesh message to group address , this address also subscribed by local node, so, deliver to local queue, also relayed over adv layer.

trond-snekvik commented 4 years ago

@LingaoM's assessment is correct, but I just wanted to provide some context for why this looks confusing in the log:

When you send messages to groups your own node subscribes to, the network layer behaves in an unexpected way. Instead of passing the encrypted message to the advertiser and an unencrypted version to its own transport layer, it just encrypts the packet, and adds it to a local RX queue. It'll then schedule a call to bt_mesh_net_local(), which processes every message in the local RX queue as if it came from the network. This'll pass it to bt_mesh_net_recv -> bt_mesh_net_decode, where it decrypts it, before "relaying" it, like it would for network messages. Note that although the log says that it's relaying its own message, it's actually the first time the message goes on air, as we never added it in the advertiser queue in the first place!

This flow is inefficient and slightly confusing. Your assessment about this being wasteful, particularly on LPN is correct. I have been working on a patch for it, but had to change focus before I got around to submitting it. I'll try to submit it this week.

MarcianoPreciado commented 4 years ago

@LingaoM @trond-snekvik Thank you for educating me on the process. I'm glad you are working on a patch. When I attempt to suspend the mesh just after sending a message, it seems to repeatedly attempt to fill a net buffer with a message but the suspension removed the buffer. It blows my log up.

trond-snekvik commented 4 years ago

Loopback PR: https://github.com/zephyrproject-rtos/zephyr/pull/27796

Sounds like this might log suspension issue might be addressable in a separate PR. Just for confirmation, are you sending mesh messages periodically through a timer of your own, or is it a consequence of some periodic action the stack performs?

MarcianoPreciado commented 4 years ago

@trond-snekvik I am sending according to my own timing.

trond-snekvik commented 4 years ago

@MarcianoPreciado The loopback PR has been merged. Could you check whether the logs look correct with that patch?

carlescufi commented 4 years ago

Closing since there's been no activity for a while. @MarcianoPreciado please reopen if this is not solved.