zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.82k stars 6.6k forks source link

nRF52840 nRF52833 Bluetooth: Timeout in `net_config_init_by_iface` but interface is up #54856

Closed bamkrs closed 1 year ago

bamkrs commented 1 year ago

Describe the bug I'm currently trying to build the bluetooth/ipsp sample and have run into a delay of 30 seconds when booting zephyr:

*** Booting Zephyr OS build v3.2.99-ncs1 ***                                                                             
[00:00:00.006,439] <inf> sdc_hci_driver: hci_driver_open: SoftDevice Controller build revision:                          
                                         6d 90 41 2a 38 e8 ad 17  29 a5 03 38 39 27 d7 85 |m.A*8... )..89'..             
                                         1f 85 d8 e1                                      |....                          
[00:00:00.008,941] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.008,972] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:00.009,002] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 109.16784 Build 2917677098
[00:00:00.010,253] <inf> bt_hci_core: bt_dev_show_info: Identity: EA:6C:8A:46:50:5E (random)
[00:00:00.010,284] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x11fa, manufacturer 0x0059
[00:00:00.010,345] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0x11fa
[00:00:30.011,871] <inf> ipsp: init_app: Run IPSP sample
[00:00:30.012,939] <inf> ipsp: listen: Starting to wait

It all boils down to CONFIG_NET_CONFIG_INIT_TIMEOUT=30 and the wait in net_config_init_by_iface located in subsys/net/lib/config/init.c. The semaphore is seemingly never given, thus net_config_init_by_iface returns -ENETDOWN after a delay of 30 seconds before the application is started.

However, it seems like the BT interface is up and running happily since it immediately pops up in my bluetoothctl

# scan le
Discovery started
[CHG] Controller 64:D6:9A:DD:D0:73 Discovering: yes
[NEW] Device EA:6C:8A:46:50:5E Test IPSP node

I've tested the sample an nRF52840-DK (2.0.1) and an nRF52833-DK (1.0.0) both showing the same behavior.

To Reproduce Steps to reproduce the behavior:

nRF52840:

zephyr/samples/bluetooth/ipsp (git)-[tags/v3.2.99-ncs1] % west build --pristine -b nrf52840_nrf52840dk --build-dir build/nrf52840dk_nrf52840/ipsp
zephyr/samples/bluetooth/ipsp (git)-[tags/v3.2.99-ncs1] % west flash --build-dir build/nrf52840dk_nrf52840/ipsp

nRF52833:

zephyr/samples/bluetooth/ipsp (git)-[tags/v3.2.99-ncs1] % west build --pristine -b nrf52833_nrf52833dk --build-dir build/nrf52833dk_nrf52833/ipsp
zephyr/samples/bluetooth/ipsp (git)-[tags/v3.2.99-ncs1] % west flash --build-dir build/nrf52833dk_nrf52833/ipsp

Expected behavior Since the interface is actively advertising itself, the interface seems to be up, but Zephyr does detect the interface beeing up. I would expect Zephyr to detect the interface being up and set it up properly instead of timing out.

Impact I'm currently unable to set up a prototype for a new product we need to pitch our CEO.

Logs and console output Network-Log Enabled:

[00:00:00.000,732] <dbg> net_bt: net_bt_init: (main): dev 0x5e310 driver_data 0x20000720                                                                                                      
[00:00:00.002,929] <dbg> net_if: net_if_init: (main):                                                                                                                                                              
[00:00:00.002,960] <dbg> net_if: init_iface: (main): On iface 0x20001c50                                                                                                                                           
[00:00:00.002,990] <dbg> net_bt: bt_iface_init: (main): iface 0x20001c50                                                                                                                                           
[00:00:00.003,051] <dbg> net_bt: bt_iface_init: (main): [0] alloc ctxt 0x20000720 iface 0x20001c50                                                                                                                 
[00:00:00.003,173] <dbg> net_if: update_operational_state: (main): iface 0x20001c50, oper state DOWN admin DOWN carrier ON dormant ON                                                                              
[00:00:00.003,204] <dbg> net_if: init_iface: (main): On iface 0x20001c58                                                                                                                                           
[00:00:00.003,234] <dbg> net_bt: bt_iface_init: (main): iface 0x20001c58                                                                                                                                           
[00:00:00.003,265] <dbg> net_bt: bt_iface_init: (main): [1] alloc ctxt 0x20000720 iface 0x20001c58                                                                                                                 
*** Booting Zephyr OS build v3.2.99-ncs1 ***                                                                                                                                                                       
[00:00:30.012,451] <err> net_config: net_config_init_by_iface: Timeout while waiting network interface                                                                                                             
[00:00:30.012,481] <err> net_config: net_config_init_app: Network initialization failed (-115)

Environment (please complete the following information):

nordicjm commented 1 year ago

Have you tested this and reproduced this with zephyr? If it is not present in zephyr, this issue should be raised on https://devzone.nordicsemi.com/ and not here

bamkrs commented 1 year ago

@nordicjm Bug is reproducible in latest vanilla Zephyr:

*** Booting Zephyr OS build v3.3.0-rc3-38-gc9225e4365b9 ***
[00:00:00.396,392] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.396,423] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:00.396,453] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 3.3 Build 0
[00:00:00.397,430] <inf> bt_hci_core: bt_dev_show_info: Identity: EA:6C:8A:46:50:5E (random)
[00:00:00.397,460] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x05f1
[00:00:00.397,491] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0xffff
[00:00:30.398,773] <inf> ipsp: init_app: Run IPSP sample
[00:00:30.399,291] <inf> ipsp: listen: Starting to wait

I've build this image in Zephyrs official docker container.

kruithofa commented 1 year ago

@rlubos can you please have a look at this?

rlubos commented 1 year ago

Do you establish connection with the Zephyr inteface? If not, the interface cannot be considered as operationally up, and that's what Zephyr is waiting for. You can't consider an interface as operational up, if you can't send packets over it.

bamkrs commented 1 year ago

Do you mean to connect via - in this case - bluetooth with another device?

rlubos commented 1 year ago

Do you mean to connect via - in this case - bluetooth with another device?

Yes, I mean Bluetooth connection with some host.

bamkrs commented 1 year ago

Ok, I've spun up the example, fired up bluetoothctl, searched and connected to the device, The function still reports an timeout.

[NEW] Device D7:B9:F3:DE:18:A3 Zephyr
[bluetooth]# pair D7:B9:F3:DE:18:A3
Attempting to pair with D7:B9:F3:DE:18:A3
[CHG] Device D7:B9:F3:DE:18:A3 Connected: yes
[CHG] Device D7:B9:F3:DE:18:A3 Bonded: yes
[CHG] Device D7:B9:F3:DE:18:A3 UUIDs: 00001800-0000-1000-8000-00805f9b34fb
[CHG] Device D7:B9:F3:DE:18:A3 UUIDs: 00001801-0000-1000-8000-00805f9b34fb
[CHG] Device D7:B9:F3:DE:18:A3 UUIDs: 00001820-0000-1000-8000-00805f9b34fb
[CHG] Device D7:B9:F3:DE:18:A3 ServicesResolved: yes
[CHG] Device D7:B9:F3:DE:18:A3 Paired: yes
[NEW] Primary Service (Handle 0x0000)
    /org/bluez/hci0/dev_D7_B9_F3_DE_18_A3/service0001
    00001801-0000-1000-8000-00805f9b34fb
    Generic Attribute Profile
[NEW] Characteristic (Handle 0x0000)
    /org/bluez/hci0/dev_D7_B9_F3_DE_18_A3/service0001/char0002
    00002a05-0000-1000-8000-00805f9b34fb
    Service Changed
[NEW] Descriptor (Handle 0x0000)
    /org/bluez/hci0/dev_D7_B9_F3_DE_18_A3/service0001/char0002/desc0004
    00002902-0000-1000-8000-00805f9b34fb
    Client Characteristic Configuration
[NEW] Characteristic (Handle 0x0000)
    /org/bluez/hci0/dev_D7_B9_F3_DE_18_A3/service0001/char0005
    00002b29-0000-1000-8000-00805f9b34fb
    Client Supported Features
[NEW] Characteristic (Handle 0x0000)
    /org/bluez/hci0/dev_D7_B9_F3_DE_18_A3/service0001/char0007
    00002b2a-0000-1000-8000-00805f9b34fb
    Database Hash
[NEW] Primary Service (Handle 0x0000)
    /org/bluez/hci0/dev_D7_B9_F3_DE_18_A3/service0010
    00001820-0000-1000-8000-00805f9b34fb
    Internet Protocol Support
Pairing successful
[Zephyr]# connect D7:B9:F3:DE:18:A3
Attempting to connect to D7:B9:F3:DE:18:A3
Connection successful

However, the output slighlty changed, I receive a warning about some opcode when pairing.

[00:00:11.328,002] <wrn> bt_hci_core: bt_hci_cmd_send_sync: opcode 0x200a status 0x09
[00:00:30.012,420] <err> net_config: net_config_init_by_iface: Timeout while waiting network interface                                                                                                                                                                                                                                                                                                                                  
[00:00:30.012,420] <err> net_config: net_config_init_app: Network initialization failed (-115)
rlubos commented 1 year ago

Again, was it tested in NCS or Zephyr context? There was a bug, already fixed in Zephyr, that impacted interface state notification (https://github.com/zephyrproject-rtos/zephyr/pull/52934).

bamkrs commented 1 year ago

Same:

[NEW] Device EA:6C:8A:46:50:5E Test IPSP node
[bluetooth]# pair EA:6C:8A:46:50:5E
Attempting to pair with EA:6C:8A:46:50:5E
[CHG] Device EA:6C:8A:46:50:5E Connected: yes
[CHG] Device EA:6C:8A:46:50:5E Bonded: yes
[CHG] Device EA:6C:8A:46:50:5E UUIDs: 00001800-0000-1000-8000-00805f9b34fb
[CHG] Device EA:6C:8A:46:50:5E UUIDs: 00001801-0000-1000-8000-00805f9b34fb
[CHG] Device EA:6C:8A:46:50:5E UUIDs: 00001820-0000-1000-8000-00805f9b34fb
[CHG] Device EA:6C:8A:46:50:5E ServicesResolved: yes
[CHG] Device EA:6C:8A:46:50:5E Paired: yes
[NEW] Primary Service (Handle 0x0000)
    /org/bluez/hci0/dev_EA_6C_8A_46_50_5E/service0001
    00001801-0000-1000-8000-00805f9b34fb
    Generic Attribute Profile
[NEW] Characteristic (Handle 0x0000)
    /org/bluez/hci0/dev_EA_6C_8A_46_50_5E/service0001/char0002
    00002a05-0000-1000-8000-00805f9b34fb
    Service Changed
[NEW] Descriptor (Handle 0x0000)
    /org/bluez/hci0/dev_EA_6C_8A_46_50_5E/service0001/char0002/desc0004
    00002902-0000-1000-8000-00805f9b34fb
    Client Characteristic Configuration
[NEW] Characteristic (Handle 0x0000)
    /org/bluez/hci0/dev_EA_6C_8A_46_50_5E/service0001/char0005
    00002b29-0000-1000-8000-00805f9b34fb
    Client Supported Features
[NEW] Characteristic (Handle 0x0000)
    /org/bluez/hci0/dev_EA_6C_8A_46_50_5E/service0001/char0007
    00002b2a-0000-1000-8000-00805f9b34fb
    Database Hash
[NEW] Primary Service (Handle 0x0000)
    /org/bluez/hci0/dev_EA_6C_8A_46_50_5E/service0010
    00001820-0000-1000-8000-00805f9b34fb
    Internet Protocol Support
Pairing successful
[Test IPSP node]# connect EA:6C:8A:46:50:5E
Attempting to connect to EA:6C:8A:46:50:5E
Connection successful

I've connected roughly 10 secs after resetting the device.

*** Booting Zephyr OS build v3.3.0-rc3-38-gc9225e4365b9 ***
[00:00:00.256,988] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.257,019] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:00.257,049] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 3.3 Build 0
[00:00:00.258,026] <inf> bt_hci_core: bt_dev_show_info: Identity: EA:6C:8A:46:50:5E (random)
[00:00:00.258,056] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x05f1
[00:00:00.258,087] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0xffff
[00:00:30.259,368] <inf> ipsp: init_app: Run IPSP sample
[00:00:30.259,887] <inf> ipsp: listen: Starting to wait
rlubos commented 1 year ago

Sorry, I can't reproduce this particular problem, the inteface is brought up as soon as I connect on my side:

*** Booting Zephyr OS build v3.3.0-rc3-60-gd1ec6f7aaf5c ***
[00:00:00.254,730] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.254,760] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:00.254,791] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version0
[00:00:00.255,737] <inf> bt_hci_core: bt_dev_show_info: Identity: DF:7D:FC:28:D5:26 (random)
[00:00:00.255,798] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x0000, manufact1
[00:00:00.255,828] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0xffff
[00:00:07.767,181] <wrn> bt_l2cap: l2cap_chan_le_send: No credits to transmit packet
[00:00:07.767,272] <inf> ipsp: init_app: Run IPSP sample
[00:00:07.767,730] <inf> ipsp: listen: Starting to wait

I do however follow the https://github.com/zephyrproject-rtos/zephyr/blob/main//samples/bluetooth/ipsp/README.rst wrt to connection method, maybe that's the case.

Unfortunately though, the sample crashed just after establishing connection:

[00:00:07.915,679] <err> os: bus_fault: ***** BUS FAULT *****
[00:00:07.915,710] <err> os: bus_fault:   Imprecise data bus error
[00:00:07.915,710] <err> os: esf_dump: r0/a1:  0xfffffffc  r1/a2:  0x0000000c  r2/a3:  0x00000008
[00:00:07.915,740] <err> os: esf_dump: r3/a4:  0x00000008 r12/ip:  0x20000708 r14/lr:  0x0000969d
[00:00:07.915,740] <err> os: esf_dump:  xpsr:  0x21000000
[00:00:07.915,771] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x000096ae
[00:00:07.915,802] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 26: Unknown error on CPU 0
[00:00:07.915,832] <err> os: z_fatal_error: Current thread: 0x20001968 (unknown)
[00:00:08.517,181] <err> os: k_sys_fatal_error_handler: Halting system

But that's likely related to another issue we have open (https://github.com/zephyrproject-rtos/zephyr/issues/53871#issuecomment-1433011723)

bamkrs commented 1 year ago

So I closely followed the README but had no luck.

As host device, I switched to an up2date RPi3B 1.2 and tried both the internal Bluetooth controller as well as an nRF52840DK running the hci_usb sample connected via USB. The interface won't come up. So I dug a bit deeper and found litte to nothing.

root@raspberrypi:~# echo 1 > /sys/kernel/debug/bluetooth/6lowpan_enable
root@raspberrypi:~# echo "connect D7:B9:F3:DE:18:A3 2" > /sys/kernel/debug/bluetooth/6lowpan_control

Running this yields an bt0 on my host device which is up. However, Zephyr still regards its interface as down. So I hooked up bt-connection-callbacks with some logging:

static void connected(struct bt_conn *conn, uint8_t err)
{
    if (err) {
        LOG_ERR("Connection failed (err 0x%02x)", err);
    } else {
        LOG_INF("Connected");
    }
    // [...] Setting RGB LED to Blue here
}

static void disconnected(struct bt_conn *conn, uint8_t reason)
{
    LOG_INF("Disconnected (reason 0x%02x)", reason);
    // [...] Setting RGB LED to green here
}

static void security_changed(struct bt_conn *conn, bt_security_t level,
                             enum bt_security_err err)
{
    char addr[BT_ADDR_LE_STR_LEN];
    bt_addr_le_to_str(bt_conn_get_dst(conn), addr, sizeof(addr));
    if (!err) {
        printk("Security changed: %s level %u\n", addr, level);
    } else {
        printk("Security failed: %s level %u err %d\n", addr, level,
               err);
    }
}

BT_CONN_CB_DEFINE(conn_callbacks) = {
    .connected = connected,
    .disconnected = disconnected,
    .security_changed = security_changed
};

As soon as I connect D7:B9:F3:DE:18:A3 2 I can see

[00:00:15.839,599] <inf> bt: connected: Connected

But thats' it. I still get the timeout:

[00:00:30.840,972] <err> net_config: net_config_init_by_iface: Timeout while waiting network interface
[00:00:30.841,003] <err> net_config: net_config_init_app: Network initialization failed (-115)

Now I've added an RTT shell with network support which shows me two interfaces, both down even tho I am connected and have not received an disconnect callback:

rtt:~$ net iface

Interface 0x20002714 (Bluetooth) [1]                                                                                                                                                                               
====================================                                                                                                                                                                               
Interface is down.                                                                                                                                                                                                 

Interface 0x2000271c (Bluetooth) [2]                                                                                                                                                                               
====================================                                                                                                                                                                               
Interface is down.

How is it possible that I'm connected to the device, have not received an disconnect call, but the interface is still down? I've tested it on both main (1 hour ago) and the latest version of the nRF fork.

cfriedt commented 1 year ago

@vaishnavachath - just wanted to CC you here. There is currently a lack of maintainership in Zephyr + Linux for IPSP and talk of deprecation for BLE IPSP, but from what I recall, BB.O might want to ensure that it stays in. Maybe chat with Jason and see what they would like to do.

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

henryzhang62 commented 1 year ago

I use PCA10056 (nRF52840) and build: west build -b nrf52840dk_nrf52840 samples/bluetooth/ipsp -- -DCONF_FILE=prj_dbg.conf In serial console (/dev/ttyACM0). Get Booting Zephyr OS build v3.4.0-rc1-198-g0ae7812f6bdb [00:00:00.379,943] bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002) [00:00:00.379,974] bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002) [00:00:00.380,004] bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 30 [00:00:00.381,011] bt_hci_core: bt_dev_show_info: Identity: D6:BC:19:F7:53:94 (random) [00:00:00.381,042] bt_hci_core: bt_dev_show_info: HCI: version 5.4 (0x0d) revision 0x0000, manufactur1 [00:00:00.381,072] bt_hci_core: bt_dev_show_info: LMP: version 5.4 (0x0d) subver 0xffff [00:00:00.381,622] net_config: net_config_init_by_iface: Initializing network [00:00:00.381,652] net_config: check_interface: Waiting interface 1 (0x20001934) to be up... [00:00:09.532,836] net_config: iface_up_handler: Interface 1 (0x20001934) coming up [00:00:09.533,508] ipsp: init_app: Run IPSP sample [00:00:09.534,027] ipsp: listen: Starting to wait [00:00:09.633,636] net_config: ipv6_event_handler: IPv6 address: 2001:db8::1

Step is:

  1. west flash

In Ubuntu host sudo modprobe bluetooth_6lowpan sudo bash -c "echo 1 > /sys/kernel/debug/bluetooth/6lowpan_enable" sudo bash -c "cat /sys/kernel/debug/bluetooth/6lowpan_enable" bluetoothctl trust D6:BC:19:F7:53:94 sudo bash -c "echo connect D6:BC:19:F7:53:94 2 > /sys/kernel/debug/bluetooth/6lowpan_control"

Interface is up and ping6 goes through.

jori-nordic commented 1 year ago

good to hear. I'll close the issue then. @bamkrs feel free to reopen if you still experience it.

@henryzhang62 it still seems that we get a 9-second delay for the interface to be up though, wonder what that is. Still better than the 30 seconds.

henryzhang62 commented 1 year ago

[00:00:00.381,652] net_config: check_interface: Waiting interface 1 (0x20001934) to be up... And then after 9 seconds, it receives connection message sent from the host: sudo bash -c "echo connect D6:BC:19:F7:53:94 2 > /sys/kernel/debug/bluetooth/6lowpan_control" then this message is triggered. [00:00:09.532,836] net_config: iface_up_handler: Interface 1 (0x20001934) coming up

The interval time between above two messages are decided by user how soon he will trigger the connection command from the host.

jori-nordic commented 1 year ago

Ah this makes sense, thanks for checking