linux-wpan / wpan-tools

Userspace tools for Linux IEEE 802.15.4 stack
https://linux-wpan.org/

wpan-hwsim unable to delete edge or update lqi #17

Closed zwgraham closed 4 years ago

zwgraham commented 6 years ago

I'm testing the current state of wpan-tools prior to my team beginning some work.

Today, I was testing the 4.19 kernel with the latest wpan-tools and the mac802154_hwsim kernel module. Almost everything worked with wpan-hwsim; the two exceptions were "wpan-hwsim edge del a b" and "wpan-hwsim edge lqi a b num".

Please let me know if this is expected or if there's something I can do to better debug this issue. I'd love to get to the point where I'm submitting patches and adding some device support in the near future.

Setup to repeat the issue

# modprobe mac802154_hwsim
# lsmod |grep mac802
mac802154_hwsim        24576  0
mac802154              77824  1 mac802154_hwsim
ieee802154            102400  1 mac802154
# wpan-hwsim add
wpan_hwsim radio2 registered.
# wpan-hwsim edge add 0 2
# wpan-hwsim edge add 2 0
# wpan-hwsim
wpan_hwsim radio0:
    edge:
        radio2
        lqi: 0xff
    edge:
        radio1
        lqi: 0xff
wpan_hwsim radio1:
    edge:
        radio0
        lqi: 0xff
wpan_hwsim radio2:
    edge:
        radio0
        lqi: 0xff

From here, issuing a command to either delete an edge or change an LQI freezes my virtual machine. My ssh connection drops, one of the cores is pegged at 100% utilization, and the hypervisor console stops responding as well.

What I looked at while troubleshooting

I fired up GDB to see if I could pick out a problem, and I also took a look at the debug output from libnl3.

GDB

The trouble seems to occur at this call in hwsim_cmd_edge(): rc = nl_send_auto(nl_sock, msg);

LIBNL3 Debug

libnl3 debug output for the edge lqi command

# NLCB=debug wpan-hwsim edge lqi 0 2 250
-- Debug: Sent Message:
--------------------------   BEGIN NETLINK MESSAGE ---------------------------
  [NETLINK HEADER] 16 octets
    .nlmsg_len = 40
    .type = 16 <genl/family::nlctrl>
    .flags = 5 <REQUEST,ACK>
    .seq = 1540347064
    .port = -1891629936
  [GENERIC NETLINK HEADER] 4 octets
    .cmd = 3
    .version = 1
    .unused = 0
  [ATTR 02] 16 octets
    4d 41 43 38 30 32 31 35 34 5f 48 57 53 49 4d 00 MAC802154_HWSIM.
---------------------------  END NETLINK MESSAGE   ---------------------------
-- Debug: Received Message:
--------------------------   BEGIN NETLINK MESSAGE ---------------------------
  [NETLINK HEADER] 16 octets
    .nlmsg_len = 224
    .type = 16 <genl/family::nlctrl>
    .flags = 0 <>
    .seq = 1540347064
    .port = -1891629936
  [GENERIC NETLINK HEADER] 4 octets
    .cmd = 1
    .version = 2
    .unused = 0
  [ATTR 02] 16 octets
    4d 41 43 38 30 32 31 35 34 5f 48 57 53 49 4d 00 MAC802154_HWSIM.
  [ATTR 01] 2 octets
    1e 00                                           ..
  [PADDING] 2 octets
    00 00                                           ..
  [ATTR 03] 4 octets
    01 00 00 00                                     ....
  [ATTR 04] 4 octets
    00 00 00 00                                     ....
  [ATTR 05] 4 octets
    03 00 00 00                                     ....
  [ATTR 06] 120 octets
    14 00 01 00 08 00 01 00 03 00 00 00 08 00 02 00 ................
    1a 00 00 00 14 00 02 00 08 00 01 00 04 00 00 00 ................
    08 00 02 00 1a 00 00 00 14 00 03 00 08 00 01 00 ................
    01 00 00 00 08 00 02 00 0e 00 00 00 14 00 04 00 ................
    08 00 01 00 08 00 00 00 08 00 02 00 1a 00 00 00 ................
    14 00 05 00 08 00 01 00 07 00 00 00 08 00 02 00 ................
    1a 00 00 00 14 00 06 00 08 00 01 00 06 00 00 00 ................
    08 00 02 00 1a 00 00 00                         ........
  [ATTR 07] 24 octets
    18 00 01 00 08 00 02 00 07 00 00 00 0b 00 01 00 ................
    63 6f 6e 66 69 67 00 00                         config..
---------------------------  END NETLINK MESSAGE   ---------------------------
-- Debug: Received Message:
--------------------------   BEGIN NETLINK MESSAGE ---------------------------
  [NETLINK HEADER] 16 octets
    .nlmsg_len = 36
    .type = 2 <ERROR>
    .flags = 256 <ROOT>
    .seq = 1540347064
    .port = -1891629936
  [ERRORMSG] 20 octets
    .error = 0 "Success"
  [ORIGINAL MESSAGE] 16 octets
    .nlmsg_len = 16
    .type = 16 <0x10>
    .flags = 5 <REQUEST,ACK>
    .seq = 1540347064
    .port = -1891629936
---------------------------  END NETLINK MESSAGE   ---------------------------
-- Debug: Sent Message:
--------------------------   BEGIN NETLINK MESSAGE ---------------------------
  [NETLINK HEADER] 16 octets
    .nlmsg_len = 48
    .type = 30 <0x1e>
    .flags = 5 <REQUEST,ACK>
    .seq = 1540347065
    .port = -1891629936
  [GENERIC NETLINK HEADER] 4 octets
    .cmd = 6
    .version = 0
    .unused = 0
  [PAYLOAD] 28 octets
    08 00 01 00 00 00 00 00 14 00 02 00 08 00 01 00 ................
    02 00 00 00 05 00 02 00 fa 00 00 00             ............
---------------------------  END NETLINK MESSAGE   ---------------------------
Timeout, server 192.168.123.40 not responding.

The last line is sshd dropping my connection.
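
For readers who want to check what the final request actually carried, the 28-octet payload of the last sent message can be decoded by hand. The following is a minimal sketch, not part of wpan-tools: it assumes the dump comes from a little-endian machine, and the meaning given to each attribute type (1 = radio id, 2 = nested edge, and inside it 1 = endpoint id, 2 = LQI) is inferred only from the values matching the arguments of "wpan-hwsim edge lqi 0 2 250", not taken from the kernel headers. The helper name parse_attrs is hypothetical.

```python
import struct


def parse_attrs(buf: bytes):
    """Split a buffer of netlink attributes into (type, value) pairs.

    Each attribute starts with a 4-byte header (u16 length, u16 type);
    the length covers header plus value, and the next attribute starts
    at the following 4-byte boundary.
    """
    attrs = []
    offset = 0
    while offset + 4 <= len(buf):
        nla_len, nla_type = struct.unpack_from("<HH", buf, offset)
        if nla_len < 4 or offset + nla_len > len(buf):
            raise ValueError("malformed attribute at offset %d" % offset)
        attrs.append((nla_type, buf[offset + 4:offset + nla_len]))
        offset += (nla_len + 3) & ~3  # skip padding up to 4-byte alignment
    return attrs


# [PAYLOAD] 28 octets, copied from the last "Sent Message" in the log above.
payload = bytes.fromhex(
    "08 00 01 00 00 00 00 00 14 00 02 00 08 00 01 00 "
    "02 00 00 00 05 00 02 00 fa 00 00 00"
)

top = dict(parse_attrs(payload))
edge = dict(parse_attrs(top[2]))  # assumption: type 2 is a nested edge attribute

radio_id = struct.unpack("<I", top[1])[0]      # assumption: type 1 = radio id (u32)
endpoint_id = struct.unpack("<I", edge[1])[0]  # assumption: nested type 1 = endpoint id (u32)
lqi = edge[2][0]                               # assumption: nested type 2 = LQI (u8)

print(radio_id, endpoint_id, lqi)  # prints: 0 2 250
```

The decoded values match the command line, which suggests the request itself was well formed when it left userspace and that the hang happens after the kernel receives it.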

relevant kconfig options

Since I'm using vanilla 4.19, I figured I should share the relevant Kconfig options I've enabled. (The 6lowpan options are irrelevant for this particular issue, but I've included them for completeness.)

CONFIG_6LOWPAN=m
# CONFIG_6LOWPAN_DEBUGFS is not set
CONFIG_6LOWPAN_NHC=m
CONFIG_6LOWPAN_NHC_DEST=m
CONFIG_6LOWPAN_NHC_FRAGMENT=m
CONFIG_6LOWPAN_NHC_HOP=m
CONFIG_6LOWPAN_NHC_IPV6=m
CONFIG_6LOWPAN_NHC_MOBILITY=m
CONFIG_6LOWPAN_NHC_ROUTING=m
CONFIG_6LOWPAN_NHC_UDP=m
# CONFIG_6LOWPAN_GHC_EXT_HDR_HOP is not set
# CONFIG_6LOWPAN_GHC_UDP is not set
# CONFIG_6LOWPAN_GHC_ICMPV6 is not set
# CONFIG_6LOWPAN_GHC_EXT_HDR_DEST is not set
# CONFIG_6LOWPAN_GHC_EXT_HDR_FRAG is not set
# CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE is not set
CONFIG_IEEE802154=m
CONFIG_IEEE802154_NL802154_EXPERIMENTAL=y
CONFIG_IEEE802154_SOCKET=m
CONFIG_IEEE802154_6LOWPAN=m
CONFIG_MAC802154=m
--------------------------------------
CONFIG_IEEE802154_DRIVERS=m
CONFIG_IEEE802154_FAKELB=m
CONFIG_IEEE802154_AT86RF230=m
CONFIG_IEEE802154_AT86RF230_DEBUGFS=y
CONFIG_IEEE802154_MRF24J40=m
CONFIG_IEEE802154_CC2520=m
CONFIG_IEEE802154_ATUSB=m
CONFIG_IEEE802154_ADF7242=m
CONFIG_IEEE802154_CA8210=m
CONFIG_IEEE802154_CA8210_DEBUGFS=y
# CONFIG_IEEE802154_MCR20A is not set
CONFIG_IEEE802154_HWSIM=m

alexaring commented 6 years ago

one of the cores is pegged at 100%

This sounds like it is somehow stuck inside a softirq... maybe something results in an endless loop. :-/

alexaring commented 6 years ago

https://www.spinics.net/lists/linux-wpan/msg05331.html

Please check whether this fixes your issue. Thanks and sorry.

Stefan-Schmidt commented 5 years ago

@zwgraham it would be great if you could confirm the fix from Alex. I have since applied it to the wpan tree, in case you prefer fetching it from a git repo (https://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan.git/).

alexaring commented 4 years ago

I'm closing this, as it is solved for me.