kyberpunk / openthread-mqttsn

This repository contains examples using MQTT-SN client implementation for Thread network based on OpenThread SDK.
BSD 3-Clause "New" or "Revised" License

Can't get gateway advertising working #25

Open ajlennon opened 1 year ago

ajlennon commented 1 year ago

I've been having all sorts of problems trying to get the gateway advertising example working. I suspect I am doing something wrong here but I can't for the life of me work out what it is.

I can build and run the example - I've added a WIP version into my build here

https://github.com/DynamicDevices/openthread/blob/ajl/adding-examples/examples/apps/mqtt-snsearchgw/main.c

Now when I run this up I can see the SEARCHGW coming into the UDP6 MQTT-SNGateway via the OTBR. I see the gateway application sending the response, but that response never arrives back at my client application.
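
For reference, a minimal sketch of the three gateway discovery messages involved here, written against the field layouts in the MQTT-SN v1.2 specification. The function names are made up for illustration and are not part of PAHO or openthread-mqttsn. It only handles the short form with a one-octet length field, which is all these messages need.

```python
import struct

# Message type codes from the MQTT-SN v1.2 specification.
ADVERTISE = 0x00
SEARCHGW = 0x01
GWINFO = 0x02


def build_searchgw(radius: int) -> bytes:
    """Encode SEARCHGW: Length(1) | MsgType(1) | Radius(1)."""
    return struct.pack("!BBB", 3, SEARCHGW, radius)


def parse_discovery(packet: bytes) -> dict:
    """Decode ADVERTISE, SEARCHGW or GWINFO from a UDP payload."""
    # Short form only: the first octet is the total message length.
    if len(packet) < 2 or packet[0] != len(packet):
        raise ValueError("length octet does not match payload size")
    msg_type, body = packet[1], packet[2:]
    if msg_type == SEARCHGW and len(body) == 1:
        return {"type": "SEARCHGW", "radius": body[0]}
    if msg_type == ADVERTISE and len(body) == 3:
        # ADVERTISE body: GwId(1) | Duration(2), duration in seconds.
        gw_id, duration = struct.unpack("!BH", body)
        return {"type": "ADVERTISE", "gw_id": gw_id, "duration_s": duration}
    if msg_type == GWINFO and len(body) >= 1:
        # GwAdd is optional; a gateway answering for itself leaves it empty.
        return {"type": "GWINFO", "gw_id": body[0], "gw_add": body[1:]}
    raise ValueError(f"not a gateway discovery message: type 0x{msg_type:02x}")
```

Feeding the UDP payloads seen in Wireshark or in the gateway log through `parse_discovery` is a quick way to tell whether a packet is the outgoing SEARCHGW or an actual GWINFO / ADVERTISE coming back.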

I've looked into PAHO and found some weird things. There's a bug which prevents it setting the remote multicast port correctly which I have fixed here (although I've not really fixed it properly - it should use the string ctor)

https://github.com/DynamicDevices/paho.mqtt-sn.embedded-c/commit/db235f76b5fe7ac056f78a48f7aafbcd96fecba6

I thought multicast still used ports so I am not sure how this would have ever worked?

I should say I can ping the multicast IPv6 address and I get a response from the CLI

With debugging on the CLI the packets I get are a bit odd as they involve "hops". Not sure if these are the data UDP packets or something else

Any help or signposting would be much appreciated !!!

ajlennon commented 1 year ago

Here's a log

Mqtt----------: Set Search GW
Mqtt----------: Sending message to ff03:0:0:0:0:0:0:1[:10000]
Done
> Mac-----------: Request to start operation "TransmitDataDirect"
Mac-----------: Starting operation "TransmitDataDirect"
SubMac--------: RadioState: Receive -> CsmaBackoff
SubMac--------: RadioState: CsmaBackoff -> Transmit

Mac-----------: ==============================[TX len=054]==============================
Mac-----------: | 49 98 A1 44 44 FF FF 00 | 34 0D D6 07 00 00 01 1F | I.!DD...4.V.....
Mac-----------: | B9 B6 B8 33 E6 48 EE 44 | 2A 9A 22 E7 9E B3 8F 6A | 9683fHnD*."g.3.j
Mac-----------: | 6C 1F 35 62 FB F0 1D 71 | 28 A6 A0 C7 FD 26 26 99 | l.5b{p.q(& G}&&.
Mac-----------: | 7E A7 61 5A F7 45 .. .. | .. .. .. .. .. .. .. .. | ~'aZwE..........
Mac-----------: ------------------------------------------------------------------------
Mac-----------: Finishing operation "TransmitDataDirect"
MeshForwarder-: Sent IPv6 HopOpts msg, len:59, chksum:0000, ecn:no, to:0xffff, sec:yes, priol
MeshForwarder-:     src:[fdb8:d848:8bc4:12b6:5e8e:2ca5:4e68:e32c]
MeshForwarder-:     dst:[ff03:0:0:0:0:0:0:1]
Mac-----------: Idle mode: Radio receiving on channel 15
Mac-----------: Request to start operation "TransmitDataDirect"
Mac-----------: Starting operation "TransmitDataDirect"
SubMac--------: RadioState: Receive -> CsmaBackoff
SubMac--------: RadioState: CsmaBackoff -> Transmit
SubMac--------: RadioState: Transmit -> Receive
Mac-----------: ==============================[TX len=054]==============================
Mac-----------: | 49 98 A2 44 44 FF FF 00 | 34 0D D7 07 00 00 01 BE | I."DD...4.W....>
Mac-----------: | 55 24 86 EE 76 60 6D D6 | E2 6D 53 EC 62 D2 8B 7B | U$.nvmVbmSlbR.{
Mac-----------: | F3 5C 58 2E 62 47 68 36 | 19 CB 5A 85 2C 39 3A A3 | s\X.bGh6.KZ.,9:#
Mac-----------: | 31 EB 07 4B F7 45 .. .. | .. .. .. .. .. .. .. .. | 1k.KwE..........
Mac-----------: ------------------------------------------------------------------------
Mac-----------: Finishing operation "TransmitDataDirect"
MeshForwarder-: Sent IPv6 HopOpts msg, len:59, chksum:0000, ecn:no, to:0xffff, sec:yes, priol
MeshForwarder-:     src:[fdb8:d848:8bc4:12b6:5e8e:2ca5:4e68:e32c]
MeshForwarder-:     dst:[ff03:0:0:0:0:0:0:1]
Mac-----------: Idle mode: Radio receiving on channel 15
Mac-----------: Received frame from short address 0x5c00
Mac-----------: Rx security - frame counter 14
Mac-----------: ==============================[RX len=054]=============================
Mac-----------: | 49 98 46 44 44 FF FF 00 | 5C 0D 0E 00 00 00 01 7C | I.FDD...\......|
Mac-----------: | 5A 02 5E E 2C A5 4E 68 | E3 2C 03 00 00 01 E1 06 | Z.^.,%Nhc,....a.
Mac-----------: | 6D 04 40 03 34 00 F0 27 | 10 27 10 7B 6D 03 01 03 | m.@.4.p'.'.{m...
Mac-----------: | 13 3F 8C 2A 2A 48 .. .. | .. .. .. .. .. .. .. .. | .?.**H..........
Mac-----------: ----0000, ecn:no, from:0x5c00, sec:yes, prio:normal, rss:-50.0
MeshForwarder-:     src:[fdb8:d848:8bc4:12b6:5e8e:2ca5:4e68:e32c]
MeshForwarder-:     dst:[ff03:0:0:0:0:0:0:1]
Mac-----------: Idle mode: Radio receiving on channel 15
MeshForwarder-: Received IPv6 HopOpts msg, len:59, chksum:0000, ecn:no, from:0x5c00, sec:yes0
MeshForwarder-:     src:[fdb8:d848:8bc4:12b6:5e8e:2ca5:4e68:e32c]
MeshForwarder-:     dst:[ff03:0:0:0:0:0:0:1]
Mac-----------: Idle mode: Radio receiving on channel 15
Mle-----------: network id timeout = 13
Mle-----------: network id timeout = 14Mle-----------: network id timeout = 16Mle-----------7
Mle-----------: network id timeout = 19Mle-----------: network id timeout = 21
Mac-----------: ==============================[RX len=070]==============================C. -0
Mle-----------: network id timeout = 1
Mle-----------: network id timeout = 5
Mle-----------: network id timeout = 6
Mle-----------: network id timeout = 8
Mle-----------: network id timeout = 9
Mle-----------: network id timeout = 11
Mac-----------: ==============================[RX len=070]==============================
Mac-----------: | 41 D8 49 44 44 FF FF E8 | 0A 77 84 0E 16 2E EA 7F | AXIDD..h.w....j.
Mac-----------: | 3B 01 F0 4D 4C 4D 4C:~.._.V`[
Mac-----------: | 19 DF B1 25 96 1E AA EE | 0A 55 C0 A0 7B 72 83 DA | ._1%..*n.U@ {r.Z
Mac-----------: | 31 B9 1A 4C 29 F6 .. .. | .. .. .. .. .. .. .. .. | 19.L)v..........
Mac-----------: ------------------------------------------------------le-----------: Receivee
Mle-----------: Receive Advertisement (fe80:0:0:0:e82e:160e:8477:ae8,0x5c00)
Mac-----------: Idle mode: Radio receiving on channel 15
Mle-----------: network id timeout = 0
Mle-----------: network id timeout = 1
Mle-----------: network id timeout = 3Mle-----------: network id timeout = 4Mle-----------: 5
Mle-----------: network id timeout = 6
Mle-----------: network id timeout = 7
Mle-----------: Send Advertisement (ff02:0:0:0:0:0:0:1)
Mac-----------: Request to start operation "TransmitDataDirect"
Mac-----------: Starting operation "TransmitDataDirect"
SubMac--------: RadioState: Receive -> CsmaBackoff
SubMac--------: RadioState: CsmaBackoff -> Transmit
SubMac--------: RadioState: Transmit -> Receive
Mac-----------: ==============================[TX len=070]====================

You can see I get this back, but it is "HopOpts" and it doesn't seem to log a port with the multicast destination:

MeshForwarder-: Received IPv6 HopOpts msg, len:59, chksum:0000, ecn:no, from:0x5c00, sec:yes0
MeshForwarder-:     src:[fdb8:d848:8bc4:12b6:5e8e:2ca5:4e68:e32c]
MeshForwarder-:     dst:[ff03:0:0:0:0:0:0:1]
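
A rough way to check what that frame carries, assuming the tail of the RX dump in the log is 6LoWPAN next-header compression as described in RFC 6282 (that is an interpretation of the bytes, not something the log states). Read that way, "HopOpts" is just the first IPv6 next header (a Hop-by-Hop Options header with option type 0x6D, which is the MPL option used for ff03::/16 multicast), and the UDP ports follow it. The helper name `decode_tail` is hypothetical.

```python
import struct

# Bytes copied from the RX dump in the log, from the "E1 06" that ends the
# second row up to the end of the third row.
TAIL = bytes.fromhex("E1 06 6D 04 40 03 34 00 F0 27 10 27 10 7B 6D 03 01 03")


def decode_tail(data: bytes) -> dict:
    """Walk an NHC Hop-by-Hop header followed by an NHC UDP header."""
    nhc, ext_len = data[0], data[1]
    # RFC 6282 extension header encoding is 1110EEEN; EID 000 is Hop-by-Hop.
    if nhc & 0xF0 != 0xE0 or (nhc >> 1) & 0x07 != 0:
        raise ValueError("expected a compressed Hop-by-Hop Options header")
    options = data[2:2 + ext_len]
    udp = data[2 + ext_len:]
    # RFC 6282 UDP encoding is 11110CPP; this sketch only handles 0xF0,
    # i.e. checksum carried inline and both ports carried as full 16 bits.
    if udp[0] != 0xF0:
        raise ValueError("unsupported UDP NHC encoding")
    src_port, dst_port, checksum = struct.unpack("!HHH", udp[1:7])
    return {
        "option_type": options[0],
        "src_port": src_port,
        "dst_port": dst_port,
        "checksum": checksum,
        "payload": udp[7:],
    }


info = decode_tail(TAIL)
print(info["src_port"], info["dst_port"], info["payload"].hex())
```

Under that assumption both ports come out as 10000 and the payload is `03 01 03`, which is a SEARCHGW with radius 3. Together with the src address matching the one on the TX lines, this suggests the received frame is a forwarded copy of the outgoing SEARCHGW rather than a reply from the gateway.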

ajlennon commented 1 year ago

Looking at Wireshark logs I think this shows the SEARCHGW going out but I am not getting any response back from the OTBR

image

ajlennon commented 1 year ago

OK some progress. I see from your paho6 docker container you're using different code from the current PAHO MQTTSNGateway code

https://github.com/eclipse/paho.mqtt-sn.embedded-c/blob/master/MQTTSNGateway/src/linux/udp6/SensorNetwork.cpp

https://github.com/kyberpunk/paho.mqtt-sn.embedded-c/blob/master/MQTTSNGateway/src/linux/udp6/SensorNetwork.cpp

Looks like there's been some kind of rewrite going on. With your fork [master] I see advertisements on the CLI but I can't see SEARCHGW on the OTBR MQTT-SNGateway. With the upstream PAHO code I see SEARCHGW but get no ADVERTISE or any other response!

kyberpunk commented 1 year ago

Hi, thank you for debugging. I will try to test it. The issues I've experienced in this case were mainly related to routing. It worked well when the Paho gateway was attached directly to the OTBR interface (e.g. by using the same network in Docker). When there were other network nodes in between, such as a router or virtual bridges, there were problems with the routing setup.

ajlennon commented 1 year ago

No worries - I hope some of what I've been doing is helpful. I've been down ALL sorts of rabbit holes, but I now have a really nice setup for Balena with an app I can build in the new nRF Connect VSCode IDE. It automatically connects to the mesh, finds the advertised gateway, and starts publishing.

I'm trying to put a "block" together based on your work for people to use within the Balena ecosystem

https://github.com/DynamicDevices/openthread-border-router-block

Currently I have the MQTT-SNGateway and the OTBR inside the same container, as I've had trouble routing the networking between containers, but I now think that trouble was down to the PAHO issues. So now I'm just trying to work out how to get that networking sorted out.

Not sure why it's worked for you but not for me, but as I say I hope some of the work I've been doing here is useful :)

kyberpunk commented 11 months ago

Hi @ajlennon, sorry for the late response. I was trying to set up a new OTBR and test advertising with the newer Paho code. In my setup, multicast messages from the gateway were being routed to the wrong interface, wlan0 instead of wpan0. Once I disabled IPv6 on wlan0, it started working and GWINFO or ADVERTISE was received correctly by the client.

Screenshot 2023-10-26 234853

Screenshot 2023-10-26 235258

kyberpunk commented 11 months ago

What also helped me was adding a route for multicast addresses: sudo ip route add multicast ff00::/8 dev wpan0 table local metric 100
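
For a sender you control yourself, the same "wrong interface" problem can also be avoided per socket instead of per routing table. This is only a sketch of that alternative using the standard IPV6_MULTICAST_IF socket option on Linux, and it is not how the Paho gateway is implemented. The interface name wpan0, the group ff03::1 and port 10000 are taken from this thread; the helper names are made up.

```python
import ipaddress
import socket


def normalise_group(addr: str) -> str:
    """Return the canonical form of an IPv6 multicast address, or raise."""
    group = ipaddress.IPv6Address(addr)
    if not group.is_multicast:
        raise ValueError(f"{addr} is not an IPv6 multicast address")
    return str(group)


def open_pinned_sender(ifname: str, hops: int = 5) -> socket.socket:
    """Open a UDP6 socket whose multicast traffic leaves via `ifname`."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    # Pin the egress interface so the kernel does not pick e.g. wlan0.
    ifindex = socket.if_nametoindex(ifname)
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_IF, ifindex)
    # Allow the packet to be forwarded beyond the first hop.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_HOPS, hops)
    return sock


def send_to_group(sock: socket.socket, group: str, port: int, payload: bytes) -> int:
    """Send one datagram to the multicast group and return the bytes sent."""
    return sock.sendto(payload, (normalise_group(group), port))
```

On a host that actually has a wpan0 interface, usage would look like `send_to_group(open_pinned_sender("wpan0"), "ff03::1", 10000, payload)`.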

ajlennon commented 11 months ago

> Hi @ajlennon, sorry for the late response. I was trying to set up a new OTBR and test advertising with the newer Paho code. In my setup, multicast messages from the gateway were being routed to the wrong interface, wlan0 instead of wpan0. Once I disabled IPv6 on wlan0, it started working and GWINFO or ADVERTISE was received correctly by the client.

So this is really interesting. Thanks for letting me know @kyberpunk. Can you confirm for me whether the PAHO code is in the same container as the OTBR code or in a different one? I have everything working fine with the MQTT-SN gateway and OTBR in the same container, but I can't seem to bridge multicast packets between two containers, which I would love to be able to do.

Thanks!

kyberpunk commented 11 months ago

@ajlennon I'm deploying it to a different container (you can try my build kyberpunk/paho:1.0.3-udp6) but setting the same network as the otbr container (--net "container:otbr"). This is the only way I was able to get advertising working out of the box. Unfortunately I don't know how to handle IPv6 multicast between different network nodes. Theoretically it would make sense to replace the MQTT-SN SEARCHGW feature with service discovery (https://github.com/openthread/openthread/blob/main/src/cli/README_SRP.md) or simply use DNS.

ajlennon commented 11 months ago

Yeah so that's interesting. I imagine I could get things going this way too, with everything sitting on the host network. But I really wanted to try to leverage containers to keep everything isolated, for security apart from anything else. From my reading, people seem to think it's actually not possible at this time to do IPv6 multicast between containers, but I don't know :shrug: