Closed andrekir closed 2 years ago
rebroadcasting only with full channel settings:
03:10:25 10 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=21, time 616 ms
03:10:25 10 [RadioIf] Lora RX (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6 rxSNR=1.73219)
03:10:25 10 [RadioIf] AirTime - Packet received : 616ms
03:10:26 10 [Router] Add packet record (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6 rxSNR=1.73219)
03:10:26 10 [Router] Using channel 0 (hash 0x8)
03:10:26 10 [Router] Expanding short PSK #1
03:10:26 10 [Router] Installing AES128 key!
03:10:26 10 [Router] decoded message (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x0 Portnum=1 rxtime=1650337826 rxSNR=6 rxSNR=1.73219)
03:10:26 10 [Router] handleReceived(REMOTE) (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x0 Portnum=1 rxtime=1650337826 rxSNR=6 rxSNR=1.73219)
03:10:26 10 [Router] Module 'text' wantsPacket=1
03:10:26 10 [Router] Received text msg from=0x5a4989b0, id=0xa1c5c019, msg=2
03:10:26 10 [Router] Module 'text' considered
03:10:26 10 [Router] Module 'canned' wantsPacket=1
03:10:26 10 [Router] Module 'canned' considered
03:10:26 10 [Router] Module 'ExternalNotificationModule' wantsPacket=1
03:10:26 10 [Router] External Notification Module Disabled
03:10:26 10 [Router] Module 'ExternalNotificationModule' considered
03:10:26 10 [Router] Module 'StoreForwardModule' wantsPacket=1
03:10:26 10 [Router] Store & Forward Module - Disabled
03:10:26 10 [Router] Module 'StoreForwardModule' considered
03:10:26 10 [Router] Module 'routing' wantsPacket=1
03:10:26 10 [Router] Received routing from=0x5a4989b0, id=0xa1c5c019, portnum=1, payloadlen=1
03:10:26 10 [Router] Routing sniffing (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x0 Portnum=1 rxtime=1650337826 rxSNR=6 rxSNR=1.73219)
03:10:26 10 [Router] Rebroadcasting received floodmsg to neighbors (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x0 Portnum=1 rxtime=1650337826 rxSNR=6 rxSNR=1.73219)
This is occurring on 1.2.64 as well.
I spent some time the other week trying to hunt this down (maybe too much lol), I'm not a programmer so this may not be very helpful or productive, but I suspect that when no suitable channel/PSK/module is found, the message is marked as being seen, and once it goes back into the process of checking whether it should be rebroadcast, it isn't because it's already been seen, but this doesn't make sense for this scenario when the packet should be forwarded despite being seen, as it may belong to a radio or radio group on a different PSK. I wasn't able to isolate the code that could be causing this, but I know it's a combination of Router, FloodingRouter, and ReliableRouter, perhaps an issue with the sniffing feature.
Again, I could be totally off, I'm not a programmer so apologies if this isn't helpful.
This issue has been mentioned on Meshtastic. There might be relevant details there:
https://meshtastic.discourse.group/t/do-you-plan-on-going-to-burning-man-2022/4170/27
Hmm. I spent a dayish looking into this. I think I found the problem:
'sniffReceived' is a hook that routers can use to do early processing on received packets. In particular, the FloodingRouter mixin (which provides our mesh flood routing behavior) overrode this hook to do its retransmissions (to forward packets for other nodes).
The old code called sniffReceived here:
bool RoutingPlugin::handleReceivedProtobuf(const MeshPacket &mp, const Routing *r)
{
printPacket("Routing sniffing", &mp);
router->sniffReceived(&mp, r);
...
An important guarantee of the handleReceivedProtobuf 'contract' was that the decoded r parameter will be provided only if it was able to be decoded (i.e.e it could be null in some cases, and that point any of these hooks should treat it as opaque though they can still use the 'envelope' data in p).
And its implementation was:
void FloodingRouter::sniffReceived(const MeshPacket *p, const Routing *c)
{
// If a broadcast, possibly _also_ send copies out into the mesh.
// (FIXME, do something smarter than naive flooding here)
if (p->to == NODENUM_BROADCAST && p->hop_limit > 0 && getFrom(p) != getNodeNum()) {
if (p->id != 0) {
MeshPacket *tosend = packetPool.allocCopy(*p); // keep a copy because we will be sending it
tosend->hop_limit--; // bump down the hop count
printPacket("Rebroadcasting received floodmsg to neighbors", p);
// Note: we are careful to resend using the original senders node id
// We are careful not to call our hooked version of send() - because we don't want to check this again
Router::send(tosend);
} else {
DEBUG_MSG("Ignoring a simple (0 id) broadcast\n");
}
}
// handle the packet as normal
Router::sniffReceived(p, c);
}
(this old code is from commit bb396508a48917c1aaf1849c47218db223c0fbe6 but anything from about this date should be same Date: Mon Jul 26 04:53:13 2021 +0800)
But the refactored code seems to have moved this sniffing stage later in the processing, in a new class called RoutingModule:
bool RoutingModule::handleReceivedProtobuf(const MeshPacket &mp, Routing *r)
{
printPacket("Routing sniffing", &mp);
router->sniffReceived(&mp, r);
...
That function is called from this method:
/** Called to handle a particular incoming message
@return ProcessMessage::STOP if you've guaranteed you've handled this message and no other handlers should be considered for it
*/
virtual ProcessMessage handleReceived(const MeshPacket &mp) override
{
// FIXME - we currently update position data in the DB only if the message was a broadcast or destined to us
// it would be better to update even if the message was destined to others.
auto &p = mp.decoded;
DEBUG_MSG("Received %s from=0x%0x, id=0x%x, portnum=%d, payloadlen=%d\n", name, mp.from, mp.id, p.portnum,
p.payload.size);
T scratch;
T *decoded = NULL;
if (mp.which_payloadVariant == MeshPacket_decoded_tag && mp.decoded.portnum == ourPortNum) {
memset(&scratch, 0, sizeof(scratch));
if (pb_decode_from_bytes(p.payload.bytes, p.payload.size, fields, &scratch)) {
decoded = &scratch;
} else {
DEBUG_MSG("Error decoding protobuf module!\n");
// if we can't decode it, nobody can process it!
return ProcessMessage::STOP;
}
}
return handleReceivedProtobuf(mp, decoded) ? ProcessMessage::STOP : ProcessMessage::CONTINUE;
}
I bet this new check at "if we can't decode it, nobody can process it!" is causing STOP to be returned by one of the registered modules (not necessarily the new RoutingModule). Therefore now if decoding fails routing module never gets a chance to 'sniff' the packet and forward it to other nodes.
Notes:
does this help @andrekir, @mc-hamster and Garth (sorry Garth, I'm out of touch and don't yet know your github handle).
Hmm. I spent a dayish looking into this. I think I found the problem:
'sniffReceived' is a hook that routers can use to do early processing on received packets. In particular, the FloodingRouter mixin (which provides our mesh flood routing behavior) overrode this hook to do its retransmissions (to forward packets for other nodes).
The old code called sniffReceived here:
bool RoutingPlugin::handleReceivedProtobuf(const MeshPacket &mp, const Routing *r)
{
printPacket("Routing sniffing", &mp);
router->sniffReceived(&mp, r);
...
An important guarantee of the handleReceivedProtobuf 'contract' was that the decoded r parameter will be provided only if it was able to be decoded (i.e.e it could be null in some cases, and that point any of these hooks should treat it as opaque though they can still use the 'envelope' data in p).
And its implementation was:
void FloodingRouter::sniffReceived(const MeshPacket *p, const Routing *c)
{
// If a broadcast, possibly _also_ send copies out into the mesh.
// (FIXME, do something smarter than naive flooding here)
if (p->to == NODENUM_BROADCAST && p->hop_limit > 0 && getFrom(p) != getNodeNum()) {
if (p->id != 0) {
MeshPacket *tosend = packetPool.allocCopy(*p); // keep a copy because we will be sending it
tosend->hop_limit--; // bump down the hop count
printPacket("Rebroadcasting received floodmsg to neighbors", p);
// Note: we are careful to resend using the original senders node id
// We are careful not to call our hooked version of send() - because we don't want to check this again
Router::send(tosend);
} else {
DEBUG_MSG("Ignoring a simple (0 id) broadcast\n");
}
}
// handle the packet as normal
Router::sniffReceived(p, c);
}
(this old code is from commit bb396508a48917c1aaf1849c47218db223c0fbe6 but anything from about this date should be same Date: Mon Jul 26 04:53:13 2021 +0800)
But the refactored code seems to have moved this sniffing stage later in the processing, in a new class called RoutingModule:
bool RoutingModule::handleReceivedProtobuf(const MeshPacket &mp, Routing *r)
{
printPacket("Routing sniffing", &mp);
router->sniffReceived(&mp, r);
...
That function is called from this method:
/** Called to handle a particular incoming message
@return ProcessMessage::STOP if you've guaranteed you've handled this message and no other handlers should be considered for it
*/
virtual ProcessMessage handleReceived(const MeshPacket &mp) override
{
// FIXME - we currently update position data in the DB only if the message was a broadcast or destined to us
// it would be better to update even if the message was destined to others.
auto &p = mp.decoded;
DEBUG_MSG("Received %s from=0x%0x, id=0x%x, portnum=%d, payloadlen=%d\n", name, mp.from, mp.id, p.portnum,
p.payload.size);
T scratch;
T *decoded = NULL;
if (mp.which_payloadVariant == MeshPacket_decoded_tag && mp.decoded.portnum == ourPortNum) {
memset(&scratch, 0, sizeof(scratch));
if (pb_decode_from_bytes(p.payload.bytes, p.payload.size, fields, &scratch)) {
decoded = &scratch;
} else {
DEBUG_MSG("Error decoding protobuf module!\n");
// if we can't decode it, nobody can process it!
return ProcessMessage::STOP;
}
}
return handleReceivedProtobuf(mp, decoded) ? ProcessMessage::STOP : ProcessMessage::CONTINUE;
}
I bet this new check at "if we can't decode it, nobody can process it!" is causing STOP to be returned by one of the registered modules (not necessarily the new RoutingModule). Therefore now if decoding fails routing module never gets a chance to 'sniff' the packet and forward it to other nodes.
Notes:
does this help @andrekir, @mc-hamster and Garth (sorry Garth, I'm out of touch and don't yet know your github handle).
@geeksville that totally matches the timeline @thebentern and I were seeing it never seemed to be in the module implementation.
Untested (and only just considered it) but a quick fix for this might be to make sure that the new RoutingModule is first in the list of modules. So it would never be affected by this problem (if the debugger confirms that this is the issue).
I did some testing using the small fix from my PR (#1593) and it seemed to work already. Maybe someone can confirm this before diving further into this? I didn't save the logs, but I could see the node saying that it couldn't decode, but it did rebroadcast (and I got the 'acknowledged' sign in the phone app).
Oh! That would be great!
(Sent from a phone - please ignore typos)
On Sat, Aug 6, 2022, 11:11 GUVWAF @.***> wrote:
I did some testing using the small fix from my PR (#1593 https://github.com/meshtastic/Meshtastic-device/pull/1593) and it seemed to work already. Maybe someone can confirm this before diving further into this? I didn't save the logs, but I could see the node saying that it couldn't decode, but it did rebroadcast (and I got the 'acknowledged' sign in the phone app).
— Reply to this email directly, view it on GitHub https://github.com/meshtastic/Meshtastic-device/issues/1388#issuecomment-1207257881, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXB2IJNUVPD7BK3IK7QW3VX2TFLANCNFSM5TXPJSJQ . You are receiving this because you were mentioned.Message ID: @.***>
Should that be configurable (in case it gets fixed)? I can imagine some operators not wanting to rebroadcast packets which don´t belong to "their" network (which is somewhat implied by not having the PSK). Especially in denser populated areas where multiple non-related networks might exist...
Should that be configurable (in case it gets fixed)? I can imagine some operators not wanting to rebroadcast packets which don´t belong to "their" network (which is somewhat implied by not having the PSK). Especially in denser populated areas where multiple non-related networks might exist...
In my testing this is already working
The solution that does not require extra config values would be to use different channel settings and PSK, this only occurs with identical channel settings and different PSK's where you will be sharing the channel already.
Confirmed this is working in the latest firmware. I added some more detailed debug logs locally, so that the packet routing flow is more visible.
12:51:51 169 [RadioIf] handleReceiveInterrupt()
12:51:51 169 [RadioIf] Recieving packet with length 66
12:51:51 169 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=66, time 1148 ms
12:51:51 169 [RadioIf] Received mp->which_payloadVariant=0
12:51:51 169 [RadioIf] Lora RX (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:51 169 [RadioIf] AirTime - Packet received : 1148ms
12:51:51 170 [RadioIf] Received which_payloadVariant=5
12:51:51 170 [Router] Add packet record (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:51 170 [Router] No suitable channel found for decoding, hash was 0x73!
12:51:51 170 [Router] packet decoding failed (no PSK?) (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:52 170 [Router] Module 'Admin' doesn't want packet
12:51:52 170 [Router] Module 'nodeinfo' doesn't want packet
12:51:52 170 [Router] Module 'position' doesn't want packet
12:51:52 170 [Router] Module 'waypoint' doesn't want packet
12:51:52 170 [Router] Module 'text' doesn't want packet
12:51:52 170 [Router] Module 'remotehardware' doesn't want packet
12:51:52 170 [Router] Module 'reply' doesn't want packet
12:51:52 170 [Router] Module 'canned' doesn't want packet
12:51:52 170 [Router] Module 'DeviceTelemetry' doesn't want packet
12:51:52 170 [Router] Module 'EnvironmentTelemetry' doesn't want packet
12:51:52 170 [Router] Module 'ExternalNotificationModule' doesn't want packet
12:51:52 170 [Router] Module 'routing' wantsPacket=1
12:51:52 170 [Router] Received routing from=0x919dd8e8, id=0x41ec2ab9, portnum=50, payloadlen=42616
12:51:52 170 [Router] Routing sniffing (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:52 170 [Router] Rebroadcasting received floodmsg to neighbors (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:52 170 [Router] enqueuing for send (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim2 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:52 171 [Router] txGood=4,rxGood=2,rxBad=0
12:51:52 171 [Router] rx_snr found. hop_limit:2 rx_snr:5.750000
12:51:52 171 [Router] rx_snr found in packet. Setting tx delay:252
12:51:52 171 [Router] FIXME-update-db Sniffing packet
12:51:52 171 [Router] Delivering rx packet (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:52 171 [Router] Forwarding to phone (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:53 171 [Router] No suitable channel found for decoding, hash was 0x73!
12:51:53 171 [Router] Module 'routing' considered
12:51:53 171 [RadioIf] isActivelyReceiving()=0
12:51:53 171 [RadioIf] lora.scanChannel()=-15
12:51:53 171 [RadioIf] Starting low level send (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim2 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154 priority=64)
Having said this, setup could be daunting for users expecting this behavior to "just work" because they have two devices in the US set to LongFast for instance, with one on a the default LongFast configuration and another on a custom psk LongFast channel generated in the Android app.
Based on the decision made on device boot, we select either the user specified channel_num(-1) or create one based on a hash the channel name. The default channel_num is 91.
This is the operative code in RadioInterface.cpp:
// If user has manually specified a channel num, then use that, otherwise generate one by hashing the name const char *channelName = channels.getName(channels.getPrimaryIndex()); int channel_num = channelSettings.channel_num ? channelSettings.channel_num - 1 : hash(channelName) % numChannels;
Here are the logs from a startup on device with the defaults in the US region:
??:??:?? 2 Set radio: region=US, name=LongF, config=0, ch=91, power=30 ??:??:?? 2 Radio myRegion->freqStart / myRegion->freqEnd: 902.000000 -> 928.000000 (26.000000 mhz) ??:??:?? 2 Radio myRegion->numChannels: 104 ??:??:?? 2 Radio channel_num: 91 ??:??:?? 2 Radio frequency: 924.875000
Setting up a channel with the name "Test" for instance will end up with an operative channel_num of 45.
??:??:?? 2 Set radio: region=US, name=Test, config=0, ch=45, power=30
So now that we have our target channel number, we have to the python cli to set the matching one:
So we issue the command meshtastic --ch-set channel_num 45 --ch-index 0
. However that yields us with an operative channel number of 44 on the device based on the subtraction logic in the setup. This is confusing to a user who assumes that they have all matching parameters in the radio at this point.
It is already quite a process to have to comb through serial logs and issue a command in the python-cli only to be suprised on the next device bootup that the channel number does not match.
One caveat that is both nice but also misleading is that rebroadcasts are reported as ACKs in the Android and iOS app. I'm not sure if there is a clean way around this issue. One the one hand, it's nice to know that somebody heard your message, but a user could be mislead into thinking that the message was received by someone on their private channel, which may not be the case in a mixed channel scenario.
Category
Other
Hardware
Not Applicable
Firmware Version
1.3.6
Description
nodes should but are not rebroadcasting packets without a valid PSK to decode the payload.
Relevant log output