meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0
3.3k stars 800 forks source link

[Bug]: nodes only rebroadcasting decoded packets #1388

Closed andrekir closed 2 years ago

andrekir commented 2 years ago

Category

Other

Hardware

Not Applicable

Firmware Version

1.3.6

Description

nodes should but are not rebroadcasting packets without a valid PSK to decode the payload.

Relevant log output

//\ E S H T /\ S T / C

??:??:?? 0 booted, wake cause 0 (boot count 1), reset_reason=reset
??:??:?? 0 Filesystem files:
??:??:?? 0   /prefs/channels.proto (53 Bytes)
??:??:?? 0   /prefs/db.proto (330 Bytes)
??:??:?? 0   /prefs/radio.proto (13 Bytes)
??:??:?? 0   /static/Files.svg.gz (3360 Bytes)
??:??:?? 0   /static/Files_Dark.svg.gz (3382 Bytes)
??:??:?? 0   /static/Logo_Black.svg.gz (592 Bytes)
??:??:?? 0   /static/Logo_White.svg.gz (600 Bytes)
??:??:?? 0   /static/View_Code.svg.gz (2361 Bytes)
??:??:?? 0   /static/View_Code_Dark.svg.gz (2359 Bytes)
??:??:?? 0   /static/icon.svg.gz (842 Bytes)
??:??:?? 0   /static/index.2c020c83.css.gz (10060 Bytes)
??:??:?? 0   /static/index.45be9dbe.js.gz (19162 Bytes)
??:??:?? 0   /static/index.html.gz (600 Bytes)
??:??:?? 0   /static/manifest.webmanifest.gz (230 Bytes)
??:??:?? 0   /static/robots.txt.gz (42 Bytes)
??:??:?? 0   /static/safari-tab.svg.gz (549 Bytes)
??:??:?? 0   /static/site.webmanifest.gz (196 Bytes)
??:??:?? 0   /static/sw.js.gz (770 Bytes)
??:??:?? 0   /static/vendor.4c541204.js.gz (217103 Bytes)
??:??:?? 0   /static/workbox-c663a9cf.js.gz (5199 Bytes)
??:??:?? 0 I2C device found at address 0x34
??:??:?? 0 axp192 PMU found
??:??:?? 0 done
??:??:?? 0 Meshtastic hwvendor=4, swver=1.3.6.f511bab
??:??:?? 0 Setting random seed 4278559648
??:??:?? 0 Total heap: 237676
??:??:?? 0 Free heap: 206168
??:??:?? 0 Total PSRAM: 4194252
??:??:?? 0 Free PSRAM: 4194252
??:??:?? 0 NVS: UsedEntries 99, FreeEntries 531, AllEntries 630, NameSpaces 3
??:??:?? 0 Setup Preferences in Flash Storage
??:??:?? 0 Number of Device Reboots: 25
??:??:?? 0 Initial packet id 1514734793, numPacketId 4294967295
??:??:?? 0 Setting default channel and radio preferences!
??:??:?? 0 Expanding short PSK #1
??:??:?? 0 Wanted region 0, using Unset
??:??:?? 0 Loading /prefs/db.proto
??:??:?? 0 Loaded saved preferences version 11
??:??:?? 0 Loading /prefs/radio.proto
??:??:?? 0 Loading /prefs/channels.proto
??:??:?? 0 Number of Device Reboots: 25
??:??:?? 0 Expanding short PSK #0
??:??:?? 0 Wanted region 6, using ANZ
??:??:?? 0 region=6, NODENUM=0xbdf0bebc, dbsize=2
chip id detect 0x3
Detect CHIP :AXP192
OUTPUT Register 0x5f
??:??:?? 0 AXP192 Begin PASS
??:??:?? 0 DCDC1: ENABLE
??:??:?? 0 DCDC2: ENABLE
??:??:?? 0 LDO2: ENABLE
??:??:?? 0 LDO3: ENABLE
??:??:?? 0 DCDC3: ENABLE
??:??:?? 0 Exten: ENABLE
??:??:?? 0 ----------------------------------------
??:??:?? 0 DCDC1: ENABLE
??:??:?? 0 DCDC2: ENABLE
??:??:?? 0 LDO2: ENABLE
??:??:?? 0 LDO3: ENABLE
??:??:?? 0 DCDC3: ENABLE
??:??:?? 0 Exten: ENABLE
SRC REG:0xc4
Charging enable is enable
Charging target-voltage : 0x2
 end when the charge current is lower than 10% of the set value
Charge current : 700.00 mA
??:??:?? 0 Battery: usbPower=1, isCharging=0, batMv=4156, batPct=100
??:??:?? 0 Read RTC time as 0 (cur millis 458) quality=0
??:??:?? 0 Using MSL altitude model
??:??:?? 0 WANT GPS=1
??:??:?? 0 Setting GPS power=1
??:??:?? 1 Connected to UBLOX GPS successfully
??:??:?? 3 Using UBLOX Mode
??:??:?? 3 External Notification Module Disabled
??:??:?? 3 Starting meshradio init...
??:??:?? 3 (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=16, time 559 ms
??:??:?? 3 Set radio: name=LongFast, config=2, ch=19, power=0
??:??:?? 3 Radio myRegion->freqStart / myRegion->freqEnd: 915.000000 -> 928.000000 (13.000000 mhz)
??:??:?? 3 Radio myRegion->numChannels: 52
??:??:?? 3 Radio channel_num: 19
??:??:?? 3 Radio frequency: 917.375000
??:??:?? 3 Short packet time: 559 msec
??:??:?? 3 Set radio: final power level=17
??:??:?? 3 RF95 init result -2
??:??:?? 3 Current limit set to 100.000000
??:??:?? 3 Current limit set result -16
??:??:?? 3 Warning: Failed to find RF95 radio
??:??:?? 3 Starting meshradio init...
??:??:?? 3 (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=16, time 559 ms
??:??:?? 3 Set radio: name=LongFast, config=2, ch=19, power=0
??:??:?? 3 Radio myRegion->freqStart / myRegion->freqEnd: 915.000000 -> 928.000000 (13.000000 mhz)
??:??:?? 3 Radio myRegion->numChannels: 52
??:??:?? 3 Radio channel_num: 19
??:??:?? 3 Radio frequency: 917.375000
??:??:?? 3 Short packet time: 559 msec
??:??:?? 3 Set radio: final power level=22
??:??:?? 3 SX126x init result 0
??:??:?? 3 Current limit set to 140.000000
??:??:?? 3 Current limit set result 0
??:??:?? 3 SX1262 Radio init succeeded, using SX1262 radio
??:??:?? 3 Not using WIFI
??:??:?? 3 (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=237, time 3188 ms
??:??:?? 3 myNodeInfo.bitrate = 74.341286 bytes / sec
??:??:?? 3 PowerFSM init, USB power=1
??:??:?? 3 Enter state: BOOT
[D][esp32-hal-cpu.c:189] setCpuFrequencyMhz(): PLL: 320 / 4 = 80 Mhz, APB: 80000000 Hz
??:??:?? 3 [Power] Battery: usbPower=1, isCharging=0, batMv=4156, batPct=100
??:??:?? 3 [GPS] publishing pos@0:2, hasVal=0, GPSlock=0
??:??:?? 3 [GPS] No GPS lock
??:??:?? 3 [GPS] onGPSChanged() pos@0, time=0, lat=0, lon=0, alt=0
??:??:?? 3 [GPS] updatePosition LOCAL pos@0, time=0, latI=0, lonI=0, alt=0
??:??:?? 3 [GPS] Node status update: 2 online, 2 total
??:??:?? 3 [DeviceTelemetryModule] Device Telemetry: Initializing
??:??:?? 3 [DeviceTelemetryModule] -----------------------------------------
??:??:?? 3 [DeviceTelemetryModule] Device Telemetry: Read data
??:??:?? 3 [DeviceTelemetryModule] Telemetry->time: 3
??:??:?? 3 [DeviceTelemetryModule] Telemetry->air_util_tx: 0.205667
??:??:?? 3 [DeviceTelemetryModule] Telemetry->battery_level: 100
??:??:?? 3 [DeviceTelemetryModule] Telemetry->channel_utilization: 5.473333
??:??:?? 3 [DeviceTelemetryModule] Telemetry->voltage: 4.156000
??:??:?? 3 [DeviceTelemetryModule] Device Telemetry: Sending packet to mesh
??:??:?? 3 [DeviceTelemetryModule] Update DB node 0xbdf0bebc, rx_time=0
??:??:?? 3 [DeviceTelemetryModule] handleReceived(LOCAL) (id=0x5a4904cc Fr0xbc To0xff, WantAck0, HopLim3 Ch0x0 Portnum=67)
??:??:?? 3 [DeviceTelemetryModule] No modules interested in portnum=67, src=LOCAL
??:??:?? 3 [DeviceTelemetryModule] Add packet record (id=0x5a4904cc Fr0xbc To0xff, WantAck0, HopLim3 Ch0x0 Portnum=67)
??:??:?? 3 [DeviceTelemetryModule] Should encrypt MQTT?: 1
??:??:?? 3 [DeviceTelemetryModule] Expanding short PSK #0
??:??:?? 3 [DeviceTelemetryModule] Installing AES0 key!
??:??:?? 3 [DeviceTelemetryModule] enqueuing for send (id=0x5a4904cc Fr0xbc To0xff, WantAck0, HopLim3 Ch0xa encrypted)
??:??:?? 3 [DeviceTelemetryModule] txGood=0,rxGood=0,rxBad=0
??:??:?? 3 [DeviceTelemetryModule] updateTelemetry LOCAL
??:??:?? 3 [DeviceTelemetryModule] Node status update: 2 online, 2 total
??:??:?? 3 [SerialModule] Serial Module Disabled
??:??:?? 3 [StoreForwardModule] Store & Forward Module - Disabled
??:??:?? 3 [RangeTestModule] Range Test Module - Disabled
??:??:?? 3 [RadioIf] Starting low level send (id=0x5a4904cc Fr0xbc To0xff, WantAck0, HopLim3 Ch0xa encrypted priority=64)
??:??:?? 3 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=44, time 886 ms
??:??:?? 3 [RadioIf] AirTime - Packet transmitted : 886ms
??:??:?? 4 [RadioIf] Completed sending (id=0x5a4904cc Fr0xbc To0xff, WantAck0, HopLim3 Ch0xa encrypted priority=64)
00:00:05 5 [GPS] Upgrading time to RTC 1650337983 secs (quality 3)
03:13:03 5 [GPS] Read RTC time as 1650337983 (cur millis 5513) quality=3
03:13:03 5 [GPS] publishing pos@0:2, hasVal=0, GPSlock=0
03:13:03 5 [GPS] onGPSChanged() pos@0, time=1650337983, lat=0, lon=0, alt=0
03:13:03 5 [GPS] updatePosition LOCAL pos@0, time=1650337983, latI=0, lonI=0, alt=0
03:13:03 5 [GPS] Node status update: 2 online, 2 total
03:13:03 6 [PowerFSM] Enter state: POWER
03:13:03 6 [PowerFSM] Initialise the ESP32 bluetooth module
03:13:03 6 [PowerFSM] Configuring the Device Information Service
03:13:03 6 [PowerFSM] Configuring the Mesh bluetooth service
I NimBLEDevice: "BLE Host Task Started"
I NimBLEDevice: "NimBle host synced."
D NimBLEDevice: ">> setPower: 7 (type: 11)"
D NimBLEDevice: "<< setPower"
D NimBLEDevice: "Setting bonding: 1, mitm: 1, sc: 1"
D (7201) NimBLE: Controller not ready to receive packets
03:13:04 7 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=38, time 821 ms
03:13:04 7 [RadioIf] Lora RX (id=0x5851f8b6 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6.5 rxSNR=1.73219)
03:13:04 7 [RadioIf] AirTime - Packet received : 821ms
03:13:04 7 [Router] Add packet record (id=0x5851f8b6 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6.5 rxSNR=1.73219)
03:13:04 7 [Router] No suitable channel found for decoding, hash was 0x8!
03:13:04 7 [Router] packet decoding failed (no PSK?) (id=0x5851f8b6 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxtime=1650337984 rxSNR=6.5 rxSNR=1.73219)
03:13:04 7 [Router] No modules interested in portnum=-174915562, src=REMOTE
03:13:12 14 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=21, time 616 ms
03:13:12 14 [RadioIf] Lora RX (id=0xa1c5c01a Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6.5 rxSNR=1.73219)
03:13:12 14 [RadioIf] AirTime - Packet received : 616ms
03:13:12 14 [Router] Add packet record (id=0xa1c5c01a Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6.5 rxSNR=1.73219)
03:13:12 14 [Router] No suitable channel found for decoding, hash was 0x8!
03:13:12 14 [Router] packet decoding failed (no PSK?) (id=0xa1c5c01a Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxtime=1650337992 rxSNR=6.5 rxSNR=1.73219)
03:13:12 14 [Router] No modules interested in portnum=834076677, src=REMOTE
03:13:16 18 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=21, time 616 ms
03:13:16 18 [RadioIf] Lora RX (id=0xa1c5c01a Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6 rxSNR=1.73219)
03:13:16 18 [RadioIf] AirTime - Packet received : 616ms
03:13:16 18 [Router] Found existing packet record for fr=0x5a4989b0,to=0xffffffff,id=0xa1c5c01a
03:13:16 18 [Router] Add packet record (id=0xa1c5c01a Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6 rxSNR=1.73219)
03:13:16 18 [Router] Ignoring incoming msg, because we've already seen it (id=0xa1c5c01a Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6 rxSNR=1.73219)
03:13:16 18 [Router] Incoming message was filtered 0x5a4989b0
03:13:20 23 [Power] Battery: usbPower=1, isCharging=0, batMv=4156, batPct=100
03:13:21 23 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=21, time 616 ms
03:13:21 23 [RadioIf] Lora RX (id=0xa1c5c01a Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=5.75 rxSNR=1.73219)
03:13:21 23 [RadioIf] AirTime - Packet received : 616ms
03:13:21 23 [Router] Found existing packet record for fr=0x5a4989b0,to=0xffffffff,id=0xa1c5c01a
03:13:21 24 [Router] Add packet record (id=0xa1c5c01a Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=5.75 rxSNR=1.73219)
03:13:21 24 [Router] Ignoring incoming msg, because we've already seen it (id=0xa1c5c01a Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=5.75 rxSNR=1.73219)
03:13:21 24 [Router] Incoming message was filtered 0x5a4989b0
andrekir commented 2 years ago

rebroadcasting only with full channel settings:

03:10:25 10 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=21, time 616 ms
03:10:25 10 [RadioIf] Lora RX (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6 rxSNR=1.73219)
03:10:25 10 [RadioIf] AirTime - Packet received : 616ms
03:10:26 10 [Router] Add packet record (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=6 rxSNR=1.73219)
03:10:26 10 [Router] Using channel 0 (hash 0x8)
03:10:26 10 [Router] Expanding short PSK #1
03:10:26 10 [Router] Installing AES128 key!
03:10:26 10 [Router] decoded message (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x0 Portnum=1 rxtime=1650337826 rxSNR=6 rxSNR=1.73219)
03:10:26 10 [Router] handleReceived(REMOTE) (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x0 Portnum=1 rxtime=1650337826 rxSNR=6 rxSNR=1.73219)
03:10:26 10 [Router] Module 'text' wantsPacket=1
03:10:26 10 [Router] Received text msg from=0x5a4989b0, id=0xa1c5c019, msg=2
03:10:26 10 [Router] Module 'text' considered
03:10:26 10 [Router] Module 'canned' wantsPacket=1
03:10:26 10 [Router] Module 'canned' considered
03:10:26 10 [Router] Module 'ExternalNotificationModule' wantsPacket=1
03:10:26 10 [Router] External Notification Module Disabled
03:10:26 10 [Router] Module 'ExternalNotificationModule' considered
03:10:26 10 [Router] Module 'StoreForwardModule' wantsPacket=1
03:10:26 10 [Router] Store & Forward Module - Disabled
03:10:26 10 [Router] Module 'StoreForwardModule' considered
03:10:26 10 [Router] Module 'routing' wantsPacket=1
03:10:26 10 [Router] Received routing from=0x5a4989b0, id=0xa1c5c019, portnum=1, payloadlen=1
03:10:26 10 [Router] Routing sniffing (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x0 Portnum=1 rxtime=1650337826 rxSNR=6 rxSNR=1.73219)
03:10:26 10 [Router] Rebroadcasting received floodmsg to neighbors (id=0xa1c5c019 Fr0xb0 To0xff, WantAck0, HopLim3 Ch0x0 Portnum=1 rxtime=1650337826 rxSNR=6 rxSNR=1.73219)
charminULTRA commented 2 years ago

This is occurring on 1.2.64 as well.

charminULTRA commented 2 years ago

I spent some time the other week trying to hunt this down (maybe too much lol), I'm not a programmer so this may not be very helpful or productive, but I suspect that when no suitable channel/PSK/module is found, the message is marked as being seen, and once it goes back into the process of checking whether it should be rebroadcast, it isn't because it's already been seen, but this doesn't make sense for this scenario when the packet should be forwarded despite being seen, as it may belong to a radio or radio group on a different PSK. I wasn't able to isolate the code that could be causing this, but I know it's a combination of Router, FloodingRouter, and ReliableRouter, perhaps an issue with the sniffing feature.

Again, I could be totally off, I'm not a programmer so apologies if this isn't helpful.

geeksville commented 2 years ago

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/do-you-plan-on-going-to-burning-man-2022/4170/27

geeksville commented 2 years ago

Hmm. I spent a dayish looking into this. I think I found the problem:

'sniffReceived' is a hook that routers can use to do early processing on received packets. In particular, the FloodingRouter mixin (which provides our mesh flood routing behavior) overrode this hook to do its retransmissions (to forward packets for other nodes).

The old code called sniffReceived here:

bool RoutingPlugin::handleReceivedProtobuf(const MeshPacket &mp, const Routing *r)
{
    printPacket("Routing sniffing", &mp);
    router->sniffReceived(&mp, r);
    ...

An important guarantee of the handleReceivedProtobuf 'contract' was that the decoded r parameter will be provided only if it was able to be decoded (i.e.e it could be null in some cases, and that point any of these hooks should treat it as opaque though they can still use the 'envelope' data in p).

And its implementation was:

void FloodingRouter::sniffReceived(const MeshPacket *p, const Routing *c)
{
    // If a broadcast, possibly _also_ send copies out into the mesh.
    // (FIXME, do something smarter than naive flooding here)
    if (p->to == NODENUM_BROADCAST && p->hop_limit > 0 && getFrom(p) != getNodeNum()) {
        if (p->id != 0) {
            MeshPacket *tosend = packetPool.allocCopy(*p); // keep a copy because we will be sending it

            tosend->hop_limit--; // bump down the hop count

            printPacket("Rebroadcasting received floodmsg to neighbors", p);
            // Note: we are careful to resend using the original senders node id
            // We are careful not to call our hooked version of send() - because we don't want to check this again
            Router::send(tosend);

        } else {
            DEBUG_MSG("Ignoring a simple (0 id) broadcast\n");
        }
    }

    // handle the packet as normal
    Router::sniffReceived(p, c);
}

(this old code is from commit bb396508a48917c1aaf1849c47218db223c0fbe6 but anything from about this date should be same Date: Mon Jul 26 04:53:13 2021 +0800)

But the refactored code seems to have moved this sniffing stage later in the processing, in a new class called RoutingModule:

bool RoutingModule::handleReceivedProtobuf(const MeshPacket &mp, Routing *r)
{
    printPacket("Routing sniffing", &mp);
    router->sniffReceived(&mp, r);
    ...

That function is called from this method:

    /** Called to handle a particular incoming message

    @return ProcessMessage::STOP if you've guaranteed you've handled this message and no other handlers should be considered for it
    */
    virtual ProcessMessage handleReceived(const MeshPacket &mp) override
    {
        // FIXME - we currently update position data in the DB only if the message was a broadcast or destined to us
        // it would be better to update even if the message was destined to others.

        auto &p = mp.decoded;
        DEBUG_MSG("Received %s from=0x%0x, id=0x%x, portnum=%d, payloadlen=%d\n", name, mp.from, mp.id, p.portnum,
                  p.payload.size);

        T scratch;
        T *decoded = NULL;
        if (mp.which_payloadVariant == MeshPacket_decoded_tag && mp.decoded.portnum == ourPortNum) {
            memset(&scratch, 0, sizeof(scratch));
            if (pb_decode_from_bytes(p.payload.bytes, p.payload.size, fields, &scratch)) {
                decoded = &scratch;
            } else {
                DEBUG_MSG("Error decoding protobuf module!\n");
                // if we can't decode it, nobody can process it!
                return ProcessMessage::STOP;
            }
        }

        return handleReceivedProtobuf(mp, decoded) ? ProcessMessage::STOP : ProcessMessage::CONTINUE;
    }

I bet this new check at "if we can't decode it, nobody can process it!" is causing STOP to be returned by one of the registered modules (not necessarily the new RoutingModule). Therefore now if decoding fails routing module never gets a chance to 'sniff' the packet and forward it to other nodes.

Notes:

does this help @andrekir, @mc-hamster and Garth (sorry Garth, I'm out of touch and don't yet know your github handle).

geeksville commented 2 years ago

Hmm. I spent a dayish looking into this. I think I found the problem:

'sniffReceived' is a hook that routers can use to do early processing on received packets. In particular, the FloodingRouter mixin (which provides our mesh flood routing behavior) overrode this hook to do its retransmissions (to forward packets for other nodes).

The old code called sniffReceived here:

bool RoutingPlugin::handleReceivedProtobuf(const MeshPacket &mp, const Routing *r)
{
    printPacket("Routing sniffing", &mp);
    router->sniffReceived(&mp, r);
    ...

An important guarantee of the handleReceivedProtobuf 'contract' was that the decoded r parameter will be provided only if it was able to be decoded (i.e.e it could be null in some cases, and that point any of these hooks should treat it as opaque though they can still use the 'envelope' data in p).

And its implementation was:

void FloodingRouter::sniffReceived(const MeshPacket *p, const Routing *c)
{
    // If a broadcast, possibly _also_ send copies out into the mesh.
    // (FIXME, do something smarter than naive flooding here)
    if (p->to == NODENUM_BROADCAST && p->hop_limit > 0 && getFrom(p) != getNodeNum()) {
        if (p->id != 0) {
            MeshPacket *tosend = packetPool.allocCopy(*p); // keep a copy because we will be sending it

            tosend->hop_limit--; // bump down the hop count

            printPacket("Rebroadcasting received floodmsg to neighbors", p);
            // Note: we are careful to resend using the original senders node id
            // We are careful not to call our hooked version of send() - because we don't want to check this again
            Router::send(tosend);

        } else {
            DEBUG_MSG("Ignoring a simple (0 id) broadcast\n");
        }
    }

    // handle the packet as normal
    Router::sniffReceived(p, c);
}

(this old code is from commit bb396508a48917c1aaf1849c47218db223c0fbe6 but anything from about this date should be same Date: Mon Jul 26 04:53:13 2021 +0800)

But the refactored code seems to have moved this sniffing stage later in the processing, in a new class called RoutingModule:

bool RoutingModule::handleReceivedProtobuf(const MeshPacket &mp, Routing *r)
{
    printPacket("Routing sniffing", &mp);
    router->sniffReceived(&mp, r);
    ...

That function is called from this method:

    /** Called to handle a particular incoming message

    @return ProcessMessage::STOP if you've guaranteed you've handled this message and no other handlers should be considered for it
    */
    virtual ProcessMessage handleReceived(const MeshPacket &mp) override
    {
        // FIXME - we currently update position data in the DB only if the message was a broadcast or destined to us
        // it would be better to update even if the message was destined to others.

        auto &p = mp.decoded;
        DEBUG_MSG("Received %s from=0x%0x, id=0x%x, portnum=%d, payloadlen=%d\n", name, mp.from, mp.id, p.portnum,
                  p.payload.size);

        T scratch;
        T *decoded = NULL;
        if (mp.which_payloadVariant == MeshPacket_decoded_tag && mp.decoded.portnum == ourPortNum) {
            memset(&scratch, 0, sizeof(scratch));
            if (pb_decode_from_bytes(p.payload.bytes, p.payload.size, fields, &scratch)) {
                decoded = &scratch;
            } else {
                DEBUG_MSG("Error decoding protobuf module!\n");
                // if we can't decode it, nobody can process it!
                return ProcessMessage::STOP;
            }
        }

        return handleReceivedProtobuf(mp, decoded) ? ProcessMessage::STOP : ProcessMessage::CONTINUE;
    }

I bet this new check at "if we can't decode it, nobody can process it!" is causing STOP to be returned by one of the registered modules (not necessarily the new RoutingModule). Therefore now if decoding fails routing module never gets a chance to 'sniff' the packet and forward it to other nodes.

Notes:

does this help @andrekir, @mc-hamster and Garth (sorry Garth, I'm out of touch and don't yet know your github handle).

garthvh commented 2 years ago

@geeksville that totally matches the timeline @thebentern and I were seeing it never seemed to be in the module implementation.

geeksville commented 2 years ago

Untested (and only just considered it) but a quick fix for this might be to make sure that the new RoutingModule is first in the list of modules. So it would never be affected by this problem (if the debugger confirms that this is the issue).

GUVWAF commented 2 years ago

I did some testing using the small fix from my PR (#1593) and it seemed to work already. Maybe someone can confirm this before diving further into this? I didn't save the logs, but I could see the node saying that it couldn't decode, but it did rebroadcast (and I got the 'acknowledged' sign in the phone app).

geeksville commented 2 years ago

Oh! That would be great!

(Sent from a phone - please ignore typos)

On Sat, Aug 6, 2022, 11:11 GUVWAF @.***> wrote:

I did some testing using the small fix from my PR (#1593 https://github.com/meshtastic/Meshtastic-device/pull/1593) and it seemed to work already. Maybe someone can confirm this before diving further into this? I didn't save the logs, but I could see the node saying that it couldn't decode, but it did rebroadcast (and I got the 'acknowledged' sign in the phone app).

— Reply to this email directly, view it on GitHub https://github.com/meshtastic/Meshtastic-device/issues/1388#issuecomment-1207257881, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXB2IJNUVPD7BK3IK7QW3VX2TFLANCNFSM5TXPJSJQ . You are receiving this because you were mentioned.Message ID: @.***>

sgofferj commented 2 years ago

Should that be configurable (in case it gets fixed)? I can imagine some operators not wanting to rebroadcast packets which don´t belong to "their" network (which is somewhat implied by not having the PSK). Especially in denser populated areas where multiple non-related networks might exist...

garthvh commented 2 years ago

Should that be configurable (in case it gets fixed)? I can imagine some operators not wanting to rebroadcast packets which don´t belong to "their" network (which is somewhat implied by not having the PSK). Especially in denser populated areas where multiple non-related networks might exist...

In my testing this is already working

The solution that does not require extra config values would be to use different channel settings and PSK, this only occurs with identical channel settings and different PSK's where you will be sharing the channel already.

thebentern commented 2 years ago

Confirmed this is working in the latest firmware. I added some more detailed debug logs locally, so that the packet routing flow is more visible.

12:51:51 169 [RadioIf] handleReceiveInterrupt()
12:51:51 169 [RadioIf] Recieving packet with length 66
12:51:51 169 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=66, time 1148 ms
12:51:51 169 [RadioIf] Received mp->which_payloadVariant=0
12:51:51 169 [RadioIf] Lora RX (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:51 169 [RadioIf] AirTime - Packet received : 1148ms
12:51:51 170 [RadioIf] Received which_payloadVariant=5
12:51:51 170 [Router] Add packet record (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:51 170 [Router] No suitable channel found for decoding, hash was 0x73!
12:51:51 170 [Router] packet decoding failed (no PSK?) (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:52 170 [Router] Module 'Admin' doesn't want packet
12:51:52 170 [Router] Module 'nodeinfo' doesn't want packet
12:51:52 170 [Router] Module 'position' doesn't want packet
12:51:52 170 [Router] Module 'waypoint' doesn't want packet
12:51:52 170 [Router] Module 'text' doesn't want packet
12:51:52 170 [Router] Module 'remotehardware' doesn't want packet
12:51:52 170 [Router] Module 'reply' doesn't want packet
12:51:52 170 [Router] Module 'canned' doesn't want packet
12:51:52 170 [Router] Module 'DeviceTelemetry' doesn't want packet
12:51:52 170 [Router] Module 'EnvironmentTelemetry' doesn't want packet
12:51:52 170 [Router] Module 'ExternalNotificationModule' doesn't want packet
12:51:52 170 [Router] Module 'routing' wantsPacket=1
12:51:52 170 [Router] Received routing from=0x919dd8e8, id=0x41ec2ab9, portnum=50, payloadlen=42616
12:51:52 170 [Router] Routing sniffing (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:52 170 [Router] Rebroadcasting received floodmsg to neighbors (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:52 170 [Router] enqueuing for send (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim2 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:52 171 [Router] txGood=4,rxGood=2,rxBad=0
12:51:52 171 [Router] rx_snr found. hop_limit:2 rx_snr:5.750000
12:51:52 171 [Router] rx_snr found in packet. Setting tx delay:252
12:51:52 171 [Router] FIXME-update-db Sniffing packet
12:51:52 171 [Router] Delivering rx packet (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:52 171 [Router] Forwarding to phone (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim3 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154)
12:51:53 171 [Router] No suitable channel found for decoding, hash was 0x73!
12:51:53 171 [Router] Module 'routing' considered
12:51:53 171 [RadioIf] isActivelyReceiving()=0
12:51:53 171 [RadioIf] lora.scanChannel()=-15
12:51:53 171 [RadioIf] Starting low level send (id=0x41ec2ab9 Fr0xe8 To0xff, WantAck0, HopLim2 Ch0x73 encrypted rxtime=1661604711 rxSNR=5.75 rxRSSI=1.52956e-154 priority=64)

Having said this, setup could be daunting for users expecting this behavior to "just work" because they have two devices in the US set to LongFast for instance, with one on a the default LongFast configuration and another on a custom psk LongFast channel generated in the Android app.

Background

Based on the decision made on device boot, we select either the user specified channel_num(-1) or create one based on a hash the channel name. The default channel_num is 91. This is the operative code in RadioInterface.cpp: // If user has manually specified a channel num, then use that, otherwise generate one by hashing the name const char *channelName = channels.getName(channels.getPrimaryIndex()); int channel_num = channelSettings.channel_num ? channelSettings.channel_num - 1 : hash(channelName) % numChannels; Here are the logs from a startup on device with the defaults in the US region: ??:??:?? 2 Set radio: region=US, name=LongF, config=0, ch=91, power=30 ??:??:?? 2 Radio myRegion->freqStart / myRegion->freqEnd: 902.000000 -> 928.000000 (26.000000 mhz) ??:??:?? 2 Radio myRegion->numChannels: 104 ??:??:?? 2 Radio channel_num: 91 ??:??:?? 2 Radio frequency: 924.875000 Setting up a channel with the name "Test" for instance will end up with an operative channel_num of 45. ??:??:?? 2 Set radio: region=US, name=Test, config=0, ch=45, power=30

Setting up a matching channel num

So now that we have our target channel number, we have to the python cli to set the matching one: So we issue the command meshtastic --ch-set channel_num 45 --ch-index 0. However that yields us with an operative channel number of 44 on the device based on the subtraction logic in the setup. This is confusing to a user who assumes that they have all matching parameters in the radio at this point. It is already quite a process to have to comb through serial logs and issue a command in the python-cli only to be suprised on the next device bootup that the channel number does not match.

Message rebroadcasts are reported as ACKs in the apps

One caveat that is both nice but also misleading is that rebroadcasts are reported as ACKs in the Android and iOS app. I'm not sure if there is a clean way around this issue. One the one hand, it's nice to know that somebody heard your message, but a user could be mislead into thinking that the message was received by someone on their private channel, which may not be the case in a mixed channel scenario.