mcci-catena / arduino-lmic

LoraWAN-MAC-in-C library, adapted to run under the Arduino environment
https://forum.mcci.io/c/device-software/arduino-lmic/
MIT License
648 stars 212 forks source link

Invalid MIC reported by Chirpstack, timing issue too? #564

Closed dolfandringa closed 4 years ago

dolfandringa commented 4 years ago

Describe your question or issue Potentially related to issue #563, I am getting errors with chirpstack about an invalid MIC calculation from my device using MCCI Arduino LMIC.

time="2020-04-22T10:55:00+08:00" level=error msg="uplink: processing uplink frame error" ctx_id=f65cc3f9-9a89-40b5-839c-46410bfc4144 error="get device-session error: invalid MIC"

I am using MCCI Arduino LMIC with abp on a Teensy++ 2.0. The gateway side is the Congduc Pham's Low Cost LoRa Gateway as packet forwarder and a local chirpstack installation. I have turned off frame count validation and am using symmetric Session and Application keys so the byte order isn't an issue. I don't know if the gateway can mess anything up in the packet that could cause this, or if its only Arduino LMIC that can screw things up. I certainly don't trust the gateway software I am using too much.

I am compiling with the following flags:

-DLMIC_USE_INTERRUPTS -DCFG_sx1276_radio -DCFG_us915 -DARDUINO_LMIC_PROJECT_CONFIG_H_SUPPRESS -DDISABLE_INVERT_IQ_ON_RX -DDISABLE_PING -DDISABLE_BEACONS -DLMIC_PRINTF_TO=Serial1 

Before I added a delay() in my arduino loop() function, but with 2 debug print statements (see issue above) consistently Chirpstack was reporting the invalid MIC error on all frames except frames 0 and 4. Those 2 frames were accepted correctly. Every time I restarted the device, it was exactly the same. Now I have added delays, and it looks like a few more come through correctly, but less reliably. With delay(1) or delay(5) at the end of the loop() function, frames that often come through correctly 0, 1, 6, 8, 16 and 17, but usually at least 1 or 2 of those fail as well (usually 6 and/or 8).

Environment

altishchenko commented 4 years ago

@dolfandringa Just to verify, is local decryption disabled on your gateway?

dolfandringa commented 4 years ago

yes

dolfandringa commented 4 years ago

I enabled it once, and then the gateway complained about an invalid MIC as well. But I don't fully trust the gateway too much so didn't dive in there.

dolfandringa commented 4 years ago

I did more debugging by just printing the raw lorawan messages on the gateway (before any postprocessing). So that should really be the message as created by LMIC.

This is a packet as sent by the device:

             mhdr|fhdr                   |fport|payload                               |mic
Full packet: 40   7C 19 04 26 80    00 00 01    6C 11 AD 13 6B F5 E5 BA C1 A1 7E 4B 5E C4 98 5E 70
             mhdr|devaddr    |fctrl|seq  |fport|payload                               |mic

I dove into the LoraWAN standard and added what I think (correct me if I'm wrong) each section of the message is.

Then I logged a bunch of messages from the same device, that always sends "Hello, world!" as payload. And wrote the python snippet below to validate the MIC's:

from Crypto.Hash import CMAC
from Crypto.Cipher import AES

key = [0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0xbb]

def getMic(frame):
    mic = frame[-4:] # last 4 bytes are the MIC
    msg = frame[:-4] # take off the MIC
    mhdr = msg[0]
    fhdr = msg[1:8]
    fport = msg[8]
    payload = msg[9:]
    address = fhdr[:4]
    fctrl = fhdr[4]
    seq = fhdr[-2:]
    dndir = 0
    length = len(msg)
    b0 = [0x49, 0x00, 0x00, 0x00, 0x00, dndir] + address + seq + [0x00, 0x00] + [0x00, length] # In b0 FCntUp is 4 bytes instead of 2, so pad with [0x00, 0x00]
    cobj = CMAC.new(bytearray(key), ciphermod=AES)
    cobj.update(bytearray(b0) + bytearray(msg))
    return " ".join([f"{i:02X}" for i in cobj.digest()[:4]])

frames = [                           # as logged on the gateway
"40 7C 19 04 26 80 00 00 01 6C 11 AD 13 6B F5 E5 BA C1 A1 7E 4B 5E C4 98 5E 70",
"40 7C 19 04 26 80 01 00 01 0A AF 98 E3 22 85 9E 40 AC 6A A5 C6 CB 46 44 AE 33",
"40 7C 19 04 26 80 02 00 01 E5 9D 40 C9 68 3A FE 59 64 B6 03 EB 8E 7C BA 37 54",
"40 7C 19 04 26 80 03 00 01 55 77 4C 2F 79 D4 80 C8 8E 58 D4 8D 1D E0 E9 E4 E1",
"40 7C 19 04 26 80 04 00 01 FA 38 4F FD 72 DE DF 68 B3 ED B9 38 72 20 72 E5 F2",
"40 7C 19 04 26 80 05 00 01 C7 8A A9 E7 70 38 48 7B CD B2 8D E6 9D C8 57 A3 B7",
"40 7C 19 04 26 80 06 00 01 79 5F 5E E4 C6 66 51 DB A2 4B CB 64 98 8D 93 AF 2A",
"40 7C 19 04 26 80 07 00 01 FC 09 90 7D 37 B7 8E 70 9C 58 3B 4D 62 50 61 F4 E7",
"40 7C 19 04 26 80 08 00 01 C2 DE BE E8 89 73 2B 2E DE 8D 95 09 FB C6 D1 10 BC"
]

for i, frame in enumerate(frames):
    mic = getMic([int(i, 16) for i in frame.split(" ")])
    correct = (mic == frame[-11:])
    if not correct:
        print(f"Frame {i} MIC wrong. calculated: {mic}, provided: {frame[-11:]}")

With that list of frames, all MIC's except for frame 0, 1 and 6 are wrong somehow.

Then I did it again, but let it run longer. These were the resulting frames (just power cycled the device, no other changes):

frames = [
"40 7C 19 04 26 80 00 00 01 6C 11 AD 13 6B F5 E5 BA C1 A1 7E 4B 5E C4 98 5E 70",
"40 7C 19 04 26 80 01 00 01 0A AF 98 E3 22 85 9E 40 AC 6A A5 C6 CB 46 44 AE 33",
"40 7C 19 04 26 80 02 00 01 E5 9D 40 C9 68 3A FE 59 64 B6 03 EB 8E 7C BA 37 54",
"40 7C 19 04 26 80 03 00 01 55 77 4C 2F 79 D4 80 C8 8E 58 D4 8D 1D E0 E9 E4 E1",
"40 7C 19 04 26 80 04 00 01 FA 38 4F FD 72 DE DF 68 B3 ED B9 38 72 20 72 E5 F2",
"40 7C 19 04 26 80 05 00 01 C7 8A A9 E7 70 38 48 7B CD B2 8D E6 9D C8 57 A3 B7",
"40 7C 19 04 26 80 06 00 01 79 5F 5E E4 C6 66 51 DB A2 4B CB 64 98 5C F9 3E 83",
"40 7C 19 04 26 80 07 00 01 FC 09 90 7D 37 B7 8E 70 9C 58 3B 4D 62 50 61 F4 E7",
"40 7C 19 04 26 80 08 00 01 F5 7C BF E7 F6 05 CA 6D 6F 26 A0 8F C5 07 FD 6F 40",
"40 7C 19 04 26 80 09 00 01 83 93 71 6F 7F 6F 3A E3 2D FA F1 52 7D 75 63 3D AA",
"40 7C 19 04 26 80 0A 00 01 E2 77 12 A8 29 F0 89 BA 9B BB 5B 76 F3 4D D6 19 73",
"40 7C 19 04 26 80 0B 00 01 55 7E DD CB 60 1B 4E F9 58 E7 FB 02 3D FB 66 0C 97",
"40 7C 19 04 26 80 0C 00 01 36 9B A0 14 76 EB 3F AD 60 FD A5 F5 3A 35 41 39 D2",
"40 7C 19 04 26 80 0D 00 01 27 76 DE 5D 7A CA 31 EE 79 A1 5B 27 C1 64 20 4C D3",
"40 7C 19 04 26 80 0E 00 01 B0 94 93 01 3F 15 D7 24 67 AB 6D 2E F3 CB F9 60 A1",
"40 7C 19 04 26 80 0F 00 01 F2 F3 90 3D 72 F4 7F D3 AD EB 09 8E F2 D5 87 4E 6A",
"40 7C 19 04 26 80 10 00 01 9E 13 14 B4 D3 B2 28 8F A8 50 13 92 10 6C 89 42 52",
"40 7C 19 04 26 80 11 00 01 16 91 F0 F1 5F 87 17 08 67 85 F3 67 F4 43 01 B1 DA",
"40 7C 19 04 26 80 12 00 01 C0 DB B2 E9 22 5D 49 CE 03 F1 B8 0D 28 3E 8F 5C 51",
"40 7C 19 04 26 80 13 00 01 86 8A 11 EC AB 95 91 EA 99 71 2B 25 CE 8C B0 50 AF",
"40 7C 19 04 26 80 14 00 01 91 70 57 6F 1C F2 F6 C0 B6 C8 AA 4A F2 3B 60 8E 61",
"40 7C 19 04 26 80 15 00 01 A5 EB E7 BC 88 AF B2 1D D6 82 0D 1D A4 A4 9A F4 83",
"40 7C 19 04 26 80 16 00 01 5F 05 BF CA 1B 27 46 DD 70 67 11 72 34 4D F2 87 0D",
"40 7C 19 04 26 80 17 00 01 85 8E 22 19 6E A4 AE 81 B1 C1 62 14 5D CF BA 78 D2",
"40 7C 19 04 26 80 18 00 01 EC FD 7E 7B 11 D3 BB AF FB AA 58 06 8E 33 C3 2D 7A",
"40 7C 19 04 26 80 19 00 01 3B A6 34 F3 C9 83 63 D3 8E 2A 87 2C 89 B2 3E D5 34",
"40 7C 19 04 26 80 1A 00 01 8B BD 89 4C 3E E5 7B 33 E9 F6 82 7D E5 8E 39 9E 3A",
"40 7C 19 04 26 80 1B 00 01 F1 E2 A5 DC 54 DF D1 B1 56 5F E3 79 87 D8 EB 64 D5",
"40 7C 19 04 26 80 1C 00 01 37 17 E2 08 36 EA 40 3B 75 25 FD E2 A1 57 8A F8 FF",
"40 7C 19 04 26 80 1D 00 01 B7 9C C2 A4 AD 44 73 8F 1B 1A BC 94 21 6A 88 E6 0C",
"40 7C 19 04 26 80 1E 00 01 90 F7 B0 8C F3 61 4E 93 4B 21 11 37 F1 DE 85 57 95",
"40 7C 19 04 26 80 1F 00 01 F9 D1 C6 49 9F AC 9D BF D2 9C 22 B9 A6 33 BF F4 7E",
"40 7C 19 04 26 80 20 00 01 B7 27 6B F6 73 03 B3 F4 B6 EF 1C 90 F5 38 E0 FC AD",
"40 7C 19 04 26 80 21 00 01 56 74 44 8D 81 29 ED 0B B8 97 E9 3E CF B0 78 B7 DA",
"40 7C 19 04 26 80 22 00 01 7E 17 6E AB 8A F8 14 FA 5E 7E D8 AC 5C CC C7 09 7E",
"40 7C 19 04 26 80 23 00 01 ED 75 B7 C3 2E 25 AD 24 FD 06 01 BF D7 A2 FC 7B 99",
"40 7C 19 04 26 80 24 00 01 37 FF DE 17 EE 77 5C E9 19 19 72 0B EE 07 5F 32 DB",
"40 7C 19 04 26 80 25 00 01 51 D7 62 C9 D1 9E F8 7E 04 23 2F 5D 0B D0 65 24 85",
"40 7C 19 04 26 80 26 00 01 C8 43 FC AF 10 B2 EC 2B D1 38 AD 1C 9B AC F8 48 F0",
"40 7C 19 04 26 80 27 00 01 C7 7A 67 F7 F8 DC 2F EF E2 1F 0B 8F D9 3E 35 65 1F",
"40 7C 19 04 26 80 28 00 01 27 71 07 A8 AC DC 12 65 22 0F B3 16 D0 4F D6 AC 1B",
"40 7C 19 04 26 80 00 00 01 6C 11 AD 13 6B F5 E5 BA C1 A1 7E 4B 5E C4 98 5E 70",
"40 7C 19 04 26 80 01 00 01 0A AF 98 E3 22 85 9E 40 AC 6A A5 C6 CB 46 44 AE 33"
]

Note that the first 6 frames and the 8th frame are the same, but 9 and 6 differ (why?). With that second set, all MICs except for frame 0, 1, 16, 17, 24, 26, 37, 41 and 42 are wrong.

What could be the cause of these differences? I do suspect it is something with the combination of the microcontroller and LMIC library?

dolfandringa commented 4 years ago

This is my arduino sketch:

#include <lmic.h>
#include <hal/hal.h>
#include <SPI.h>

#ifndef SerialPort
#define SerialPort Serial
#endif

// For normal use, we require that you edit the sketch to replace FILLMEIN
// with values assigned by the TTN console. However, for regression tests,
// we want to be able to compile these scripts. The regression tests define
// COMPILE_REGRESSION_TEST, and in that case we define FILLMEIN to a non-
// working but innocuous value.
//
#ifdef COMPILE_REGRESSION_TEST
# define FILLMEIN 0
#else
# warning "You must replace the values marked FILLMEIN with real values from the TTN control panel!"
# define FILLMEIN (#dont edit this, edit the lines that use FILLMEIN)
#endif

// LoRaWAN NwkSKey, network session key
static const PROGMEM u1_t NWKSKEY[16] = {0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB};

// LoRaWAN AppSKey, application session key
static const u1_t PROGMEM APPSKEY[16] = {0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB ,0xBB};

static const u4_t DEVADDR = 0x2604197C ; // <-- Change this address for every node!

void os_getArtEui (u1_t* buf) { }
void os_getDevEui (u1_t* buf) { }
void os_getDevKey (u1_t* buf) { }

static uint8_t mydata[] = "Hello, world!";
static osjob_t sendjob;

// Schedule TX every this many seconds (might become longer due to duty
// cycle limitations).
const unsigned TX_INTERVAL = 40;

// Pin mapping
// Adapted for Feather M0 per p.10 of [feather]
const lmic_pinmap lmic_pins = {
    .nss = SX1272_SS,                       // chip select on feather (rf95module) CS
    .rxtx = LMIC_UNUSED_PIN,
    .rst = SX1272_RST,                       // reset pin
    .dio = {SX1272_DIO0, SX1272_DIO1, LMIC_UNUSED_PIN}, // assumes external jumpers [feather_lora_jumper]
                                    // DIO1 is on JP1-1: is io1 - we connect to GPO6
                                    // DIO1 is on JP5-3: is D2 - we connect to GPO5
};

void do_send(osjob_t* j){
    // Check if there is not a current TX/RX job running
    if (LMIC.opmode & OP_TXRXPEND) {
        SerialPort.println(F("OP_TXRXPEND, not sending"));
    } else {
        // Prepare upstream data transmission at the next possible time.
        LMIC_setTxData2(1, mydata, sizeof(mydata)-1, 0);
        SerialPort.println(F("Packet queued"));
    }
    // Next TX is scheduled after TX_COMPLETE event.
}

void onEvent (ev_t ev) {
    SerialPort.print(os_getTime());
    SerialPort.print(F(": "));
    switch(ev) {
        case EV_SCAN_TIMEOUT:
            SerialPort.println(F("EV_SCAN_TIMEOUT"));
            break;
        case EV_BEACON_FOUND:
            SerialPort.println(F("EV_BEACON_FOUND"));
            break;
        case EV_BEACON_MISSED:
            SerialPort.println(F("EV_BEACON_MISSED"));
            break;
        case EV_BEACON_TRACKED:
            SerialPort.println(F("EV_BEACON_TRACKED"));
            break;
        case EV_JOINING:
            SerialPort.println(F("EV_JOINING"));
            break;
        case EV_JOINED:
            SerialPort.println(F("EV_JOINED"));
            break;
        /*
        || This event is defined but not used in the code. No
        || point in wasting codespace on it.
        ||
        || case EV_RFU1:
        ||     SerialPort.println(F("EV_RFU1"));
        ||     break;
        */
        case EV_JOIN_FAILED:
            SerialPort.println(F("EV_JOIN_FAILED"));
            break;
        case EV_REJOIN_FAILED:
            SerialPort.println(F("EV_REJOIN_FAILED"));
            break;
        case EV_TXCOMPLETE:
            SerialPort.println(F("EV_TXCOMPLETE (includes waiting for RX windows)"));
            if (LMIC.txrxFlags & TXRX_ACK)
                SerialPort.println(F("Received ack"));

            SerialPort.print("txrxFlags: 0x");
            SerialPort.println(LMIC.txrxFlags, HEX);
            if (LMIC.dataLen) {
                SerialPort.print(F("Received "));
                SerialPort.print(LMIC.dataLen);
                SerialPort.println(F(" bytes of payload"));
                /*for (int i = 0; i < LMIC.dataLen; i++) {
                    if (LMIC.frame[LMIC.dataBeg + i] < 0x10) {
                        SerialPort.print(F("0"));
                    }
                    SerialPort.print(LMIC.frame[LMIC.dataBeg + i], HEX);
                }
                SerialPort.println();*/
            }
            // Schedule next transmission
            os_setTimedCallback(&sendjob, os_getTime()+sec2osticks(TX_INTERVAL), do_send);
            break;
        case EV_LOST_TSYNC:
            SerialPort.println(F("EV_LOST_TSYNC"));
            break;
        case EV_RESET:
            SerialPort.println(F("EV_RESET"));
            break;
        case EV_RXCOMPLETE:
            // data received in ping slot
            SerialPort.println(F("EV_RXCOMPLETE"));
            break;
        case EV_LINK_DEAD:
            SerialPort.println(F("EV_LINK_DEAD"));
            break;
        case EV_LINK_ALIVE:
            SerialPort.println(F("EV_LINK_ALIVE"));
            break;
        /*
        || This event is defined but not used in the code. No
        || point in wasting codespace on it.
        ||
        || case EV_SCAN_FOUND:
        ||    SerialPort.println(F("EV_SCAN_FOUND"));
        ||    break;
        */
        case EV_TXSTART:
            SerialPort.println(F("EV_TXSTART"));
            break;
        default:
            SerialPort.print(F("Unknown event: "));
            SerialPort.println((unsigned) ev);
            break;
    }
}

void setup() {
    pinMode(SX1272_RST, OUTPUT); 
    pinMode(SX1272_SS, OUTPUT); 
    pinMode(SX1272_DIO0, INPUT); 
    pinMode(SX1272_DIO1, INPUT); 
    pinMode(LED_BUILTIN, OUTPUT);
    SerialPort.begin(115200);
    delay(100);     // per sample code on RF_95 test
    SerialPort.println(F("Starting"));

    // LMIC init
    os_init();
    // Reset the MAC state. Session and pending data transfers will be discarded.
    LMIC_reset();

    uint8_t appskey[sizeof(APPSKEY)];
    uint8_t nwkskey[sizeof(NWKSKEY)];
    memcpy_P(appskey, APPSKEY, sizeof(APPSKEY));
    memcpy_P(nwkskey, NWKSKEY, sizeof(NWKSKEY));
    LMIC_setSession (0x13, DEVADDR, nwkskey, appskey);
    LMIC_setClockError(MAX_CLOCK_ERROR * 2 / 100);
    //LMIC_setClockError(MAX_CLOCK_ERROR);

    for (int c = 0; c < 72; c++){
      LMIC_disableChannel(c);
    }
    LMIC_enableChannel(14);
    //LMIC_enableChannel(6);

    // Disable link check validation
    LMIC_setLinkCheckMode(0);

    // TTN uses SF9 for its RX2 window.
    LMIC.dn2Dr = DR_SF9;
    LMIC_setDrTxpow(DR_SF7, 14);

    // Start job
    do_send(&sendjob);
}

void loop() {
    unsigned long now;
    now = millis();

    if ((now % 512) == 0) {
        digitalWrite(LED_BUILTIN, !digitalRead(LED_BUILTIN));
        //SerialPort.println(F("Blink"));
    }
    os_runloop_once();
    delay(5);

}
altishchenko commented 4 years ago

Sorry for the delay. Symptoms are very strange. What results will you get if you define USE_ORIGINAL_AES at compile?

terrillmoore commented 4 years ago

If there's a microprocessor problem, it's probably RAM -- check that your stack is not crawling down into the data buffer. (Sorry I can't spend more time helping, $DAYJOB is keeping me too busy.)

dolfandringa commented 4 years ago

It is really weird. Using either USE_IDETRON_AES or USE_ORIGINAL AES doesn't fix it. It does change the message output, but still MICs are wrong. They seem to be quite consistently wrong as in, when I restart the same device a few times the resulting messages are the same, although there is some variation as shown before. It kind of feels like there is a variable initialized wrong somewhere? Or old data getting reused on the teensy or something?

dolfandringa commented 4 years ago

If there's a microprocessor problem, it's probably RAM -- check that your stack is not crawling down into the data buffer. (Sorry I can't spend more time helping, $DAYJOB is keeping me too busy.)

Yeah, I am starting to think along the same lines. Although platformio is saying I only use 19% or RAM and 19% of flash memory. How would I go about checking if my data buffers don't get overwritten with other stuff in RAM?

terrillmoore commented 4 years ago

Fill the message buffer with 0xFF before doing a send -- it's pretty large, much larger than the messages you're sending. You can use sizeof(LMIC.msg) after the transmit is complete. (Also a relevant difference; I don't use PlatformIO, I use the Arduino IDE.)

BTW: I strongly recommend against the unusual channel-enable work your code is doing. Please enable groups of 8 channels when using a US-like bandplan. This cannot be avoided. The LMIC does not support other use patterns; if there's a bug it likely won't get fixed. (If you have a single-channel gateway, that's not LoRaWAN compatible.) In any case, please switch to a standard bandplan while debugging your other problems.

dolfandringa commented 4 years ago

OK, thanks for the help. For the LMIC.msg: I don't see LMIC.msg anywhere in the code. I tried to grep for it, but it doesn't seem to be defined? I'll answer the other 2 remarks separately.

dolfandringa commented 4 years ago

PlatformIO uses the Arduino tools (in my case, the teensyduino tools, which are an addition on top of the arduino IDE) for compilation/uploading. So the compiler and upload method are the same. So that shouldn't cause any issues.

About the gateway: yes I am currently using a single channel gateway. I know it's not lorawan compliant, but RFM96W's is all I have currently, and online shopping is currently shutdown here, so it's all I can use now. I will switch to a proper concentrator and different gateway software later though, but for now this is all I can test with. And I won't go into the Semtech cashgrab of excluding single channel gateways from the specs while at the same time charging extraordinary amounts for the chips needed for a multi-channel gateway.

terrillmoore commented 4 years ago

Working from memory, sorry. LMIC.frame[].

dolfandringa commented 4 years ago

It is interesting. Setting all of LMIC.frame[] to 0xFF didn't help. So I just swapped out my teensy for a maple mini (STM32F103 board), and there is no issue at all with exactly the same code. So it seems there is something going on in the teensy memory in combination with the LMIC library that causes this behaviour. I am giving up on the teensy++ 2.0 for now.