matthijskooijman / arduino-lmic

:warning: This library is deprecated, see the README for alternatives.

AppSKey corruption? #123

Closed. PhatHub closed this issue 6 years ago.

PhatHub commented 6 years ago

I must apologize, but this is a very odd situation. It's not very consistent and I can't pinpoint the cause.

The setup:

When I run my code, the payload is sometimes encrypted incorrectly. I can comment and un-comment different lines, but it is not obvious which change causes the key corruption. The result is consistent, though, as long as I compile the same code. (Below, when I say "encrypt correctly", I mean that my Python decryption script was able to recreate the plaintext.)

For example: I first noticed the problem when my sketch compiled to over 20.4KB and my payload got corrupted. I could reduce the size by removing some random chunk of code and it would mysteriously un-corrupt itself, regardless of what I commented out... as long as the sketch stayed under 20.4KB.

So another example... I'd decided to try compiling a new ttn-abp sketch from scratch, but it would corrupt my payload data. After playing around and adding a few println() calls in the do_send function, it mysteriously began to encrypt correctly. For comparison:

 // This encrypts incorrectly
void do_send(osjob_t* j){
    // Check if there is not a current TX/RX job running
    if (LMIC.opmode & OP_TXRXPEND) {
        Serial.println(F("OP_TXRXPEND, not sending"));
    } else {
        // Prepare upstream data transmission at the next possible time.
        unsigned char mess[] = {0x25,0x60,0x25,0x60,0x25,0x60,0x25,0x60};
        LMIC_setTxData2(1, mess, sizeof(mess), 0);
        Serial.println(F("Packet queued"));
    }
    // Next TX is scheduled after TX_COMPLETE event.
}
//Running this code, the python script reports random, incorrect plaintexts:
//  "OUTPUT: MEGA2560-1: [102, 238, 176, 2, 204, 255, 204, 83] decrypted!"
//  "OUTPUT: MEGA2560-1: [194, 43, 87, 44, 192, 127, 241, 180] decrypted!"
// etc etc
//  ...but for some reason this change encrypts it correctly!
void do_send(osjob_t* j) {
  // Check if there is not a current TX/RX job running
  if (LMIC.opmode & OP_TXRXPEND) {
    Serial.println(F("OP_TXRXPEND, not sending"));
  } else { //THIS IS WHERE YOU DO STUFF FOR TRANSMISSION AND ETC!!!!!
    // Prepare upstream data transmission at the next possible time.
    Serial.println(F("do_send about to send..."));
    Serial.println(F("do_send Sending Data..."));
    unsigned char mess[] = {0x25,0x60,0x25,0x60,0x25,0x60,0x25,0x60};
    LMIC_setTxData2(1, mess, sizeof(mess), 0);
    Serial.println(F("Packet queued"));
  }//end of else
  // Next TX is scheduled after TX_COMPLETE event.
} 
// With this code, Python repeatedly shows: 
// "OUTPUT: MEGA2560-1: [37, 96, 37, 96, 37, 96, 37, 96] decrypted!" 
// which is the correct plaintext!

I was just wondering... has anyone else encountered this before? Anyone have a clue if this happens a lot with Mega2560? Is there something funky in its memory map/layout that would cause the AppSKey to be corrupted, or maybe cause the encryption library to work incorrectly? Or maybe this is a hardware problem? (So far the 2560 has been working fine as far as I've tested.)

I know this is a bit much to debug without having the physical device, so I'm more curious about whether others have encountered something like this (wrong/corrupted ciphertext) with their Arduino-based motes. And if anyone has a suggestion for diagnosing this, I'd be really happy! :) Thanks, -PhatHub

edit: corrected the F("do_send Sending Data...") macro termination due to copypasta error.

avbentem commented 6 years ago

I guess you know that the F("...") macro refers to using PROGMEM? As it clearly doesn't make sense that using more of PROGMEM would make things not fail, maybe accessing it somehow changes things...? (For one, the keys are in PROGMEM too, when using the example sketches.)

Assuming you're already performing the test cases with the exact same pieces of hardware, the same LoRaWAN keys and are always starting the frame counter at zero, I'd still try to make the test cases more similar. Like adding a single Serial.println(F("...")) to both test cases, and only play with the length of its string in the F("...") macro, to see if you still can get one that repeatedly fails and one that repeatedly works?

PhatHub commented 6 years ago

@avbentem Yeah, it didn't make sense to me either. The first example (the program-size-based error), where it worked after removing code, at least made some sense to me: maybe the bootloader is corrupted and it's programming the 2560 incorrectly. I'm just puzzled why it interferes with the AppSKey. Btw, I now have the sketch printing the AppSKey over Serial, and the key is actually changing after each transmitted packet. One of the bytes in the key turned out to be my uplink frame counter! Definitely something wrong with the RobotDyn module! :-(

I'm testing my sketch with an M0 Feather, and it's working correctly so far!
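For anyone who wants to reproduce the "key changes after each packet" check described above, here is a minimal sketch of the idea in plain C++ (no Arduino dependencies, so it can be tried on a PC first). `KeyWatch` and `keyToHex` are hypothetical helper names, not part of arduino-lmic. On the device, the assumption is that you would pass the library's stored application session key (believed to be `LMIC.artKey`; please verify against your copy of `lmic.h`), take the snapshot right after setting the session keys, and compare again after each `EV_TXCOMPLETE`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

constexpr std::size_t kKeyLen = 16;  // AES-128 session key length in bytes

// Hypothetical helper: keeps a reference copy of a key and reports changes.
struct KeyWatch {
    std::uint8_t reference[kKeyLen];

    // Take the reference copy right after the session keys are configured.
    void snapshot(const std::uint8_t* key) {
        assert(key != nullptr);
        std::memcpy(reference, key, kKeyLen);
    }

    // Return the index of the first byte that differs from the reference,
    // or -1 if the key is unchanged.
    int firstChangedByte(const std::uint8_t* key) const {
        assert(key != nullptr);
        for (std::size_t i = 0; i < kKeyLen; ++i) {
            if (key[i] != reference[i]) return static_cast<int>(i);
        }
        return -1;
    }
};

// Format a key as uppercase hex, e.g. for a serial log line.
std::string keyToHex(const std::uint8_t* key) {
    assert(key != nullptr);
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < kKeyLen; ++i) {
        std::snprintf(buf, sizeof(buf), "%02X", static_cast<unsigned>(key[i]));
        out += buf;
    }
    return out;
}
```

Logging the index of the first changed byte together with the current frame counter should make it easier to see whether a counter value is being written into the key, as observed above.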

avbentem commented 6 years ago

Also note that the following line is missing its closing ));:

Serial.println(F("do_send Sending Data..."

If this is really how you used it, then maybe due to the fact that F is a macro along with possibly some excessive closing )); further down in your code, this still translates into compilable code somehow...?

PhatHub commented 6 years ago

Ah, I'm pretty sure that was a copy-and-paste error. (Just tested with that typo, and I got a compile error for unterminated macro.)

Before pasting that, I probably:

  1. Tested it in a "working state",
  2. Then deleted the line to get back to (and test) the "broken state", and copied that here,
  3. Then hit ctrl-z a bunch to restore the "do_send Sending Data..." line to copy to the function here. :-/

So I probably didn't "undo" enough times to re-paste with the closing parenthesis/semicolon. (I'll correct that post) Also, I've had interesting results of correct and incorrect keys while playing around with println calls without using the F() macro.

I'm going to chalk this up to a faulty RobotDyn module. Although how it's faulty is unknown. Maybe the bootloader for the programmer is bad, and stores the firmware into ROM incorrectly. Maybe writes to ROM with a bad offset (or jumps around) or it corrupts the sketch or something.

This form factor is perfect for a LoRa module, but I guess my trials (and apparently two Amazon reviews) have shown that users should probably avoid the RobotDyn 2560 Mini series until they work out the issues it has.

PhatHub commented 6 years ago

NOTE: Just in case anyone plays with the original Arduino Mega2560... I had a brain lapse while working with a shield on the (original) Arduino Mega2560. Remember to bridge the Uno's (ATmega328P) traditional SPI pins to the Mega2560's SPI pins! I was chasing down "firmware version: 0" errors because of this simple mistake. The RobotShop shield's pinout configuration is:

.nss = 10,
.rxtx = LMIC_UNUSED_PIN,
.rst = 9,
.dio = {2, 6, LMIC_UNUSED_PIN}
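
The bridging mentioned above can be summarized as a small lookup. This is only an illustrative sketch in plain C++: `megaPinForUnoPin` and `printJumpers` are hypothetical names, and it assumes the shield takes SPI from the Uno-style header pins D11 to D13 rather than from the ICSP header. The pin numbers are the standard hardware SPI pins of the Uno and the Mega2560. The `nss`, `rst` and `dio` pins in the pin map above are ordinary digital pins, so they should not need jumpers.

```cpp
#include <cassert>
#include <cstdio>

// Map an Uno hardware SPI pin to the Mega2560 pin it must be jumpered to.
// Returns -1 for pins that are not hardware SPI pins on the Uno.
int megaPinForUnoPin(int unoPin) {
    switch (unoPin) {
        case 11: return 51;  // MOSI
        case 12: return 50;  // MISO
        case 13: return 52;  // SCK
        default: return -1;  // not an SPI signal, no jumper needed
    }
}

// Print the three jumper wires needed between shield header and Mega2560.
void printJumpers() {
    const int unoPins[] = {11, 12, 13};
    for (int p : unoPins) {
        const int m = megaPinForUnoPin(p);
        assert(m != -1);  // every listed pin must have a Mega counterpart
        std::printf("Shield D%d -> Mega D%d\n", p, m);
    }
}
```

Calling `printJumpers()` lists the three wires to add; if the radio still reports version 0 after that, the wiring (not the pin map) is the first thing to recheck.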