mcci-catena / arduino-lmic

LoraWAN-MAC-in-C library, adapted to run under the Arduino environment
https://forum.mcci.io/c/device-software/arduino-lmic/
MIT License

How to test/use class B #479

Open sualko opened 4 years ago

sualko commented 4 years ago

Hi everyone,

I know support for class B is labeled as untested, but I would like to change that :wink:. Can anyone give me some advice on how to debug class B support? I'm currently stuck. Helpful resources to read are also appreciated. If you have no time to answer this noob question, feel free to close the issue.

My stack: Loraserver + Loragateway, Raspberry Pi 3 as gateway with RAK831 and GPS converter board. I run Raspbian on the gateway with lora-gateway and packet-forwarder from lora.net. Everything is configured to use EU868. Class A is working fine, and the packet-forwarder shows the right GPS coordinates and the emission of a beacon. In the loraserver, my device profile enables class B and sets the ping frequency to 4 seconds (all other settings are left at 0).

From the LoRaWAN specification I know that every device joins as a class A device and levels up after it receives a beacon. Therefore I called LMIC_enableTracking after my device joined. I tested 0 and 5 as the parameter (I don't understand exactly what it means, even after reading the documentation), but the result was always an EV_SCAN_TIMEOUT. I also tried calling LMIC_setPingable(2) after join, with the same result.

My issue now is that I don't know where to look. Which components in the stack are crucial for class B support? Is my loraserver misconfigured? Or is my use of the lmic library wrong? Who can help and maybe provide an example?

terrillmoore commented 4 years ago

Hi @sualko -- thanks for your note.

Please check the pdf docs (in the doc directory) for LMIC_enableTracking(). There are two cases:

Second, it's very possible that the parameters in the LMIC are wrong -- check the data in lorabase.h and lorabase_eu868.h to confirm that things are right.

Third, you really want to call LMIC_setPingable(1) after join, because that's what really accomplishes the transition to class B.

Fourth, hack up the compliance sketch, turn on event logging (so that LMICOS_logEvent etc actually are compiled in), and check how RX events are being handled.

It's very likely that the LMIC has an obsolete idea of the beacon format -- when I was originally reviewing this, I found mismatches and errors in the LoRaWAN spec, which have subsequently been corrected in 1.0.3 -- that's what you should use as a reference.

Good luck!

sualko commented 4 years ago

thanks for your note.

Thanks for your quick and helpful response. With your clues I was able to receive a beacon and to start the periodic rx windows of class B. As you stated, the frequency was wrong, and I could only get it right with a dirty fix. I will keep working on it, because it's quite unstable and I receive beacons only sporadically. Also, the class B bit is not set in the uplink message, so the server is still waiting for an uplink message (class A mode). If it's OK with you, I will report my progress.

Btw I think the parameter for LMIC_enableTracking is deprecated, because the BeaconTimingReq is deprecated. Or did I miss something?

terrillmoore commented 4 years ago

That's great that you're getting beacons at all.

There's a replacement for BeaconTimingReq, which we should be using in its place: DeviceTimeReq. But it's just an optimization; it's totally OK to start by scanning for a beacon without sending any uplinks at all.

Are you calling LMIC_setPingable(1)? That's what completes the conversion of the device into a Class B device. Then at line 1890 or so:

    LMIC.frame[OFF_DAT_FCT] = (LMIC.dnConf | LMIC.adrEnabled
                              | (sendAdrAckReq() ? FCT_ADRACKReq : 0)
                              | (end-OFF_DAT_OPTS));

Add here the setting of bit 4 (FCT_CLASSB), if OP_PINGABLE is set. Something like:

    LMIC.frame[OFF_DAT_FCT] = (LMIC.dnConf | LMIC.adrEnabled
                              | (sendAdrAckReq() ? FCT_ADRACKReq : 0)
#if !defined(DISABLE_PING)
                             | (LMIC.opmode & OP_PINGABLE) ? FCT_CLASSB : 0
#endif // ndef DISABLE_PING
                              | (end-OFF_DAT_OPTS));

sualko commented 4 years ago

There's a replacement for BeaconTimingReq, which we should be using in its place: DeviceTimeReq.

I tried the DeviceTimeReq, but it didn't work in a quick test. Not sure why, but it's also not that important to me. One other thing I noticed is that the PingSlotChannelRes is malformed, so the server sends a new request every time it receives a response, and so on and so on. I disabled it via DISABLE_MCMD_PingSlotChannelReq for now.

Are you calling LMIC_setPingable(1)?

I call LMIC_setPingable(2), because I have my server configured to a 4 seconds interval. The receive slots are also opened, but as far as I can tell the server is not aware of that.
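For reference, the 4-second figure follows from the Class B timing rules in LoRaWAN 1.0.3: a beacon window holds 4096 ping slots of 30 ms each, the device opens 2^(7 - Periodicity) slots per window, and consecutive slots are 2^(5 + Periodicity) slot lengths apart. The sketch below works this out. It assumes that the argument of LMIC_setPingable() maps directly onto the spec's Periodicity field (not verified against the LMIC source), and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Length of one Class B ping slot in milliseconds (LoRaWAN 1.0.3). */
#define PING_SLOT_MS 30u

/* Number of ping slots opened per beacon period: 2^(7 - periodicity).
 * Hypothetical helper; periodicity is clamped to the valid range 0..7. */
static uint32_t ping_nb(uint8_t periodicity)
{
    return 1u << (7u - (periodicity & 7u));
}

/* Spacing between two consecutive ping slots in milliseconds:
 * 2^(5 + periodicity) slots of 30 ms each. Hypothetical helper. */
static uint32_t ping_period_ms(uint8_t periodicity)
{
    return (1u << (5u + (periodicity & 7u))) * PING_SLOT_MS;
}
```

Under that assumption, ping_period_ms(2) gives 3840 ms with 32 slots per beacon period, which is the "4 seconds" setting on the server side, while ping_period_ms(0) gives 960 ms.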

Add here the setting of bit 4

That worked like a charm. The uplink packet is now marked as class B. Receiving is still not working, but this could also be related to the loraserver. I have to look up how this is normally done, because the loraserver still waits for the next uplink message. Once again, thanks for your help.

P.S.: I think I know why the loraserver is not starting to send in the ping slots. It's in the lora specification section 14.3:

Once the NS has sent the first PingSlotChannelReq command, it SHOULD resend it until it receives a PingSlotChannelAns from the device. It MUST NOT attempt to use a class B ping slot until it receives the PingSlotChannelAns.

At least I know now where I have to look.

terrillmoore commented 4 years ago

Great. It should be easy to fix the PingSlotChannelAns uplink. Since I've not tested at all, and I rewrote the mac message code, it's very likely that I messed it up. But it should be easier to find and fix in the new code.

sualko commented 4 years ago

Short update: The beacon search is still not reliable, but that's not the biggest issue at the moment. With my/your fixes from #481, the PingSlotChannelRes also works sometimes. I have no idea why, but sometimes the PingSlotChannelRes is sent as payload on port 17 (which is the MAC command code of the response). Do you have an idea why this happens? Where should I look?

terrillmoore commented 4 years ago

Can't tell from your comment if you're running into this, but:

In LoRaWAN, MAC commands can travel in one of two ways.

Uplink MAC messages will often piggyback on messages sent to other ports. The network is free to decide whether to piggyback the downlink MAC messages. It looks like your network is using the same port 17 for PingSlotInfoAns as the PingSlotInfoReq used for piggyback on the uplink. Totally legal. The LMIC will parse it and act on it either way. You will get a zero-length message for port 17 in the app (or you'll get a valid port 17 message, depending). In either case, the answer has been processed. If you need to detect the presence of the mac message in your app, you can compare your message pointer to the base of the message buffer. If not equal, then there was a mac message, which has been removed.

sualko commented 4 years ago

It looks like your network is using the same port 17 for PingSlotInfoAns as the PingSlotInfoReq used for piggyback on the uplink.

My application is only using port 1. What I get from lmic is something like that:

{
  "mhdr": {
    "mType": "UnconfirmedDataUp",
    "major": "LoRaWANR1"
  },
  "macPayload": {
    "fhdr": {
      "devAddr": "0044ece9",
      "fCtrl": {
        "adr": false,
        "adrAckReq": false,
        "ack": false,
        "fPending": true,
        "classB": true
      },
      "fCnt": 4,
      "fOpts": null
    },
    "fPort": 17,
    "frmPayload": [
      {
        "bytes": "Aw=="
      }
    ]
  },
  "mic": "ee9fd4f3"
}

As you can see, 17 is the mac cmd for PingSlotInfoAns and Aw== is 0x3, which means frequency and dr are processed. I don't think that mac commands should be transferred as payload, or am I wrong?
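The "bytes" field in the JSON above and the "data" field in packet-forwarder JSON are base64 text. A minimal decoder sketch in C shows how "Aw==" turns into the single byte 0x03. The function names are made up for illustration, and the decoder handles only the standard alphabet and stops at the first '=' without validating the padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not in
 * the standard alphabet. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode base64 text into out[]. Returns the number of bytes written,
 * or -1 on an invalid character or when out[] is too small. */
static int b64_decode(const char *in, uint8_t *out, size_t out_cap)
{
    uint32_t acc = 0;  /* bit accumulator, newest bits at the bottom */
    int bits = 0;      /* number of not-yet-emitted bits in acc */
    size_t n = 0;

    for (; *in != '\0' && *in != '='; ++in) {
        int v = b64_value(*in);
        if (v < 0) return -1;
        acc = (acc << 6) | (uint32_t)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n >= out_cap) return -1;
            out[n++] = (uint8_t)(acc >> bits);  /* emit the oldest 8 bits */
        }
    }
    return (int)n;
}
```

Calling b64_decode("Aw==", buf, sizeof buf) writes one byte, 0x03, into buf. The same routine can be used on a gateway "data" string to get the raw PHYPayload bytes.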

terrillmoore commented 4 years ago

I need to see the actual data before decoding. The LMIC often sends a null packet with FMAC piggybacked. That looks like what's happened here. The FHDR.FCtrl.FOptsLen field (bits 3..0 of FCtrl) is not represented here; I suspect your decoder is confusing this, because this is really a NULL uplink. The "fOpts" NULL field doesn't conclusively prove that the field is zero on the uplink. Need to see the data before decoding, after decryption.
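To make the FCtrl layout concrete, here is a small C sketch that splits an uplink FCtrl byte into its flags and the FOptsLen nibble, following the bit positions in LoRaWAN 1.0.3. The type and function names are hypothetical. Note that bit 4 means ClassB in an uplink and FPending in a downlink, which may be why a decoder that does not distinguish direction reports both "fPending" and "classB" as true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Uplink FCtrl bit layout (LoRaWAN 1.0.3). */
#define FCTRL_ADR        0x80u
#define FCTRL_ADRACKREQ  0x40u
#define FCTRL_ACK        0x20u
#define FCTRL_CLASSB     0x10u  /* same bit is FPending in downlinks */
#define FCTRL_FOPTSLEN   0x0Fu  /* number of piggybacked MAC bytes */

/* Hypothetical container for the decoded fields. */
typedef struct {
    bool    adr;
    bool    adr_ack_req;
    bool    ack;
    bool    class_b;
    uint8_t fopts_len;
} uplink_fctrl_t;

/* Split one uplink FCtrl byte into its individual fields. */
static uplink_fctrl_t decode_uplink_fctrl(uint8_t fctrl)
{
    uplink_fctrl_t f;
    f.adr         = (fctrl & FCTRL_ADR) != 0;
    f.adr_ack_req = (fctrl & FCTRL_ADRACKREQ) != 0;
    f.ack         = (fctrl & FCTRL_ACK) != 0;
    f.class_b     = (fctrl & FCTRL_CLASSB) != 0;
    f.fopts_len   = (uint8_t)(fctrl & FCTRL_FOPTSLEN);
    return f;
}
```

For example, decode_uplink_fctrl(0x12) reports class_b set and fopts_len 2, meaning two bytes of MAC commands sit in FOpts and are not application payload.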

sualko commented 4 years ago

Need to see the data before decoding, after decryption.

I have no idea how to get it, but I will try my best. Nonetheless, I don't think it's an issue with a null uplink, because this packet was sent instead of my application data. The flow is like this (most of the time, which I think is also an indication of an invalid memory access):

  1. Join request
  2. Join accept
  3. App data is sent
  4. PingSlotInfoReq is incoming
  5. Want to send an uplink after 60 seconds, and the packet above is sent instead
  6. PingSlotInfoReq is incoming

I will try to get more information this evening.

terrillmoore commented 4 years ago

Couple of corrections.

  1. If this is really code 17 (0x11), it's PingSlotChannelAns, rather than PingSlotInfoAns.
  2. This can only happen if a PingSlotChannelReq happened on a previous downlink.
  3. The LMIC will load the PingSlotChannelAns into the mac response buffer.
  4. After doing all other downlink processing, it will trigger a poll to do a (class A) uplink with a null message body.
  5. Transmitting the message will take some time; during this time, any attempt made to transmit will return a "busy" error.

Since this pattern is used for all MAC messages, and it's working for all others, I suspect a combination of a decoder problem and an application problem.

sualko commented 4 years ago

If this is really code 17 (0x11), it's PingSlotChannelAns, rather than PingSlotInfoAns.

I'm so sorry. I was always talking about PingSlotChannelReq/Res. I was able to capture the data on the gateway and decode it via lora-packet. Hopefully this is the PingSlotChannelReq and PingSlotChannelRes.

JSON down:

{"txpk":{"imme":false,"rfch":0,"powe":14,"ant":0,"brd":0,"tmst":110634123,"freq":868.1,"modu":"LORA","datr":"SF7BW125","codr":"4/5","ipol":true,"size":17,"data":"YEe3sgGFDQARAAAAAH9Tu+A="}}

Decoded down:

Message Type = Data
            PHYPayload = 6047B7B201850D0011000000007F53BBE0

          ( PHYPayload = MHDR[1] | MACPayload[..] | MIC[4] )
                  MHDR = 60
            MACPayload = 47B7B201850D001100000000
                   MIC = 7F53BBE0 (OK)

          ( MACPayload = FHDR | FPort | FRMPayload )
                  FHDR = 47B7B201850D001100000000
                 FPort = 
            FRMPayload = 

                ( FHDR = DevAddr[4] | FCtrl[1] | FCnt[2] | FOpts[0..15] )
               DevAddr = 01B2B747 (Big Endian)
                 FCtrl = 85
                  FCnt = 000D (Big Endian)
                 FOpts = 1100000000

          Message Type = Unconfirmed Data Down
             Direction = down
                  FCnt = 13
             FCtrl.ACK = false
             FCtrl.ADR = true

JSON up

{"rxpk":[{"tmst":114408627,"time":"2019-10-22T19:41:50.517730Z","tmms":1255808529517,"chan":1,"rfch":1,"freq":868.300000,"stat":1,"modu":"LORA","datr":"SF7BW125","codr":"4/5","lsnr":9.5,"rssi":-41,"size":14,"data":"QEe3sgEQDgARA8Bp/Ek="}]}

Decoded up

Message Type = Data
            PHYPayload = 4047B7B201100E001103C069FC49

          ( PHYPayload = MHDR[1] | MACPayload[..] | MIC[4] )
                  MHDR = 40
            MACPayload = 47B7B201100E001103
                   MIC = C069FC49 (OK)

          ( MACPayload = FHDR | FPort | FRMPayload )
                  FHDR = 47B7B201100E00
                 FPort = 11
            FRMPayload = 03
             Plaintext = 5F ('_')

                ( FHDR = DevAddr[4] | FCtrl[1] | FCnt[2] | FOpts[0..15] )
               DevAddr = 01B2B747 (Big Endian)
                 FCtrl = 10
                  FCnt = 000E (Big Endian)
                 FOpts = 

          Message Type = Unconfirmed Data Up
             Direction = up
                  FCnt = 14
             FCtrl.ACK = false
             FCtrl.ADR = false
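The field split shown in the two decodes can be reproduced by hand. The C sketch below assumes a LoRaWAN 1.0.x data frame and uses hypothetical type and function names; it does no decryption and no MIC check, it only locates the fields. The point it illustrates is that FOptsLen (the low nibble of FCtrl) decides whether the bytes after FCnt are read as FOpts or as FPort plus FRMPayload.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a data frame; pointers refer into the input buffer. */
typedef struct {
    uint8_t        mhdr;
    uint32_t       dev_addr;     /* little-endian on the wire */
    uint8_t        fctrl;
    uint16_t       fcnt;         /* little-endian on the wire */
    const uint8_t *fopts;
    uint8_t        fopts_len;
    int            fport;        /* -1 when the frame has no FPort */
    const uint8_t *frm_payload;  /* still encrypted; NULL when absent */
    size_t         frm_len;
    const uint8_t *mic;          /* last 4 bytes */
} data_frame_t;

/* Split PHYPayload = MHDR | DevAddr | FCtrl | FCnt | FOpts | [FPort | FRMPayload] | MIC.
 * Returns 0 on success, -1 if the frame is too short for its own header. */
static int split_data_frame(const uint8_t *phy, size_t len, data_frame_t *f)
{
    if (len < 12) return -1;  /* 1 + 4 + 1 + 2 header bytes plus 4 MIC bytes */

    size_t end = len - 4;     /* index of the first MIC byte */
    size_t pos = 8;           /* first byte after FCnt */

    f->mhdr      = phy[0];
    f->dev_addr  = (uint32_t)phy[1] | ((uint32_t)phy[2] << 8)
                 | ((uint32_t)phy[3] << 16) | ((uint32_t)phy[4] << 24);
    f->fctrl     = phy[5];
    f->fcnt      = (uint16_t)(phy[6] | (phy[7] << 8));
    f->fopts_len = (uint8_t)(phy[5] & 0x0Fu);
    if (pos + f->fopts_len > end) return -1;

    f->fopts = phy + pos;
    pos += f->fopts_len;

    if (pos < end) {          /* anything left is FPort plus FRMPayload */
        f->fport       = phy[pos++];
        f->frm_payload = phy + pos;
        f->frm_len     = end - pos;
    } else {
        f->fport       = -1;
        f->frm_payload = NULL;
        f->frm_len     = 0;
    }
    f->mic = phy + end;
    return 0;
}
```

Fed the uplink bytes above (FCtrl = 0x10, so FOptsLen = 0), this yields FPort 17 and a one-byte FRMPayload, matching the lora-packet output. Had FCtrl been 0x12, the same bytes 11 03 would have landed in FOpts with no FPort at all.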

Btw. which lorawan revision does lmic implement? Because I can configure this in the loraserver.

terrillmoore commented 4 years ago

which lorawan revision does lmic implement?

I use 1.0.3 for reference. Certainly, for class B, that's what we should follow, as 1.0.2 spec is broken.

terrillmoore commented 4 years ago

I see that the downlink is properly formatted, and the uplink is not. Since this is common code, and the other commands work, it's puzzling. According to the 1.0.3 spec, section 14.3, this can only be sent in a Class A response.

The downlink message was 0x11,0,0,0,0, which means "data rate 0, on the default ping frequency" -- or SF12 in Europe.

The response 0x11, 0x03, indicates that the LMIC accepted the message.
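As an illustration of how these two byte strings are read, here is a C sketch based on the LoRaWAN 1.0.3 layout of PingSlotChannelReq (CID 0x11, a 3-byte little-endian frequency in units of 100 Hz, one data-rate byte) and PingSlotChannelAns (CID 0x11, one status byte where bit 0 is "channel frequency ok" and bit 1 is "data rate ok"). The names are hypothetical, and masking the data-rate byte to its low nibble is an assumption about the RFU bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define CID_PING_SLOT_CHANNEL 0x11u  /* same CID for Req and Ans */
#define ANS_FREQ_OK           0x01u  /* status bit 0 */
#define ANS_DR_OK             0x02u  /* status bit 1 */

/* Hypothetical decoded form of a PingSlotChannelReq. */
typedef struct {
    uint32_t freq_hz;    /* 0 means "use the default ping frequency" */
    uint8_t  data_rate;  /* regional data-rate index */
} ping_slot_channel_req_t;

/* Parse the 5 bytes CID | Freq[3] | DR[1]; returns false on a wrong CID. */
static bool parse_ping_slot_channel_req(const uint8_t cmd[5],
                                        ping_slot_channel_req_t *out)
{
    if (cmd[0] != CID_PING_SLOT_CHANNEL) return false;
    uint32_t f = (uint32_t)cmd[1] | ((uint32_t)cmd[2] << 8)
               | ((uint32_t)cmd[3] << 16);
    out->freq_hz   = f * 100u;                   /* wire unit is 100 Hz */
    out->data_rate = (uint8_t)(cmd[4] & 0x0Fu);  /* assumed: upper bits RFU */
    return true;
}

/* The device accepted the request only if both status bits are set. */
static bool ping_slot_channel_ans_accepted(uint8_t status)
{
    return (status & (ANS_FREQ_OK | ANS_DR_OK)) == (ANS_FREQ_OK | ANS_DR_OK);
}
```

With the downlink bytes 11 00 00 00 00 this gives frequency 0 and data rate 0, and a status byte of 0x03 counts as accepted.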

The response is written to a special buffer by put_mac_uplink_byte(). LMIC.pendMacData[] is the buffer, LMIC.pendMacLen is the overall length.

The overall length is 2.

I see the problem. It's operator precedence in the patch for the classB bit in buildDataFrame().

Change:

#if !defined(DISABLE_PING)
                             | (LMIC.opmode & OP_PINGABLE) ? FCT_CLASSB : 0
#endif // ndef DISABLE_PING

to

#if !defined(DISABLE_PING)
                             | ((LMIC.opmode & OP_PINGABLE) ? FCT_CLASSB : 0)
#endif // ndef DISABLE_PING

(i.e., add one extra set of parentheses)
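The effect of the missing parentheses can be shown in isolation. In C, `|` binds tighter than `?:`, so everything to the left of the `?` becomes the condition and everything to the right of the `:` becomes the else branch. The sketch below is a standalone illustration, not the LMIC code: the function names are made up, the first function spells out the grouping the compiler applies to the unparenthesized patch, and the OP_PINGABLE value is assumed for the example.

```c
#include <stdint.h>

#define FCT_CLASSB   0x10u
#define OP_PINGABLE  0x0400u  /* assumed value, for illustration only */

/* How the compiler groups the unparenthesized patch:
 *   flags | (opmode & OP_PINGABLE) ? FCT_CLASSB : 0 | fopts_len
 * The whole left side is the condition; fopts_len is only in the else arm. */
static uint8_t fctrl_as_parsed(uint8_t flags, uint16_t opmode, uint8_t fopts_len)
{
    return (uint8_t)((flags | (opmode & OP_PINGABLE))
                         ? FCT_CLASSB
                         : (0u | fopts_len));
}

/* With the extra parentheses, the conditional contributes a single bit
 * and the other fields are OR-ed in as intended. */
static uint8_t fctrl_fixed(uint8_t flags, uint16_t opmode, uint8_t fopts_len)
{
    return (uint8_t)(flags
                     | ((opmode & OP_PINGABLE) ? FCT_CLASSB : 0u)
                     | fopts_len);
}
```

With no other flags set, a pingable opmode and two pending MAC bytes, fctrl_as_parsed() returns 0x10 while fctrl_fixed() returns 0x12. The first value is the FCtrl seen in the captured uplink: FOptsLen is lost, so the receiver reads the MAC bytes 11 03 as FPort 17 and a one-byte payload.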

sualko commented 4 years ago

It's operator precedence in the patch for the classB bit in buildDataFrame().

You did it. The PingSlotChannelReq/Res is working fine now. :tada: Thanks a lot :clap:

sualko commented 4 years ago

I'm still struggling with the beacon scan. It feels like a lottery win if I find a beacon. Therefore I'm currently looking at all those config options and specifications. In the regional parameters, I found this entry for (I think) all countries:

| Signal polarity | Non-inverted | As opposed to normal downlink traffic which uses inverted signal polarity

LMIC.noRXIQinversion controls the signal polarity (right?), but it's never used in the code. After putting it to setBcnRxParams, it feels like I have more luck than without. Could I be on the right track?

terrillmoore commented 4 years ago

LMIC.noRXIQinversion is used in radio.c, and in the various raw sketches.

You don't want to set that. We already invert the IQ via the setIh() call eg at lmic_eu868.c line 240, and that takes precedence. Anyway, you'd never see a beacon if it were wrong.

Not receiving the beacon sometimes is probably an issue with the CRC check, which was demented in 1.0.2 and fixed (I think) in 1.0.3. You might start by printing out whether you're getting to decodeBeacon() more often than you get successful results, and/or check the answer. I "fixed" this code -- added error codes, restructured and clarified; so it might well be a simple error. I'd step through it with an ST-LINK or similar and convince myself that it was working correctly. The CRC calculation was pretty crazy on 1.0.2, but I think it's fixed in 1.0.3, and it's possible that I applied those fixes.

sualko commented 4 years ago

Anyway, you'd never see a beacon if it were wrong.

:hankey: thought I found it

you might start by printing out whether you're getting to decodeBeacon() more often than you get successful results

I already did that in onBcnRx, and it seems that there is simply no data received. My next idea is to build a LoRa sniffer to see whether the gateway is actually sending data, but I'm currently busy and don't know when I will have time to set it up.

olicooper commented 3 years ago

I am also not able to get Class B working with TheThingsNetwork with the default library configuration. Implementing lmic.c changes in https://github.com/mcci-catena/arduino-lmic/issues/479#issuecomment-543756628 helped to fix the issue for me... Will this update be merged soon?

Note: I checked the TTN Network Server configuration default ping frequency (see here) and ping data rate (see here) against the values in lorabase.h and they are identical. I call LMIC_setPingable(4) inside Arduino_LoRaWAN_ttn::NetJoin().

The TTN uplink data message (received after the join request) contains the following data - requesting the transition to class B:

"mac_payload": {
        "f_hdr": {
          "dev_addr": "FFFFFFFF",
          "f_ctrl": {
            "class_b": true
          }
        },
        "f_port": 1,
        "frm_payload": "VA=="
      }

terrillmoore commented 3 years ago

Implementing lmic.c changes in #479 (comment) helped to fix the issue for me...

Not sure what you're referring to? There was a pull request #481 which is still marked WIP as I had no way to test class B other than with a network simulator, and never saw confirmation that it actually works. Or did you only apply the fix for the parenthesis problem?

olicooper commented 3 years ago

@terrillmoore I have implemented all the changes found in https://github.com/mcci-catena/arduino-lmic/pull/481 (including the parenthesis fix) and LMIC now upgrades the connection to class B. However, every uplink is followed by a downlink (see below) and also a "Device status request enqueued" in the TTN console. TTN only allows 10 downlinks a day so this is a big problem for me.

"payload": {
      "m_hdr": {
        "m_type": "UNCONFIRMED_DOWN"
      },
      "mic": "Ba3hTg==",
      "mac_payload": {
        "f_hdr": {
          "dev_addr": "<device_addr>",
          "f_ctrl": {
            "adr": true
          },
          "f_cnt": 6,
          "f_opts": "Bg=="
        },
        "full_f_cnt": 6
      }
    },
    "request": {
      "downlink_paths": [
        {
          "uplink_token": "<token>"
        }
      ],
      "rx1_delay": 5,
      "rx1_data_rate_index": 5,
      "rx1_frequency": "867900000",
      "rx2_data_rate_index": 3,
      "rx2_frequency": "869525000",
      "priority": "HIGHEST",
      "frequency_plan_id": "EU_863_870_TTN",
      "lorawan_phy_version": "PHY_V1_0_3_REV_A"
    }
}

Logs from the device:

55005436: engineUpdate, opmode=0xd08
2021-09-07T10:53:55 [V][lora]: EV_TXSTART: Ch=7 rps=0x01 (SF7 BW125 CR4/5 Crc IH=0)
55005577: TXMODE, freq=867900000, len=14, SF=7, BW=125, CR=4/5, IH=0
2021-09-07T10:53:55 [V][lora]: Publishing: BasicState [C4]
55319241: setupRx1 txrxFlags 0x20 --> 01
start single rx: now-rxtime: 4
55319809: RXMODE_SINGLE, freq=867900000, SF=7, BW=125, CR=4/5, IH=0
rxtimeout: entry: 55322928 rxtime: 55319802 entry-rxtime: 3126 now-entry: 317 rxtime-txend: 311126
55381491: setupRx2 txrxFlags 0x1 --> 02
start single rx: now-rxtime: 4
55382059: RXMODE_SINGLE, freq=869525000, SF=9, BW=125, CR=4/5, IH=0
rxtimeout: entry: 55387053 rxtime: 55382052 entry-rxtime: 5001 now-entry: 317 rxtime-txend: 373376
55387384: processRx2DnData txrxFlags 0x2 --> 00
55387490: processDnData_norx txrxFlags 00 --> 20
2021-09-07T10:54:01 [V][lora]: NetTxComplete
2021-09-07T10:54:01 [D][lora]: Published: BasicState
55388300: engineUpdate, opmode=0xd00

I don't know enough about lorawan to understand what this means.

terrillmoore commented 3 years ago

Unfortunately, I don't really have time to look into this; I would need to use the network simulator and I continue to be under the gun for other things.

I am unfamiliar with how TTN implements Class B support. The device log you show is for a class A uplink, it doesn't show the downlink. TTN class A will send network-level downlinks autonomously in response to uplinks. With all the print outs on the device, it wouldn't surprise me if LMIC timing is somewhat perturbed by the logs, but you're not receiving the class A downlinks, so TTN will continue to retry sending the MAC downlinks.

Sorry I'm not much more help.

--Terry

sualko commented 3 years ago

I would also love to see a working class B implementation, but it was hard for me to debug too.

However, every uplink is followed by a downlink

Can you post a larger part of your log including the part of the class upgrade? Nonetheless I'm not sure if I can help.

You should maybe look into the following comments to further track the issue down: https://github.com/mcci-catena/arduino-lmic/pull/481/files#r337771790 and https://github.com/mcci-catena/arduino-lmic/pull/481/files#r337774897

MaxTrev commented 1 year ago

Hi @sualko Just wondering how you got on with this? I'd be really keen to get ClassB up and running... and not getting very far!

Jesusdiaz31497 commented 1 year ago

can you help me?