timmbogner / Farm-Data-Relay-System

A system that uses ESP-NOW, LoRa, and other protocols to transport sensor data in remote areas without relying on WiFi.
MIT License
507 stars 114 forks source link

[NEW FEATURE] Packet Confirmation and System Packets #54

Closed timmbogner closed 2 years ago

timmbogner commented 2 years ago

First off, I want to extend my deep appreciation to everyone who has helped me with this project. I'm so glad that everyone is having as much fun with FDRS as I am!

The first of the goals that Andreas mentioned for FDRS is packet confirmation.

Packet confirmation is already done within the ESP-NOW protocol to some degree, and you can use the send callback to confirm whether or not it was delivered. From what I read, the protocol actually does some quick retries if no confirmation is received. However, because we will want to keep our methods uniform with LoRa, which has no confirmation, I propose we create a new protocol for confirming packets on both LoRa and ESP-NOW.

Something I hold sacred in FDRS is that the messages are sent as either a single object or an array of objects (DataReadings). Because this is the only expected packet format, adding another packet type brings on the question: how does the gateway know which is which? My favorite solution: the system packets are smaller, and thus if LoRa or ESP-NOW receive a packet smaller than a DataReading, it knows that it is dealing with a system packet. This of course limits you to single system packets at a time. A DataReading is 7 bytes, thus I propose the new packet be either 6 or 4. The packet will already know its size and sender, so it doesn't need to convey either of those.

The system packet will be sent after a message is received by a device. It will send some kind of token back to the sender that proves it received the message correctly. The sending device will be waiting (non-blockingly) to receive that proof, and retrying at intervals if not.

So the main question: what is the proof? I'm hoping someone can just tell me whether to use CRC or checksum for the token.

This may also be a good time to have a discussion about encryption/security, which will be another use of the new packet. This will not be pertinent until two-way communication is ironed out, but it's good to keep in mind. The system packet will also be used by sensors for pinging and pairing with gateways, as well as perhaps get/setting configuration parameters.

All thoughts, ideas, and proposals are welcome and appreciated!

Devilbinder commented 2 years ago

For data acknowledgement. A message type data field will be required. To tell the device how interpret the data. A message ID will also be required. Some simple counter will do. You don't need "proof". All your device need to know is: I sent a message with ID x and have I gotten a response about that ID. Another approach is to have the final endpoint query the sensors but this does not scale to large networks.

CRC and checksum are used for data integrity checks not for data acknowledgement. They get appended at the end of a data frame,

CRC are superiors with the tradeoff being that is more computationally expensive unless you are using hardware accelerators.

Here is how to do a CRC. https://github.com/4-20ma/ModbusMaster/blob/3a05ff87677a9bdd8e027d6906dc05ca15ca8ade/src/util/crc16.h#L71

Here is how it is used. https://github.com/4-20ma/ModbusMaster/blob/3a05ff87677a9bdd8e027d6906dc05ca15ca8ade/src/ModbusMaster.cpp#L695

Your data structure will look like this.

typedef union __attribute__((packed)) DataReading_t {

    __attribute__((packed)) struct{
        float data;
        uint16_t id;
        uint8_t type;
        uint16_t crc16;
    }
    __attribute__((packed)) struct{
        uint8_t serial_data[7];
        uint16_t crc;
    }
    __attribute__((packed)) struct{
        uint8_t serial_payload[9];
    }
} DataReading_t;

Encryption and Security

Rule No.1: Don't roll your own crypto, bro.

Use AES128 or AES256 algorithms. MbedTLS is the defacto standard and is include with the ESP32.

Also requires true/secure random number generators witch kicks out 90% of the popular MCUs that are being used with Arduino.

timmbogner commented 2 years ago

I'll address message types when I get home in a few hours, but I wanted to quickly mention that I do want to also confirm message integrity, because LoRa likes to mess up a character here and there.

timmbogner commented 2 years ago

I'm really not interested in expanding size of the the FDRS packet structure, and I don't think we need to yet. A big value currently is the fact that the gateway never needs to look inside of a packet (the data portion anyway). This reduces complexity in my opinion, also keeps things light. I'm already looking at ways to reduce packet sizes for LoRa.

A message type data field will be required. To tell the device how interpret the data. A message ID will also be required.

The message type is inferred by the fact that it is a message that is smaller than the normal data packet size. The ID or "proof" will be ideally something unique that verifies both the packet delivery and integrity (CRC/checksum). Our device will get a CRC for the packet before it sends, then wait for the other device to send back the same CRC. Now we know the packet arrived all in one piece.

Rule No.1: Don't roll your own crypto, bro.

If you mean don't try to invent my own encryption then I say.... oh yeah most definitely not. We are in agreement there 👍

timmbogner commented 2 years ago

I just did more research, and it is a checksum that will fit this use-case. If I understand right, a CRC always involves some data being added to the packet. A checksum should be an algorithm that we can run both the original and received packet through and expect to get the same value. 4 bytes, ideally? The confirmation packet arriving unobscured is an extra (unavoidable) failsafe.

thefeiter commented 2 years ago

Great idea!! I am having some weird values in my logs. This would help to sort them out! Screenshot_20220711_133240

thefeiter commented 2 years ago

Sensor: sendLoRa()-> wait for acknowledge from gateway, if not recieved within x milliseconds, resend package (retry Y times) Gateway: getLora() -> check checksum, if valid sendLoRaAck() and process data, else drop packet

Is this what you are going for?

thefeiter commented 2 years ago

Do you know the LoRa.enableCrc()? I am enabling it in my test setup now to see, if this maybe at least filters out corrupted packets.

timmbogner commented 2 years ago

Yeah, you're close. The sensor gets the checksum of the packet before it sends. ACK comes when it receives the same checksum back from the device it just sent to. I forgot about the LoRa CRC thing...I'm going to look into that as well.

timmbogner commented 2 years ago

I just found a great resource

Gulpman commented 2 years ago

I'll address message types when I get home in a few hours, but I wanted to quickly mention that I do want to also confirm message integrity, because LoRa likes to mess up a character here and there.

Talking about data integrity: It may be worth the time to think about if float values are the right format for sending sensor values via a network protocol. Floating point numbers are not exact so it may occure that a value you want to store is not stored correcly as intended. See here as a reference: https://stackoverflow.com/questions/21895756/why-are-floating-point-numbers-inaccurate So doing a crc (or any other checksum) on floats could be a little bit challenging - just a guess / no knowledge!

One possibility, which also will reduce the payload being sent via the network, is to transform floats to integers before sending and decode them on the receiver side. I've stumbled upon it a few days ago: https://www.aeq-web.com/so-funktioniert-die-lora-payload-encoder-decoder-ttn/ Iin German unfortunately but search for "Beispiel Wetterstation" to have a look at the charts which make the process pretty obvious.

A quick walk-through on the temerature line of the example: Temperatures are assumed to be between -30.00 - 60.00C. Multiply by 100 to get integers / get rid of anything behind the decimal point. To get rid of negative values add a constant (here 5000) which ensures your range of values is guaranteed positive. Now transfer this positive integer over the net and decode on the receiver side. Instead of 4 Bytes this can be encoded in 2 Bytes (and many smaller values could be encoded in 1 Bytes). And maybe this helps in reducing corrupt data packages.

Gulpman commented 2 years ago

As far as I remember the SX126x/SX127x chips are only operating in half-duplex mode. That being said, when sending a package back from the gateway to a sensor it is "blind" to receive any other message. This may become tricky in an implementation.

timmbogner commented 2 years ago

Replacing the float has been a long-term goal, but they're just so convenient. It's been off my radar for a bit, so let me think about it.

The LoRa shortcoming has been in my head for a while. I've considered making a warning about over-use of LoRa, but never did. As you mentioned it just now, though, it occurred to me that you could probably connect two LoRa gateways via serial similar to the MQTT setup and have one send and the other receive. I'm still working that out in my head, but it would be fun!

Gulpman commented 2 years ago

Replacing the float has been a long-term goal, but they're just so convenient. It's been off my radar for a bit, so let me think about it.

You wouldn't loose convenience as the encoding / decoding is triggered automatically when data is transferred via the network. For people working on the code nothing changes, they are still using floats - it could become tricky if there is an application where someone needs very big numbers + many, many digits behind the comma... But then it can becomes tricky with floats themselves as well.

GUVWAF commented 2 years ago

The LoRa shortcoming has been in my head for a while. I've considered making a warning about over-use of LoRa, but never did. As you mentioned it just now, though, it occurred to me that you could probably connect two LoRa gateways via serial similar to the MQTT setup and have one send and the other receive. I'm still working that out in my head, but it would be fun!

Unless you will use a different frequency for sending and receiving (which makes the overall implementation way more complicated), this will likely not work as the packet you send will interfere with the packet you are receiving.

You can greatly increase the reliability using Channel Activity Detection (CAD), which is usually available in LoRa chipsets. In this way, you can wait until the channel is idle before transmitting. Then in order not to let multiple devices transmit exactly at this point in time, you could use a random delay. This is used in Meshtastic, next to some ACK handling, so you might get some ideas from there.

timmbogner commented 2 years ago

this will likely not work as the packet you send will interfere with the packet you are receiving.

HmmmmmYup. Good point. Brain fart!

This is used in Meshtastic, next to some ACK handling, so you might get some ideas from there.

I'll have to check it out, I haven't looked at Meshtastic in a while. Thanks!

I'm starting to see we'll need need to focus on LoRa confirmation first, and then either copy it over to ESP-NOW or adapt ESP-NOW's pre-existing confirmation to our needs (would be simpler perhaps).

Also: I said we wouldn't extend the sizeof DataReading, but I think it would be appropriate to add a byte if needed somewhere. If I'm not mistaken: an 8-byte structure would be tidier, and I think then we could remove the "packed" attribute.

thefeiter commented 2 years ago

Do you know the LoRa.enableCrc()?

Since I enabled this on all devices, I have not seen any corrupted messages. But I think the corrupted ones are just getting dropped. If this does not matter in the Setup one uses, then this should be fine. But if you are sending more critical data further packet confirmation should be implemented.

I would add a new config option like: #define LORA_CRC_ENABLE // when enabled corrupted LoRa packets are beeing identified and dropped, this prevents wrong sensor values being transmitted

aviateur17 commented 2 years ago

I think there are a couple of issues with the CRC function. I would rather add 2 bytes CRC to each packet and verify CRC in software. I will put my thoughts on implementing the code in a future comment.

Based on reading the LoRa github repo issues related to CRC here's what I've found:

  1. Very transparent - no indication of any issues when CRC fails. Would be nice to have some feedback about when there are issues.
  2. CRC doesn't work on the receiver in explicit header mode. Maybe this is irrelevant as I haven't read into implicit vs explict LoRa headers
  3. CRC seems to use interrupts on the DIO pins. If those pins are not hooked up or interrupts are disabled for some reason I don't know that CRC would work.
aviateur17 commented 2 years ago
  1. Add calculateCRC() function in fdrs_functions.h - example was in the modbus github repo mentioned earlier
  2. modify DataReading struct to support 2 byte CRC
  3. add functionality in getLoRa() to calculate checksum of packet and confirm with CRC portion of DataReading
  4. If CRC match then send ACK back to sender - create DataReading packet to transmitLoRA() in fdrs_functions.h
  5. if CRC does not match then send NAK to sender to force sender to resend the packet. This could be faster than ACK timeout on sensor.
  6. On sensor code if no ACK is received within timeout or NAK is received then resend packet a certain amount of times before giving up.

Questions:

  1. Do we need a new datatype in DataReading to support LoRa ACK type?
  2. Might as well verify CRC of ACK packet in sensor in principle but not really necessary for operation.
  3. Is a delay needed between when the LoRa packet is sent from the sensor to an ACK able to be received by the sensor - testing should be able to verify this.

I'm going to be on Holiday for a week pretty soon so if someone doesn't have this done before then I'll spend some time on it when I get back from Holiday if I don't have something before then.

thefeiter commented 2 years ago

It just came to my mind that if you want to be power efficient on battery powered sensors, you would not want to have the sensor wait for ACK or NAK because this would drain the battery faster. It is important, that the sensor can go to deep sleep as fast as possible after transmitting a packet. Resending multiple times until giving up would take much more power.

aviateur17 commented 2 years ago

It just came to my mind that if you want to be power efficient on battery powered sensors, you would not want to have the sensor wait for ACK or NAK because this would drain the battery faster. It is important, that the sensor can go to deep sleep as fast as possible after transmitting a packet. Resending multiple times until giving up would take much more power.

Yeah, we can make ACK optional and disabled by default - will require Opt In. Compare runtime of sensor with and without ACK to extrapolate the power used in both situations. When ACK is disabled CRC is some known value (maybe 0xFFFF) so that the gateway knows not to reply to the sensor.

aviateur17 commented 2 years ago

In the LoRa packets we are using 3 bytes for Gateway address (destination) and 2 bytes as LoRa sensor address (source). but then we reverse the direction and the sensor becomes the destination and gateway the source of the return packet. The source and destination addresses should be same number of bytes as then we would need to pad the LoRa address by one byte to make it the destination. I'm inclined to change the code so both the source and destination are 2 bytes each. Should be sufficient. In fact it seems only the last byte of the gateway address is really exposed for end user to change so 2 bytes should be sufficient from what I see.

Also, since we are starting to send data back and forth I would rather call stuff source and destination instead of gateway and LoRAMAC since the gateway and Lora node can be either the source or destination now. This should make the code a bit more reusable in both the gateway and sensor node.

timmbogner commented 2 years ago

I'm letting this marinate and researching LoRa this weekend. I think I'm open to changing the packet format a little bit now. I have always liked sending things over ESP-NOW as a single DataReading or an array of them. With nothing else in the packet, the user was still able to send ESP-NOW packets without using my sensor sketch. This kept everything clean and convenient. I'm now taking stock of what's important, and that isn't a feature anymore.

The project actually started with a packet structure that was named DataPacket, but it just made up for the fact that I didn't yet understand how arrays were stored in memory.😁

With that said, there will be a future need for new system packet. It will be used for lower-level communications between devices like ping, discovery, pairing, configuration, etc. If we are designing a packet structure then we don't need to differentiate the packet types by size like I'd said... which is probably good, though less creative 🙂.

aviateur17 commented 2 years ago

@timmbogner, I don't see that the structure of DataReading needs to change. I'm only saying in the send or receive LoRa functions there is a mismatch between the number of address bytes used as source and destination and thus swapping the source and destination (return packet) makes the mismatch of the number in bytes confusing. I'm looking at the ESP NOW functions and I don't think changing LoRa source and destination addressing would have any effect on the ESP NOW code.

aviateur17 commented 2 years ago

Getting close to completing this. CRC isn't matching between sender and receiver and it has something to do with the first and last byte. Not sure I'll be able to send a PR before I leave for my vacation.

Sender LoRa packet

A0 05 
EE 00 
42 A8 
46 A8 
41 22 
00 03 
A8 46 
28 42 
22 00 
01 20 

Receiver LoRa packet:

80 05 
EE 00 
42 A8 
46 A8 
41 22 
00 03 
A8 46 
28 42 
22 00 
01 F1

First two bytes supposed to be EE05 - MAC address of Gateway Second two bytes supposed to be 4200 - MAC address of LoRa node Last two bytes are CRC

Total bytes = 6 for source and dest and CRC + 14 for FDRS data = 20 bytes total

Something going on with the first byte and the last byte. Wondering if I pad it a little with some extra bytes at the beginning or end or something with LoRA timing.

timmbogner commented 2 years ago

I understand now. We had both posted at the same time, so I hadn't read your previous one when I submitted my last post. If you run out of time you can submit what you have and we can look at and work on it while you're gone. Much appreciated!

timmbogner commented 2 years ago

The addressing is sort of cryptic. The first byte of the three bytes was meant to be changed if you had multiple systems within range of each other. I never got around to documenting that. I may add it back later, but we'll switch to two for now.

aviateur17 commented 2 years ago

I will try to upload what I have before I leave. If I can't get it neatly into a PR I will upload the raw files.

aviateur17 commented 2 years ago

An update on my work: I now have the CRC calculation and ACKs working properly between gateway and LoRa sensor node. The memcpy function seems to swap bytes when using to copy multiple bytes which was causing the majority of my issues. Instead of using memcpy I use some bit operators. My remaining work is to code when sensor node does not want ACKs and code fixed number of sensor node replies when invalid CRC is received from gateway or when gateway does not respond within a certain time frame. That stuff should be pretty easy to code relative to the previous work done. Should have a complete PR within 48 hours. I see there has been a bunch of other changes that affect the code of this PR as well.

timmbogner commented 2 years ago

This is looking awesome! The only thing to mention is that the ACKs should not be sent as DataReadings, they're gonna be the new SystemPacket type. You can add an entry to the new enum for "cmd_ack". I'm going to merge my "big change" branch to dev shortly, and then you should be able to fix the conflicts and get it up-to-date.

aviateur17 commented 2 years ago

The only thing to mention is that the ACKs should not be sent as DataReadings, they're gonna be the new SystemPacket type.

Gotcha, I'll change from DataReading type to SystemPacket type for the LoRa ACK packet.

aviateur17 commented 2 years ago

I believe my changes are complete. I'll create a PR in the morning. I'll probably end up ditching the previous one and creating a new one due to the merge conflicts on the previous PR.

Gateway:

1626754: Incoming LoRa. Size: 20 Bytes, RSSI: -44dBi, SNR: 9.50dB, PacketCRC: 0xd549, Total LoRa received: 293, CRC Ok Pct 100.00%
1626754: CRC Match, sending ACK packet to sensor 0x4200(hex)
1626760: Transmitting LoRa message of size 11 bytes with CRC 0x463 to LoRa MAC 0x4200
1626809: Sending MQTT.

Sensor:

607257: Data loaded. Type: 3
607257: Data loaded. Type: 1
607257: Sending FDRS Packet!
607257: Transmitting LoRa message of size 20 bytes with CRC 0xd549 to gateway 0xee05. Retries remaining: 2, CRC OK 95.16%
607374: Incoming LoRa. Size: 11 Bytes, RSSI: -27dBi, SNR: 10.00dB, PacketCRC: 0x463
607374: ACK Received - CRC Match
607374:  Delaying.
aviateur17 commented 2 years ago

LoRa Ack documentation: FDRS now features ability for sensor nodes using LoRa to request acknowledgement packets for data sent to gateway via LoRa protocol. LoRa packet CRC is calculated when ACKs are enabled and ACK messages will include information about the CRC of received data - if is correct or not in the ACK reply.

  1. By default, ACK in LoRa sensor node is disabled to save battery. Can be enabled by uncommenting #define LORA_ACK in fdrs_sensor_config.h
  2. When ACK is disabled in sensor, a CRC is calculated and sent to the gateway. Based on the way the CRC is calculated the gateway will know that the sensor is not expecting an ACK. If the received packet has incorrect CRC the gateway will not know that the sensor does not want an ACK and the gateway will send a BAD_CRC reply via ACK for limited number of retries while the sensor is sleeping. The data from the sensor will be lost.
  3. When ACK is enabled CRC of packet is sent along with the data and CRC is verified at the gateway. If CRC is correct then the gateway sends SystemPacket with ACK_OK back to the sensor. If CRC of received packet is incorrect then gateway sends CRC_BAD SystemPacket to sensor and sensor will resend the packet with limited number of retries.
  4. When ACK is enabled but sensor does not receive ACK packet within predefined timeout (default 400ms) then the sensor will retry sending packets to gateway for fixed number of retries (defaults to 2 retries). After retries the data is lost as it is not stored in the sensor at this time. Timeout and retries can be changed in fdrs_sensor_config.h.

At this time ACK feature is controlled at compile time. Perhaps at a later date this can be changed to runtime via SystemPackets.

Edit note: Corrected the above for the latest updates as of 7/27/2022