timmbogner / Farm-Data-Relay-System

A system that uses ESP-NOW, LoRa, and other protocols to transport sensor data in remote areas without relying on WiFi.
MIT License
485 stars 108 forks source link

FDRS Support for timekeeping - 2024 version #193

Closed aviateur17 closed 1 month ago

aviateur17 commented 4 months ago

Add gateway time keeping, NTP, RTC support

For Gateways, add time keeping in all nodes - MQTT, serial, ESP-NOW, LoRa.
Add support to fetch time via NTP if WiFi is enabled.

Add support for I2C RTC devices - DS3231 and DS1307 and compatible

Gateway will only send time out via serial if WiFi or RTC is in use to prevent loop. Don't want serial node to send time back to WiFi/MQTT node.

Priority in Time master Serial > ESP-NOW > LoRA which means a gateway that is receiving time via serial will reject time via ESP-NOW and LoRA. A gateway that is receiving time via ESP-NOW will reject time via LoRa. If no time is received after 60 minutes then the "master" time server will reset to none.

Automatic transition between daylight savings time and standard time for both US and Europe.

Still need to work on support for controllers and sensors in a future merge Will also verify that compilation works in Arduino environment (I am using VSCode and PlatformIO) Also need to verify compilation against different platforms such as ESP8266, raspberry pi Pico, SAMD21. What other micro-controllers are intended to be supported?

timmbogner commented 4 months ago

Well I'll confess, the CBOR thing has more facets than I first considered. That will be on hold for at least a week while I think, fyi. I'll give this a try in a bit!

aviateur17 commented 4 months ago

I'm not sure what CBOR is. Haven't look into it yet and haven't heard the guy with a Swiss accent reference it yet.

timmbogner commented 4 months ago

From the end of the previous timekeeping PR:

An important upcoming change is going to be functionally major. I'm going to extend the 'data' portion of the DataReading structure to 5 bytes, and use CBOR to allow the data to be boolean, integer, binary, or float. This will only extend the DataReading to 8 bytes, which I think is pretty reasonable. There shouldn't be too many external changes for users. I will add new iterations of loadFDRS() for the different datatypes and then everything else will be handled internally.

It's nothing too exciting, just a data format. I'd been planning to try to hammer it out this weekend and just letting you know that this isn't happening so soon.

aviateur17 commented 4 months ago

For sensor nodes that power up, collect sensor data, send sensor data, power down, no need for them to know the date and time, right?

timmbogner commented 4 months ago

For sensor nodes that power up, collect sensor data, send sensor data, power down, no need for them to know the date and time, right?

Right. I don't see why it would be required there. If the user runs addFDRS() and makes it a controller, then would be were timekeeping comes into play I think.

I'm still reading through the new code, but I think it's great so far. I forget how it was before exactly, but the new datatypes and time master concept look like a good move.

aviateur17 commented 4 months ago

@timmbogner question for you in the scenario where we have a controller using ESP-NOW. It seems that the controller registers itself to the gateway when we call the addFDRS() function. This function is used to register a call back function in case we want to send data to the controller. However, what if we have a controller where we do not need to register a call back function? Maybe this is really the definition of a controller where addFDRS is really required but I'm thinking that maybe we decouple the registration part from the addFDRS function call. Every controller should register to the gateway even if addFDRS is not called, if I understand correctly so maybe better just to register automatically even if addFDRS is not called in main.cpp. So if you are in agreement then all addFDRS would do would register the callback function. The registration of the controller with the gateway would be called independent of addFDRS.

Additionally, if the registration with the gateway fails, I would think that we would want to periodically try to reregister with the gateway.

Then I would wonder if we have a sensor node that does not go to sleep, we should probably register the sensor with the gateway and thus the gateway would send the sensor the time which makes sense to me if the sensor is continuously powered on. What are your thoughts on that?

Finally, in the "node" code there is quite a bit of ESP-NOW and LoRa stuff in the fdrs_node.h file that I think can be moved to the fdrs_node_espnow or fdrs_node_lora files. If you agree, I think I'll take up organizing that code but in a different PR.

I think the ESP-NOW controller code is good so now to work on LoRa controller...

timmbogner commented 4 months ago

However, what if we have a controller where we do not need to register a call back function? Maybe this is really the definition of a controller where addFDRS is really required

You kind of called it right there. If a node is receiving data, it will need a callback in which to handle it. If it needs a callback then it will need to be defined somewhere. Thus, addFDRS() is essential to being a controller. In ESP-NOW it does other things too, of course.

I see now that in LoRa, addFDRS() really does very little. While it would be possible to perhaps introduce another way of setting the callback just for LoRa, I think it would be better to keep ESP-NOW and LoRa configuration and usage as similar as possible for now. I will keep thinking about it, though. # In nodes, gtwy_timeout is a variable that stores the frequency at which to refresh and retain their spot on the gateway's peer list. It is initialized to 300000 (5 mins). When nodes connect to a gateway, they receive a value to store as gtwy_timeout. addFDRS() sets is_controller to 'true', and that enables a registration to happen at the interval in gtwy_timeout. So what I'm getting at is that the controller should try to connect again after 5 minutes, as long as addFDRS() has been called. #

Then I would wonder if we have a sensor node that does not go to sleep, we should probably register the sensor with the gateway and thus the gateway would send the sensor the time which makes sense to me if the sensor is continuously powered on. What are your thoughts on that?

I think in this case, if the user wanted the sensor to get the time, they could register as a controller. I do see how it might be nice to avoid requiring a registration spot just for a clock, though. I still need to look deeper into your code, so take all of this with a grain of salt. I haven't fully wrapped my head around it at this time, but that's just because I've been distracted, not anything to do with the code itself. I'll give it attention today for sure.

Finally, in the "node" code there is quite a bit of ESP-NOW and LoRa stuff in the fdrs_node.h file

Agreed. I'm not sure if that stayed there for a reason or not, but addFDRS() can definitely move to node_espnow if there are no conflicts.

timmbogner commented 4 months ago

Jumping in to testing. I'm using Arduino IDE 2.3.0 and ESP32 board defs 2.0.14:

0_MQTT_Gateway works great, gets the time.

1_UART_Gateway with ESP-NOW gives the following error:

...\Farm-Data-Relay-System\src/fdrs_gateway_espnow.h: In function 'esp_err_t sendTimeESPNow()':
...\Farm-Data-Relay-System\src/fdrs_gateway_espnow.h:397:26: error: invalid conversion from 'uint8_t*' {aka 'unsigned char*'} to 'uint8_t' {aka 'unsigned char'} [-fpermissive]
  397 |     result1 = sendESPNow(ESPNOW1, &sys_packet);

1_UART_Gateway with only LoRa enabled gives the following error:

...\Farm-Data-Relay-System\src/fdrs_gateway_lora.h: In function 'uint32_t pingFDRSLoRa(uint16_t*, uint32_t)':
...\Farm-Data-Relay-System\src/fdrs_gateway_lora.h:653:5: error: 'pingFlagLoRa' was not declared in this scope; did you mean 'pingFDRSLoRa'?
  653 |     pingFlagLoRa = false;
aviateur17 commented 4 months ago

If the callback function was fixed at a certain name then can't we do all of that automatically without having them call the addFDRS() function? Then all we would have to do is check if the node was a controller or a sensor. Could we just call a node that doesn't define DEEP_SLEEP as a controller and one that does define DEEP_SLEEP as a sensor? Then use that as a condition for registering the callback function and registering. It seems at this time there can only be one callback function. Maybe you've thought of this?

Regarding the fdrs_node.h code I may take that reorganization as a task at a later point.

timmbogner commented 4 months ago

It seems at this time there can only be one callback function. Maybe you've thought of this?

I've been thinking about it more since you mentioned it and you're right, it could be static. I think a problem with what you say is that each ESP-NOW device that registers as a controller takes up one slot in the peer list of the gateway, which is limited to 16. It would be best to keep these to the devices that actually need to receive data. Shouldn't there be a way for a node to request the time? I guess this is sort of what you asked about early on. I think the way it should work is: If a node connects as a controller, it will get time updates from the gateway. Any sensor can send a request for the time to its gateway and get the time that way, but it won't be fed the time automatically.

In general, I think a sensor should not be listening or receiving anything unless the user explicitly runs addFDRS() (or pingFDRS() etc). Just a thought I came up with now: For a user to request the time from a gateway what about queryFDRS(enum) and we can start an enum with time and other things the node may want to query a gateway about (I'll think of other things later :D) It could end up as a handy diagnostic tool.

As a side note: The controller callback is going to come under more scrutiny soon, as I trudge through this CBOR stuff. If the incoming data can be one of several types, will there have to a callback for each? Or will there a function within a single callback to decode it? Will there be more variables passed to the callback? It doesn't help that despite using them a few times, I still need to review every time I work with function pointers.

#

I've tested ESP-NOW Gateways 00 -> 01 ->02 plus a controller, and they all are working perfectly and keeping time. LoRa next.

aviateur17 commented 4 months ago

It would be best to keep these to the devices that actually need to receive data.

Sounds good. makes perfect sense.

. Any sensor can send a request for the time to its gateway and get the time that way, but it won't be fed the time automatically.

I'll look into this. Shouldn't be hard to do this. I think the code is pretty much there. Just need a good function call.

As a side note: The controller callback is going to come under more scrutiny soon, as I trudge through this CBOR stuff. If the incoming data can be one of several types, will there have to a callback for each? Or will there a function within a single callback to decode it? Will there be more variables passed to the callback? It doesn't help that despite using them a few times, I still need to review every time I work with function pointers.

Sounds good. I'll leave that stuff alone.

I've tested ESP-NOW Gateways 00 -> 01 ->02 plus a controller, and they all are working perfectly and keeping time. LoRa next.

I have to do some more testing in edge cases and different settings so I'll post more updates as I find issues.

Sounds like we're going to make the assumption that the MQTT gateway is the only gateway that would be the initial source of time, right?

timmbogner commented 4 months ago

Sounds like we're going to make the assumption that the MQTT gateway is the only gateway that would be the initial source of time, right?

No, my intent is that any gateway could be a source of the time. Hopefully that's attainable here. Let's take a case where there is the same gateway scenario as the examples.

I want to make sure this feature is still available if MQTT isn't used on the network. For example if the front-end is just a logger using serial and no WiFi. I'm sure I'm overlooking some things still...

aviateur17 commented 4 months ago

So, using your example, let's say 00 does not have Internet access but does have RTC. Somehow, the time on the RTC is off and not correct. 02 has GPS. GPS should be preferred over RTC. Do we implement a check if the two time sources differ use the one that should be more accurate? 01 would be able to tell if the time they are receiving from 00 and 02 are different by some appreciable amount.

aviateur17 commented 4 months ago

So, using your example, let's say 00 does not have Internet access but does have RTC. Somehow, the time on the RTC is off and not correct. 02 has GPS. GPS should be preferred over RTC. Do we implement a check if the two time sources differ use the one that should be more accurate? 01 would be able to tell if the time they are receiving from 00 and 02 are different by some appreciable amount.

I'm thinking that we make the assumption that if a gateway has an RTC that it should be updated at fairly regular intervals by some Internet connection or GPS otherwise why put the RTC on that gateway. So I'm going to then assume that the RTC and other gateway with GPS are assumed close enough in time to each other. If there is an issue where the time is off (> 10 seconds?) then raise a DBG message and hope that someone sees the message. Thoughts on that?

timmbogner commented 4 months ago

Yeah I'd been thinking about RTC too. I think I may have come up with something just now. What if the RTC gateway never shares its time until a time is sent to it? Basically we assume the RTC time is incorrect until an outside time packet sets it. If the user wants to pre-set their RTC and use it as the only time source, we can set up a config to override this behavior.

So... what am I forgetting?

timmbogner commented 4 months ago

Is it possible to paste a timestamp into the serial console and set the time manually?

timmbogner commented 4 months ago

I'm thinking that we make the assumption that if a gateway has an RTC that it should be updated at fairly regular intervals by some Internet connection or GPS otherwise why put the RTC on that gateway. So I'm going to then assume that the RTC and other gateway with GPS are assumed close enough in time to each other. If there is an issue where the time is off (> 10 seconds?) then raise a DBG message and hope that someone sees the message. Thoughts on that?

I didn't see this til just now. I think we're thinking pretty closely. I had also thought about a routine that keeps track of whether incoming times are synced. I like the idea of a debug message, and we could expand from there if wanted.

aviateur17 commented 4 months ago

Periodically generate a FDRS message indicating a time issue so it can be sent to the front end. Like an alarm condition.

timmbogner commented 4 months ago

Periodically generate a FDRS message indicating a time issue so it can be sent to the front end. Like an alarm condition.

Excellent idea! We can use INTERNAL_ACT to send it.

timmbogner commented 4 months ago

Ayyo, now it's working! Had to switch to using radio.transmit() instead of radio.startTransmit().

My setup is 00_MQTT->01_Serial/ESP-NOW->02_ESP-NOW/LoRa->03_LoRa

I'll be playing with it alot more in the next few days, plus I've gotta bust out the ol' T-Watch. Very excited!

aviateur17 commented 4 months ago

Ayyo, now it's working! Had to switch to using radio.transmit() instead of radio.startTransmit().

My code is working with:

int state = radio.startTransmit(pkt, sizeof(pkt));

Wonder what's going on.

timmbogner commented 4 months ago

My code is working with:

That's interesting, does it still work for you with my change?

Wonder what's going on.

It must be a timing thing. This low-level LoRa is tricky. What hardware are you on?

When sending a system packet with transmitLoRa(), startTransmit() is called it and begins the transmission, we set transmitFlag, and it moves on in code. Within handleLoRa(), we wait until the interrupt is triggered, then if transmitFlag is set it runs finishTransmit(). Next, it checks if there is an async LoRa release happening. This is a potential conflict, but we'll overlook it for now. There usually shouldn't be, so it goes to this:

DBG("LoRa airtime: " + String(millis() - tx_start_time) + "ms");
radio.startReceive(); // return to listen mode
enableInterrupt = true;
transmitFlag = false;

The airtime display assumes that it is at the end of an async release, so it never has accurate data. # My guess as to the issue is that the interrupt isn't triggering where we expect it to, and it never has a chance to finishTransmit(). When switched to transmit(), all of the interrupt stuff happens before it moves on, so it doesn't have a chance to get confused.

I'm going to fill the handleLoRa() function with DBGs and try to get to the bottom of it. I'm hoping transmit() will be okay for now. I've had a post-it on my monitor that says "investigate blocking transmit" since the last timekeeping effort. I think this is why I couldn't get it to work last time too, but I never quite made the connection.

timmbogner commented 4 months ago

I added some debug messages and the transmissions began working. I believe I found the problem.

In sendTimeLoRa() you run transmitLoRa() twice in a row with no delay. If you are using radio.startTransmit(), you need to wait for the first operation to finish before doing it again. Probably just as easy to use radio.transmit().

You can probably recreate the issue by disabling FDRS_DEBUG.

timmbogner commented 4 months ago

I assembled some console recordings to demonstrate. *INT* is printed when the interrupt itself is called. The rest of the new debug messages are either commented out in the existing handleLoRa() code or self-explanatory.

First, this is sending the time with radio.startTransmit(). Note how the interrupt is triggered right after it (tried to) send the second message. That was the chip indicating it was finished with the first transmit:

12:12:16.838 -> Sending out time
12:12:16.838 -> Sending time via LoRa
12:12:16.838 -> Sending time to LoRa broadcast
12:12:16.838 -> Transmitting LoRa SYSTEM message of size 11 bytes with CRC 0x9649 to LoRa MAC 0xffff
12:12:16.877 -> Sending time to LoRa Neighbor 2
12:12:16.877 -> Transmitting LoRa SYSTEM message of size 11 bytes with CRC 0xc248 to LoRa MAC 0xee03
12:12:16.877 -> *INT*
12:12:16.877 -> Sending time to ESP-NOW Peer 1
12:12:16.877 -> Sending time to ESP-NOW Peer 2
12:12:16.877 -> Sending time to ESP-NOW registered peers
12:12:16.877 -> Interrupt triggered
12:12:16.877 -> TxFlag: 1
12:12:16.877 -> TxStatus: 3
12:12:16.877 -> Finishing Transmit
12:12:16.877 -> *INT*
12:12:16.877 -> LoRa airtime: 60038ms

Next is the way it works with radio.transmit(). I was curious how this would play out. It doesn't have a chance to get to handleLoRa() while sending the two packets, so the interrupt flag is only handled once, after everything else is finished. It luckily handles it by running radio.finishTransmit(), which is not needed here, but I don't think it hurts anything:

12:07:28.567 -> Sending out time
12:07:28.567 -> Sending time via LoRa
12:07:28.567 -> Sending time to LoRa broadcast
12:07:28.567 -> Transmitting LoRa SYSTEM message of size 11 bytes with CRC 0xadc9 to LoRa MAC 0xffff
12:07:28.633 -> *INT*
12:07:28.633 -> Sending time to LoRa Neighbor 2
12:07:28.633 -> Transmitting LoRa SYSTEM message of size 11 bytes with CRC 0xf9c8 to LoRa MAC 0xee03
12:07:28.666 -> *INT*
12:07:28.666 -> Sending time to ESP-NOW Peer 1
12:07:28.666 -> Sending time to ESP-NOW Peer 2
12:07:28.666 -> Sending time to ESP-NOW registered peers
12:07:28.666 -> Interrupt triggered
12:07:28.666 -> TxFlag: 1
12:07:28.666 -> TxStatus: 3
12:07:28.666 -> Finishing Transmit
12:07:28.666 -> LoRa airtime: 240116ms

Here is a look at a DataReading being sent under this paradigm:


12:13:58.810 -> Incoming ESP-NOW.
12:13:58.810 -> Sending to ESP-NOW Neighbor #2
12:13:58.810 -> Sending to ESP-NOW peers.
12:13:58.851 -> Sending to LoRa neighbor buffer
12:13:58.851 -> Sending to LoRa broadcast buffer
12:14:01.917 -> Transmitting LoRa message of size 13 bytes with CRC 0x858f to LoRa MAC 0xee03
12:14:01.987 -> *INT*
12:14:01.987 -> Interrupt triggered
12:14:01.987 -> TxFlag: 1
12:14:01.987 -> TxStatus: 2
12:14:01.987 -> Finishing Transmit
12:14:01.987 -> Transmitting LoRa message of size 13 bytes with CRC 0xea8f to LoRa MAC 0xffff
12:14:02.032 -> *INT*
12:14:02.032 -> Interrupt triggered
12:14:02.032 -> TxFlag: 1
12:14:02.032 -> TxStatus: 3
12:14:02.032 -> Finishing Transmit
12:14:02.032 -> LoRa airtime: 92ms
aviateur17 commented 4 months ago

Sounds good. I probably won't have a chance to look at the information or analyze it for a few days or a week. I'm trying to finish up the timekeeping for the node ESP-NOW and LoRa. I also need to look at your comments on JSON console output as I still don't understand those either.
In my testing with this code I've ran into a couple of issues that I keep in the back of my mind:

  1. DBG or serial output takes quite a long time to do, relatively speaking, and the placement of the console output isn't necessarily in line with what is happening in the code. The console output could appear after several other lines of code have executed.
  2. Order of the network calls (ESP-NOW and LoRa Xmit and Receive) matters relative to the code around it. There was some code for ESP-NOW that transmitted and then did some stuff like DBG and then waited for a reply. Well the reply came before the DBG happened so by the time the code went to wait for a reply the reply had already been received and so the code waiting for a reply timed out. Moving the ESP-NOW Transmit right before the spinning loop waiting for the reply resolved the issue. So the order of the code matters quite a bit and console output takes a considerable amount of time.
aviateur17 commented 4 months ago

172f61c

All: Sensors can request gateway to send time All: Time source tracking per gateway/controller/sensor and prioritizing sources All: Time difference macro to simplify comparing the last time some event happened:

define TDIFF(prevMs,durationMs) (millis() - prevMs > durationMs)

Gateway: Lora asynchronous ping support so CPU time is not wasted while waiting for a ping response.

Gateway: Add support for GPS module (AT6558 based module) If others have GPS modules that receive different

Gateway: Clean up loopFDRS() so that the whole loop is consistent: move action handling into separate function and rename updateTime to handleTime.

Controller/Sensor: ESP-NOW ping is now asynchronous. Probably not required since pings are on the order of 50ms or less.

Smaller enhancement of DBG statements

aviateur17 commented 4 months ago

Hey Timm, I feel like I'm getting close to completing this functionality. I'm finishing up my added features testing and then I need to confirm successful compilation and go through all of the DBGs.
On the DBGs, what is your thought on which DBG statements you want at level 0? Should I leave that up to you, should I take a stab at guessing, or can you give some examples of what you want at level 0? Regarding the compilation, there are so many combinations of #defines now that it's tough to be able to test every possible one. I'll try to test the most common ones for sure.

timmbogner commented 4 months ago

This is all fantastic! Sorry I've been a little unresponsive. I haven't been able to test the newest commit because of some issues with my WiFi. I think the router I bought used on Craigslist in 2016 might have finally failed. I presently realized that's a good excuse to test ethernet, which I'll work on today and tomorrow.

For DBGs: I think I'd say you should set up LoRa and any other messages how you like them, and I'll look at that as a rough guide for the rest. Let me know any feedback once I do. I had set some of them up for ESP-NOW in the pull request, (now in main) that aren't here in this one. Assuming they'll show up when you resolve the conflicts.

I needed to define GPS_TXD and GPS_RXD for it to compile in Arduino. I'll begin testing various things today after I get ethernet running. I noticed after I commented on the new reboot code for ESP32 that it would probably need protection from being compiled in 8266 too. Generally, though, this is mind-blowingly cool.

aviateur17 commented 4 months ago

This is all fantastic! Sorry I've been a little unresponsive. I haven't been able to test the newest commit because of some issues with my WiFi. I think the router I bought used on Craigslist in 2016 might have finally failed. I presently realized that's a good excuse to test ethernet, which I'll work on today and tomorrow.

No worries on any delays. Totally understand that everyone is busy with many things.

For DBGs: I think I'd say you should set up LoRa and any other messages how you like them, and I'll look at that as a rough guide for the rest. Let me know any feedback once I do. I had set some of them up for ESP-NOW in the pull request, (now in main) that aren't here in this one. Assuming they'll show up when you resolve the conflicts.

I will look at the DBGs for ESP-NOW and try to replicate for LoRa. My thought, in general, is that DBGs are also meant to be compatible with OLED as far as length (you said this quite awhile ago) and should be very basic in nature and not too many of them included.

I needed to define GPS_TXD and GPS_RXD for it to compile in Arduino. I'll begin testing various things today after I get ethernet running. I noticed after I commented on the new reboot code for ESP32 that it would probably need protection from being compiled in 8266 too. Generally, though, this is mind-blowingly cool.

I'll check on the #defines for those. I thought I had them surrounded by USE_GPS but I could be wrong. I'll go back and check.

timmbogner commented 4 months ago

Ethernet seems to work out-of-the-box!

I haven't fully digested all of the code and the various rules and expected behaviors yet. As such, I'm still not sure what the special rule about sending time to gateways addressed 0x00 does. Does the case of my MQTT/NTP gateway being 0x01 throw that off? I think that rule is the only thing I'm uneasy about so far. The order of gateways can't be assumed by the system, and it should be able to work the same way as if the addresses were reversed (like if the highest address was closest to the front-end) for example. Once I get my head around things I'll try to make some constructive suggestions for alternatives.

aviateur17 commented 4 months ago

As such, I'm still not sure what the special rule about sending time to gateways addressed 0x00 does. Does the case of my MQTT/NTP gateway being 0x01 throw that off?

In the earlier timekeeping PR we said that if a peer was left as 0x00 (aka not defined) that we would not want to send the time to that address. Hence I think we were assuming that 0x00 would never be assigned as an address to a legitimate gateway. Therefore if the peer address is 0x00 I do not send the time to that address.

if((timeMaster.tmAddress != (ESPNOW1[4] << 8 | ESPNOW1[5])) && ESPNOW1[5] != 0x00) {
    DBG("Sending time to ESP-NOW Peer 1");
    result1 = sendESPNow(ESPNOW1, &sys_packet);
  }
if((timeMaster.tmAddress != (ESPNOW2[4] << 8 | ESPNOW2[5])) && ESPNOW2[5] != 0x00) {
    DBG("Sending time to ESP-NOW Peer 2");
    result2 = sendESPNow(ESPNOW2, &sys_packet);
  }

So if the ESP-NOW address of any of the peers ends in 0x00 we do not send the time to that peer.

timmbogner commented 4 months ago

Ooooh, I see now. I was thinking there was an assumption that 0x00 was the MQTT gateway, sorry. I don't have ~a better~ another solution right now, except to send to the usually non-existent gateway.

This is only a faint idea, but what if the user could set ESPNOW_NEIGHBOR_1 to "-1"? Then with a preprocessor condition we check if that value is negative, and we can set a flag in code or something so we know that gateway isn't real. That might be too obscure, but let me know what you think.

aviateur17 commented 4 months ago

I'll just remove the check for 0x00 and we can send to a non-existent peer. We can always change later if need be.

timmbogner commented 4 months ago

I'll just remove the check for 0x00 and we can send to a non-existent peer. We can always change later if need be.

Sorry, I just realized I worded that wrong. I should have said sending to non-existent peer was another option, not necessarily a better one. I don't think you need to change it as long as we are using 0x00 as a placeholder for a null (non-existent) gateway.

aviateur17 commented 4 months ago

I added some debug messages and the transmissions began working. I believe I found the problem.

In sendTimeLoRa() you run transmitLoRa() twice in a row with no delay. If you are using radio.startTransmit(), you need to wait for the first operation to finish before doing it again. Probably just as easy to use radio.transmit().

You can probably recreate the issue by disabling FDRS_DEBUG.

I tried radio.transmit last night and LoRa did not work. I seem to recall from the switch over to RadioLib that my Hallard shields did not map one of the interrupts on like DIO2 to the chip. I haven't looked into the details again recently and I don't know that's the issue but it could require more than one Interrupt. I'll look in more detail later.

timmbogner commented 4 months ago

I'll have to review the pins and interrupts and stuff when I'm back at my computer on Friday. To check into what is going on, I would insert Serial.println("INT"); into the interrupt handler function, then look to see if and when the interrupt is being triggered. You should see it triggering after startTransmit() is finished, but if using transmit() you should see it triggered before the leaving the function.

I'm sure we both know it's bad practice to do UART stuff in an ISR, but it works for this troubleshooting exercise. Probably don't try a DBG tho.

aviateur17 commented 4 months ago

Timm, thanks for the tip. I will definitely try that. Spent many hours last night trying to resolve a LoRa comm issue. So frustrating in the number of times I've changed code and downloaded. More troubleshooting to be done today. To keep my sanity I may install the original code just to show myself it's not the hardware.

timmbogner commented 4 months ago

I'm doing some research on my phone now. Yeah, I guess I forgot that the Hallard shield has NO interrupts mapped to GPIOs on the ESP. If that's the case then you might be just getting lucky in terms of timing thus far? If you are having sudden mysterious transmission/reception issues, also look at whether you have changed or removed DBGs. In the past i've found the timing held together only because of the delay caused by debugs. I'm not sure how it all can work for you without the interrupt, especially with my code that relies on it. I will review some more, but I was thinking that receiving was initiated by the interrupt being triggered.

Maybe try hardwiring an interrupt between the RFM95 and ESP to see if it helps?

aviateur17 commented 4 months ago

I'm doing some research on my phone now. Yeah, I guess I forgot that the Hallard shield has NO interrupts mapped to GPIOs on the ESP. If that's the case then you might be just getting lucky in terms of timing thus far? If you are having sudden mysterious transmission/reception issues, also look at whether you have changed or removed DBGs. In the past i've found the timing held together only because of the delay caused by debugs. I'm not sure how it all can work for you without the interrupt, especially with my code that relies on it. I will review some more, but I was thinking that receiving was initiated by the interrupt being triggered.

Maybe try hardwiring an interrupt between the RFM95 and ESP to see if it helps?

DIO0, DIO1, and DIO2 on the LoRa Chip are connected to diodes and then to one GPIO PIn on the Hallard shield. Depending upon how the chip is configured each DIO pin should generate an interrupt if something is happening. Now how this works in the RadioLib software I'm not quite sure. I'm not sure what the default configuration is for those DIO pins using the RadioLib software is.

timmbogner commented 4 months ago

Oh that's right. Okay so maybe the solution for you is to remove the other diodes? In our usage on sx1278, Radiolib uses only the one interrupt (dio0). If we implement channel activity detection, that is where a second interrupt comes into play (dio1). If your interrupt handler is being falsely triggered by dio1 instead of dio0, that could be an issue for you. Just a guess ...not sure if there are any signals happening on dio1 by default or not.

I don't have a clue what dio2 is doing, I'm not sure if it's even used.

timmbogner commented 4 months ago

I'm somewhat keeping up with reading these, just haven't tested it in a bit. 08724ab is a PR of its own, but I see why you combined the files. Everything looks cool so far. I'm presently reading through your new DataReading release code. Could you transcribe how the txWindow stuff is working exactly? It doesn't look too complicated, I just want to have a good grasp on the concept.

timmbogner commented 4 months ago

I have a small fix for the ping functionality in 'main' right now. Basically it prints the ping time before it records it, adding extra time to the returned time. This causes discrepancies in the displayed vs. returned time.

I know you've made a lot of ping changes here, is that even still an issue?

EDIT: Looking over the code a little more, it looks like pingFDRS() doesn't return anything now. Is there still a way for the user to get/use the ping time?

aviateur17 commented 4 months ago

I'm somewhat keeping up with reading these, just haven't tested it in a bit. 08724ab is a PR of its own, but I see why you combined the files.

Yeah, I was thinking the same thing. It took quite a bit of time to do that. I was having issues with LoRa transmissions and not receiving them on some gateways but was receiving them on others and after some banging my head and thinking I was going nuts I decided that I wanted to change to more of a Queue/Buffer instead of just transmitting right away. I figured this might help with my issues - and I think it did. If you would rather me do a separate PR on that I'd be glad to. I'd probably start working on that before finishing this one though as, for me, it really seems to help my LoRa communication.

Everything looks cool so far. I'm presently reading through your new DataReading release code. Could you transcribe how the txWindow stuff is working exactly? It doesn't look too complicated, I just want to have a good grasp on the concept.

The TxWindow stuff is really for retries or continuous transmissions so that there can be a little space in between transmissions for listening for responses. With the buffer and the asynchronous nature, now there is no while loop to switch between transmit and receive and then wait awhile and back to transmit. So this is a bit of a timer to say "wait for a little bit before trying to transmit again" otherwise for ACKs it would just transmit one after the other without taking time to listen for the ACK to come back. The value could be adjusted to 0 to remove the functionality for testing. I suppose the other option would be just specifically for ACKS to wait awhile for the re-transmissions but I also had an issue where if I perform a ping followed by a time request followed by some loads and sendFDRS then I wouldn't get a response from the ping because I'd be too busy transmitting all of that other stuff. Now there some space to listen for a ping response before going on to the time request and then to the sendFDRS call.

aviateur17 commented 4 months ago

I have a small fix for the ping functionality in 'main' right now. Basically it prints the ping time before it records it, adding extra time to the returned time. This causes discrepancies in the displayed vs. returned time.

I know you've made a lot of ping changes here, is that even still an issue?

EDIT: Looking over the code a little more, it looks like pingFDRS() doesn't return anything now. Is there still a way for the user to get/use the ping time?

Yeah, I implemented an asynchronous ping so it will just output the result to the screen. What we could do, and Håkan's post gave me this idea, would be to implement a callback function when we get either a ping or a ping timeout and then we can send the ping time or lack of time to the callback function. That way something can be done in response. What do you think about that?

aviateur17 commented 4 months ago

Is it possible to paste a timestamp into the serial console and set the time manually?

I've been thinking about this and it should be possible. I'm just wondering how to prevent the input from being scrambled by all of the output that is happening. Would be nice to pause the output for bit while waiting for input so the stuff on the screen doesn't get clobbered. Any ideas on that?

aviateur17 commented 4 months ago

I think the functionality should be complete. Let me know if you run into any issues. Looks like there are merge conflicts so let me know if you want me to submit another PR or if you're going to manually merge the conflicts. No rush on this as I know it's a lot of changes and everyone is busy.

timmbogner commented 3 months ago

Hey! Thanks so much for all the work you've done here! I have a commit in the works that changes some DBG levels etc, and also adds a USE_GPS macro to protect GPS_IF from being defined.

EDIT: I hacked my way through the following issue. The problem is that when setTime() is run initially timeSource.tmSource == TMS_NONE; , causing validTime() to be false and setTime() to fail. I patched it by setting timeSource.tmSource = TMS_NTP; inside of fetchNtpTime(). You'll probably have a better overall solution.

# The hang-up currently is that 00_MQTT doesn't seem to automatically share the time after receiving it via NTP. validTime() seems to always be returning false. I made the following changes to the code, and I cannot get a "YES". I'm fetching the time at 1 minute, and sending at 2.


bool validTime() {
  DBG("VALIDTIME?");
  if(!VALID_TS(now) || timeSource.tmSource == TMS_NONE) {
    if(validTimeFlag) {
      DBG1("Time no longer reliable.");
      validTimeFlag = false;
    }
    DBG("NO");
    return false;
  }
  else {
    if(!validTimeFlag) {

      validTimeFlag = true;
    }
    DBG("YES");
    return true;
  }
}
aviateur17 commented 3 months ago

Thanks for catching that. Here's what I'm thinking..

in fdrs_time.h line 130

bool validTime() {
  if(!VALID_TS(now) || timeSource.tmSource == TMS_NONE) {
    if(validTimeFlag) {
      DBG1("Time no longer reliable.");
      validTimeFlag = false;
    }
    return false;
.....

in the if() statement removing the check for " timeSource.tmSource == TMS_NONE" What I was thinking about is that if the time hasn't been updated for a "long time" by some time source that the local time may not be valid or accurate. My thought is to leave that check out and we can figure something out later if it becomes an issue.

timmbogner commented 3 months ago

I was having issues with LoRa transmissions and not receiving them on some gateways but was receiving them on others and after some banging my head and thinking I was going nuts I decided that I wanted to change to more of a Queue/Buffer instead of just transmitting right away.

Circling back to this. I'm testing LoRa now, and I think the time is working. However there are a few DataReading features that got lost in this change (as far as I can tell). With the old functionality I was able to give the gateway more than a packet's worth (30-40+) of DRs and the asyncRelease would split them into several packets and send them to each destination sequentially. It seems your new functionality only supports a single LoRa packet to each recipient, which is not sufficient.

Another behavior I noticed, but didn't look into closely yet was the way that DataReadings are sent via LoRa instantaneously upon arrival. The previous behavior was to save, combine and release them at a set interval. This was to limit radio chatter mostly, and to prepare to add channel activity detection (which would delay the release until the channel was clear). It also leaves it upon the user to set the delay in accordance with their local regulations regarding airtime and duty-cycle. I will have to look closer at your code. I guess I don't fully understand how your new buffering system works yet. Let me know if I'm misunderstanding things so far. I've always considered that short SystemPackets, when sent individually, are an acceptable exception to that set transmission cycle.

state_t was in use by RadioLib so I changed it to pingstate_t.

timmbogner commented 3 months ago

I guess this brings up a general question: Is it better to send a large amount of shorter packets, or a small amount of larger packets? Might be something to consult the Meshtastic community about. I'm definitely all ears if you have thoughts.