hdtodd / rtl_watch

Actively monitor rtl_433 for devices in your neighborhood
MIT License

Feature Request: Total received message count and processed record count. #4

Closed by rct 1 year ago

rct commented 1 year ago

It would be helpful (to me) for rtl_watch to display a running total of the number of messages it has received and the number of de-duped records it has processed. This could go in the bar that shows the earliest and latest record.

I'm observing rtl_watch and a tail of my log file. I'm looking at the processing of the Acurite Atlas weather station that should send a message every 10 seconds. Each message is sent 3 times, so with perfect reception and decoding I should see 18 messages a minute and 6 de-duped records.
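The arithmetic above can be checked with a small sketch. It is not rtl_watch's actual de-duplication logic: the 2-second window, the `(model, id, channel)` device key, and the sample records are assumptions made for illustration.

```python
from datetime import datetime

DEDUP_WINDOW_SEC = 2.0  # assumption: repeats of one transmission arrive within 2 s


def count_packets_and_transmissions(records, window=DEDUP_WINDOW_SEC):
    """Return (packets, transmissions) for an iterable of decoded records.

    Each record is a dict with at least "time", "model" and "id" keys.
    A packet starts a new transmission when the same device has not been
    heard from within `window` seconds.
    """
    last_seen = {}  # device key -> timestamp of its most recent packet
    packets = 0
    transmissions = 0
    for rec in records:
        ts = datetime.strptime(rec["time"], "%Y-%m-%d %H:%M:%S")
        key = (rec.get("model"), rec.get("id"), rec.get("channel"))
        packets += 1
        prev = last_seen.get(key)
        if prev is None or (ts - prev).total_seconds() > window:
            transmissions += 1
        last_seen[key] = ts
    return packets, transmissions


# One minute of perfect reception: a message every 10 s, each sent 3 times.
# The model name and id are illustrative values only.
sample = [
    {"time": f"2024-01-01 12:00:{sec:02d}", "model": "Acurite-Atlas", "id": 1}
    for sec in range(0, 60, 10)
    for _ in range(3)
]
print(count_packets_and_transmissions(sample))  # (18, 6)
```

Feeding the same records from the log file and from the MQTT subscriber through a counter like this would show whether the two paths disagree.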

I believe I'm seeing cases where the record count in rtl_watch isn't increasing even though I do see messages being logged.

This might not be a problem with rtl_watch: the path from rtl_433 to rtl_watch goes through the MQTT broker (mosquitto in my case), whereas the log file is written directly to the file system via a pipe from rtl_433.

This also might not be that easy to track down.

Related to https://github.com/hdtodd/rtl_snr/issues/3, it would be helpful if the per-device table showed records/messages.

Actually, something easy to do, which I should try myself, would be to have the STDOUT logging show the current record/message count.

hdtodd commented 1 year ago

Yes, it will be easy to add the # of pkts and # of transmissions to that information bar. I've added it to my task list.

I've modified the ITGT code to report counts of both packets and transmissions (de-duped) both for the complete log file and for each individual device. I'll post that later today, after a little cleaning up, and also incorporate that code into SNR.py.

From that code, I see that the rtl_433 log file shows some variability in the number of packets per transmission -- more than I had anticipated. And it also shows spurious packets: a single packet 4 seconds after a prior transmission, with a subsequent transmission at the expected time. I posted a note about that in the discussion. I understand that interference could result in fewer than the expected number of packets per transmission (e.g., 5 rather than 6), but I can't explain why a single packet is seen after 4 seconds, between two regular transmissions, when the transmission period is consistently 30 sec. That's why I concluded that the inter-transmission gap time may not be all that helpful as an indicator of sensor reliability. There IS interesting information in the ITGT analysis that could be useful, so I want to make that available. But for long JSON log files, with many devices popping in and out of range, that report may not be as helpful as I thought.
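As an illustration of the spurious-packet case described above, the following sketch flags gaps that are much shorter than the expected period. It is not the ITGT code: the function name, the 50% tolerance, and the sample times are hypothetical.

```python
def flag_short_gaps(times, expected_period, tolerance=0.5):
    """Return (index, gap) pairs where the gap to the previous transmission
    is shorter than expected_period * tolerance.

    `times` is a sorted list of transmission times in seconds.
    """
    flagged = []
    for i in range(1, len(times)):
        gap = times[i] - times[i - 1]
        if gap < expected_period * tolerance:
            flagged.append((i, gap))
    return flagged


# A 30 s period with one stray packet 4 s after the second transmission.
times = [0, 30, 34, 60, 90]
print(flag_short_gaps(times, expected_period=30))  # [(2, 4)]
```

Note that the stray packet also shortens the following gap (26 s here), which is one reason raw gap statistics can be misleading.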

Re record counts in rtl_watch, I've watched an mqtt-subscriber on one system with rtl_watch running on another to be sure that the count of transmissions is incremented in rtl_watch whenever a new transmission has been seen by the subscriber. I haven't looked at it over an extended period of time, though, so it's possible I just didn't see it ignore a transmission. I'll go back and monitor more closely.

In SNR, I've fixed exception handling (as best I know how). Your comment was very helpful. I went back to see if I could figure out why processing that 2-month log file failed ... I didn't think it could really be related to file size, and I thought I had split the file into thirds and processed each OK. But I hadn't. Going back and looking again, I found that the first line not processed was preceded in the file by a block of null chars. I'd seen that in JSON log files earlier, caused, I think, by disruptions of rtl_433 in process. So I added a little more information to the error handling and added guidance on how to remove any nulls that might be in the data file.
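One way to tolerate such null blocks while reading, sketched here under the assumption that the log holds one JSON object per line. This is not the handling SNR.py actually uses; `parse_log_lines` and the sample lines are made up for illustration.

```python
import json


def parse_log_lines(lines):
    """Parse JSON log lines, stripping NUL characters first.

    Returns (good, bad): `good` is a list of decoded records, `bad` is a
    list of (line_number, error_message) for lines that still fail.
    """
    good, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        cleaned = line.replace("\x00", "").strip()
        if not cleaned:
            continue  # blank line, or a line that held only NULs
        try:
            good.append(json.loads(cleaned))
        except json.JSONDecodeError as exc:
            bad.append((lineno, str(exc)))
    return good, bad


sample_lines = [
    '{"time": "2024-01-01 12:00:00", "model": "A", "id": 1}\n',
    '\x00\x00\x00\x00{"time": "2024-01-01 12:00:10", "model": "A", "id": 1}\n',
    '{"time": "2024-01-01 12:0',  # truncated record, still reported as bad
]
good, bad = parse_log_lines(sample_lines)
print(len(good), [n for n, _ in bad])  # 2 [3]
```

Reporting the line number with each failure makes it easier to find the damaged region in a large file.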

I'm getting ready to work on the eTime calculation, taking your suggestion. But I need to experiment a bit to see if there are ways of handling it flexibly in Python. Not sure there are, but I want to explore.

rct commented 1 year ago

> I'm getting ready to work on the eTime calculation, taking your suggestion. But I need to experiment a bit to see if there are ways of handling it flexibly in Python. Not sure there are, but I want to explore.

I've now had to edit that in 3 places in your code so far. I haven't tried DNT yet.

There are lots of ways of parsing time stamps. Saw this useful article yesterday with benchmarks: https://www.geeksforgeeks.org/convert-python-datetime-to-epoch/

When possible I try to use what's in the standard library to avoid needing to install more modules and deal with more dependencies.

I think you can cover 80+% of the cases for rtl_433 by just handling:

Of course, if you really wanted to get the details correct, there is the problem that rtl_433, like many things, doesn't use a format that includes the timezone. Some people run rtl_433 in a container that doesn't have the timezone set, so it logs everything in UTC. And I think there is a -M time:utc option as well.
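A standard-library-only sketch of converting timestamps to epoch seconds. The formats it accepts are assumptions about what commonly appears in rtl_433 output (numeric epoch, `YYYY-MM-DD HH:MM:SS`, an ISO `T` separator, optional fractional seconds, optional UTC offset); they are not the list from the comment above, and `to_epoch` and `assume_utc` are hypothetical names.

```python
from datetime import datetime, timezone

# Assumed layouts, tried in order after an ISO "T" is normalised to a space.
_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S%z",
    "%Y-%m-%d %H:%M:%S.%f%z",
)


def to_epoch(value, assume_utc=False):
    """Convert a timestamp to epoch seconds (float).

    Naive timestamps are treated as local time unless assume_utc is True,
    e.g. when the logger ran in a container with no timezone configured.
    """
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    try:
        return float(text)  # epoch seconds stored as a string
    except ValueError:
        pass
    text = text.replace("T", " ", 1)
    for fmt in _FORMATS:
        try:
            dt = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if dt.tzinfo is None and assume_utc:
            dt = dt.replace(tzinfo=timezone.utc)
        return dt.timestamp()
    raise ValueError(f"unrecognised timestamp: {value!r}")


print(to_epoch("1970-01-01T01:00:10+0100"))  # 10.0
```

Because naive timestamps carry no offset, the caller has to say how to interpret them; that choice is surfaced as a parameter rather than guessed.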

rct commented 1 year ago

> And it also shows spurious packets: a single packet 4 seconds after a prior transmission, with a subsequent transmission at the expected time.

I've seen and documented some misbehaving devices. Most of these are tiny microcontrollers and don't necessarily have the most robust firmware.

I've also seen misbehavior due to under-voltage. Sometimes that's easy to see when the power gets low enough that the microcontroller resets and starts transmitting with a new ID in the case where the ID is volatile.

However, with all of that said, I think there are a number of things to discover in rtl_433, librtlsdr, etc. I've experienced unreliability which I'm not always convinced is RF related.

hdtodd commented 1 year ago

I thought the challenge in writing software to monitor remote device stability, based only on the data packets you receive remotely, was that there might be interference with some of the (repeated) radio packets. It turns out to be a lot more than that: devices that transmit only one packet, radio interference, low signal strength (distance or power), failed batteries, and spurious packets outside the time window in which you'd expect to see them.

I'll need to study more of these cases to understand how best to alert about a failing device. I think I can get the low-battery alert into the monitor without other complications, so that's the first thing I'll look for. Need to review some old log files to see if that would work.

hdtodd commented 1 year ago

I've added the total packet and total transmit counts to the information header of rtl_watch in v2.0.0.

I've also added per-device packet and transmit counts.

I hope to merge v2.0.0 into the main branch shortly, but it's available for download if anyone's interested.

I've confirmed that some packets are missed in the stats for individual devices, though the total packet count seems to have counted them. I suspect the problem arises because a single thread is both catching the MQTT broadcasts and processing the data. The problem may be exacerbated on slower CPUs. I've prototyped a threaded approach, but tkinter isn't thread-safe, so I'm reluctant to incorporate threads into the production code. The missed packets don't seem to be distorting the stats over the long term (primarily a startup issue per device, I think). But I'll work on exploring the cause and possible solutions (asyncio being one) after I've finished the remaining work on v2.0.0.
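One common pattern for this situation, sketched here as a possibility rather than as what rtl_watch does or will do: the receiving thread only enqueues messages, and the main thread drains the queue and updates the stats, so no GUI or stats code runs off the main thread. The names `on_message`, `drain`, and `fake_receiver` are hypothetical, and the receiver is simulated instead of using a real MQTT client.

```python
import queue
import threading
from collections import Counter

inbox = queue.Queue()   # thread-safe hand-off between receiver and main thread
per_device = Counter()  # only ever touched by the main thread


def on_message(device_name):
    """Runs on the receiver thread: enqueue only, no stats or GUI work."""
    inbox.put(device_name)


def drain(max_items=100):
    """Runs on the main thread: process up to max_items queued packets.

    Returns the number of packets handled in this call.
    """
    handled = 0
    while handled < max_items:
        try:
            device_name = inbox.get_nowait()
        except queue.Empty:
            break
        per_device[device_name] += 1
        handled += 1
    return handled


def fake_receiver():
    """Stand-in for a broker callback: 300 packets from 3 devices."""
    for i in range(300):
        on_message(f"dev{i % 3}")


receiver = threading.Thread(target=fake_receiver)
receiver.start()
receiver.join()
while drain():
    pass
total = sum(per_device.values())
print(total, dict(per_device))
```

In a tkinter program, `drain` could be rescheduled periodically with the widget method `after(...)`, which keeps every widget update on the main thread. Capping `max_items` per call keeps the GUI responsive during bursts.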