Add stateful processing to clean up device semantics

gdt commented 12 months ago

It has become clear that some stateful processing is needed. By doctrine, rtl_433 itself emits one message for each received transmission. With straightforward processing, a temperature sensor that sends, say, 3 repeats (done by the device for reliability) produces 3 mqtt messages or 3 influxdb entries. This matters most for events, where each repeat can look like a separate occurrence. What's needed is an approach to coalesce these, which could be:

something that receives syslog/json and emits a reduced stream, to use as a pipe component between rtl_433 and rtl_433_mqtt_relay
some other way to plumb that
built into rtl_433_mqtt_relay, but this is harder to reuse, so perhaps a python module that can be used in various programs
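
As a rough illustration of the first option, here is a minimal sketch of a filter that takes JSON lines and yields a reduced stream. The `dedup` function, the 2-second window, and the list of volatile keys are all assumptions made for this sketch, not anything rtl_433 or rtl_433_mqtt_relay provides.

```python
import json
import time

# Keys assumed to vary between repeats of one transmission (illustrative list).
VOLATILE_KEYS = {"time", "rssi", "snr", "noise", "freq", "mod"}

def dedup(lines, window=2.0, clock=time.monotonic):
    """Yield a decoded message only if no identical one was emitted within `window` seconds."""
    last_emitted = {}  # message fingerprint -> time it was last passed through
    for line in lines:
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON line
        if not isinstance(msg, dict):
            continue
        # Fingerprint the message on its stable fields only.
        stable = {k: v for k, v in msg.items() if k not in VOLATILE_KEYS}
        key = json.dumps(stable, sort_keys=True)
        now = clock()
        prev = last_emitted.get(key)
        if prev is None or now - prev > window:
            last_emitted[key] = now
            yield msg
```

Wired to stdin/stdout (for example `for msg in dedup(sys.stdin): print(json.dumps(msg))`), something like this could sit in a pipe after `rtl_433 -F json`. Note that `last_emitted` grows without bound here; a real component would need to expire old entries.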

Besides simply filtering duplicates, one could also consider transforming messy semantics (Govee) into binary sensors, implementing logic like "a wet report changes state to wet; 17 minutes without a wet report, combined with having heard a message from the sensor, changes state to dry".
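
The wet/dry rule can be sketched as a small state machine. This is one possible reading of the rule, assuming "having heard a message" means a non-wet message received after the last wet report; the `LeakSensor` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

DRY_AFTER = 17 * 60  # seconds without a wet report, per the rule above

@dataclass
class LeakSensor:
    state: Optional[str] = None        # None until something is known
    last_wet: Optional[float] = None   # time of the most recent wet report
    last_heard: Optional[float] = None # time of the most recent message of any kind

    def on_message(self, now: float, is_wet: bool) -> Optional[str]:
        """Feed one decoded report; return the new state if it changed, else None."""
        self.last_heard = now
        if is_wet:
            self.last_wet = now
        return self._update(now)

    def on_tick(self, now: float) -> Optional[str]:
        """Call periodically so time-based transitions can be evaluated."""
        return self._update(now)

    def _update(self, now: float) -> Optional[str]:
        if self.last_wet is not None and now - self.last_wet < DRY_AFTER:
            new = "wet"
        elif self.last_heard is not None and (
            self.last_wet is None or self.last_heard > self.last_wet
        ):
            new = "dry"  # no recent wet report, and the sensor is known to be alive
        else:
            new = self.state  # silence alone never flips the state
        if new != self.state:
            self.state = new
            return new
        return None
```

With this reading, a sensor that goes silent after reporting wet stays "wet" indefinitely, which seems like the safer failure mode for a leak detector.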

gdt commented 12 months ago

See #908 for filtering in rtl_433, but this seems counter to doctrine.

merbanan commented 12 months ago

I think stateful processing and duplicate suppression are different things. The dup-check should filter things that belong to signal redundancy. The stateful logic should handle cases when several signals belong to the same message.

merbanan commented 12 months ago

https://github.com/merbanan/rtl_433/pull/2314 contains a suggestion from me on an implementation for stateful support in the decoders.

rct commented 12 months ago

Here's another use case (or two) that I think is worth considering:

With a reasonably large number of sensors, particularly those that have a volatile component in their ID, it is useful to have easy access to some meta data about devices, examples include:

When a device was first heard
When a device was last heard
When battery state last changed

These would be more useful if they were retained across rtl_433 invocations.

Why:

an existing known device might have intentionally or unintentionally been power cycled and now have a new ID that makes it essentially a new device.
If multiple devices of the same type are in use, it can be difficult to figure out which device is which.
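
To make the metadata idea concrete, here is a minimal sketch of a consumer-side registry that records first heard, last heard, and last battery change, and keeps them in a JSON file so they survive restarts. The `DeviceRegistry` class, the file layout, and the use of `model`, `id`, and `battery_ok` as field names are assumptions for illustration.

```python
import json
import os

class DeviceRegistry:
    """Per-device metadata, persisted to a JSON file between runs."""

    def __init__(self, path):
        self.path = path
        self.devices = {}
        if os.path.exists(path):
            with open(path) as f:
                self.devices = json.load(f)

    def observe(self, msg, now):
        """Update the record for the device that sent `msg`; `now` is a timestamp."""
        key = f'{msg.get("model")}/{msg.get("id")}'
        rec = self.devices.setdefault(
            key, {"first_heard": now, "battery_ok": None, "battery_changed": None}
        )
        rec["last_heard"] = now
        # The first battery report also counts as a change in this sketch.
        if "battery_ok" in msg and msg["battery_ok"] != rec["battery_ok"]:
            rec["battery_ok"] = msg["battery_ok"]
            rec["battery_changed"] = now
        return rec

    def save(self):
        """Write atomically so a crash mid-write does not corrupt the file."""
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.devices, f, indent=2)
        os.replace(tmp, self.path)
```

A device that comes back with a new ID after a power cycle would show up as a new key with a fresh `first_heard`, next to an old key whose `last_heard` has stopped advancing, which is the kind of hint that helps tell devices apart.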

Note: there are some statistics tracked by rtl_433, but in most cases they aren't that useful for managing an environment with a bunch of devices. The stats that are available are geared more towards decoder development. Statistics/meta data that would be useful for managing the environment often require tracking state.

Architecturally, I'd like to see rtl_433 do the best job it can managing the radio and presenting the bits in ways other applications can easily consume. The flex decoders are a great example that seems to make this case. There are some devices where the output of the flex decoder is sufficient. For others, something needs to parse the output and turn it into whatever the use case calls for.

So in line with what @gdt was mentioning at the start of this thread, one way to address these needs is to have some sort of framework that handles all of the boilerplate scaffolding for receiving messages, using timers, and sending the generated data/messages to the desired destinations. It should be easy to implement a piece of code to get the desired application logic without having to re-implement all of the scaffolding.

A number of the example apps (mqtt relay, hass autodiscovery, ...) could benefit from being implemented in a common framework with common scaffolding, that has the full event loop type semantics.

I think this approach could handle many cases. I've skimmed #2314 but I don't understand whether what's needed there could be implemented in a consumer or really needs to be in rtl_433 for some reason.

gdt commented 12 months ago

I agree that duplicate suppression could be separated from this. I didn't write it this way because I have perceived that there is doctrine against putting this in base rtl_433. My own view is that duplicates are almost entirely unhelpful (unless one is assessing signal delivery, in which case a "don't do dup suppression" flag is in order). I'm happy to add an issue for dups in rtl_433 proper, post decode and pre emission of all types, and let this be about more complicated processing.

gdt commented 12 months ago

For @rct's example, I see "keep track of history" as a database/consumer function, parallel to HA, rather than a core rtl_433 feature. I'm really thinking about transforming "message received" into sensible semantics for some other system to process, and I really started down this mental path (more than just dups) because of Govee.

gdt commented 12 months ago

merbanan wrote: "#2314 contains a suggestion from me on an implementation for stateful support in the decoders."

That seems to be about what I might call MAC-level fragmentation, where two data bursts are one logical packet and it's not possible to even do CRC checking on them separately. It sounds like "just emit json with the bits from each and let some other program deal" isn't going to work. I tend to think we should limit stateful processing to those sorts of cases (as irregular and otherwise intractable), and use a higher-level approach for things that are more normal, like "if it isn't reporting wet, and we heard from it, it is now dry".

merbanan commented 12 months ago

What I aim for is to design robust solutions. In the case where a device uses random ids there is no way to make it unique in a stable way by only using the id. In theory you could separate the devices by figuring out the cycle time and RSSI. But I don't think we can assume it will be linear over time. So the only way is to look at some other parameter, and that is the channel, if it is possible to change it. That way you often have the ability to use 3 sensors of that model.

So in this case the sensor has its limits and I don't think we should bend over backwards to give the illusion that it is any better.

What @gdt writes is what I feel could be in the scope of rtl_433: MAC-level fragmentation and duplicate filtering. More advanced things that would require keeping state longer and using timers do not really fit with the current error handling logic (just exiting and restarting).

merbanan commented 11 months ago

But emitting implied state from a sensor would be OK if it makes the sensor easier to use. I do have devices that are hard to use. One is a contact sensor that only triggers when the contact is opened, not when it is closed. Using that sensor in any kind of system needs something to keep state.

gdt commented 11 months ago

I completely understand about trying to guess ids being bad. I have split off the two specific things we agree can go in core rtl_433 into #2640 and #2641. This issue is now about adding some sort of middleware between json-format decodes and sending to HA etc. via mqtt or some similar mechanisms. A candidate architecture is a python library that accepts json objects and some kind of timer tick, and produces json objects, that one can interpose between rtl_433 and sending to data consumers.
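
To give the candidate architecture a concrete shape, here is a minimal sketch of what such a library's interface might look like: stages that accept a message or a timer tick and return zero or more messages, chained into a pipeline. `Stage`, `Pipeline`, and `StaleDetector` are hypothetical names invented for this sketch, as is the `"event": "stale"` output.

```python
class Stage:
    """Base stage: passes messages through and does nothing on a tick."""

    def on_message(self, msg, now):
        return [msg]

    def on_tick(self, now):
        return []

class Pipeline:
    """Chains stages; output of each stage feeds the next one."""

    def __init__(self, stages):
        self.stages = list(stages)

    def _run(self, msgs, now, start):
        for stage in self.stages[start:]:
            msgs = [out for m in msgs for out in stage.on_message(m, now)]
        return msgs

    def on_message(self, msg, now):
        return self._run([msg], now, 0)

    def on_tick(self, now):
        out = []
        for i, stage in enumerate(self.stages):
            # Messages generated by a tick flow through the remaining stages.
            out += self._run(stage.on_tick(now), now, i + 1)
        return out

class StaleDetector(Stage):
    """Example stage: reports devices that have been silent longer than max_age."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.last_heard = {}

    def on_message(self, msg, now):
        self.last_heard[msg.get("id")] = now
        return [msg]

    def on_tick(self, now):
        stale = [i for i, t in self.last_heard.items() if now - t > self.max_age]
        for i in stale:
            del self.last_heard[i]  # report each silence once
        return [{"id": i, "event": "stale"} for i in stale]
```

Passing `now` in explicitly, rather than reading the clock inside each stage, keeps the stages free of I/O so the caller can drive them from whatever event loop or test harness it likes.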

gdt commented 11 months ago

Another use case, discussed in #2540, is to map ids (from badly-behaved devices that change their id on powerup) into fixed tokens, so that later processing (db, HA, whatever) can operate on logical devices. Per doctrine, this doesn't belong in rtl_433 core.
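
A minimal sketch of such a mapping, assuming the user (or some other tool) decides which current id belongs to which logical device. The `IdMapper` class and the added `"device"` field are hypothetical.

```python
class IdMapper:
    """Maps (model, id) pairs to fixed logical tokens chosen by the user."""

    def __init__(self):
        self.by_id = {}

    def assign(self, model, dev_id, token):
        """Bind a token to a device's current id, dropping any older id for that token."""
        self.by_id = {k: v for k, v in self.by_id.items() if v != token}
        self.by_id[(model, dev_id)] = token

    def apply(self, msg):
        """Return a copy of msg with a "device" field (None if the id is unknown)."""
        token = self.by_id.get((msg.get("model"), msg.get("id")))
        return {**msg, "device": token}
```

After a power cycle, calling `assign` again with the new id rebinds the token, so downstream consumers keep seeing the same logical device. Messages with `"device": None` are the ones that need a human decision.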

sheilbronn commented 11 months ago

In recent years I built an extensive (somewhat ugly) bash wrapper daemon around rtl_433 to cover some of the use cases mentioned here. You might want to have a look at its feature list for inspiration: rtl2mqtt. What might be interesting here:

HA MQTT discovery (not immediately after the first message to avoid errors)
dewpoint approximation
configurable rounding of values (less flicker)
One JSON MQTT message per reception - suppressing immediate duplicates (if no value changed)
if given: prefer channel over ID for the identifier
... other stuff
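
For readers unfamiliar with two of these items, here is a rough Python sketch of a dew point approximation and of value rounding. It uses the Magnus formula with commonly cited coefficients; this is an assumption for illustration and not necessarily what rtl2mqtt does. The function names are hypothetical.

```python
import math

# Magnus coefficients (commonly cited values for water vapour over liquid water).
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius

def dewpoint_c(temp_c, rel_humidity):
    """Approximate dew point in Celsius from temperature (C) and relative humidity (%)."""
    gamma = math.log(rel_humidity / 100.0) + MAGNUS_A * temp_c / (MAGNUS_B + temp_c)
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)

def round_to(value, step):
    """Round to the nearest multiple of `step` so small jitter does not cause flicker."""
    return round(value / step) * step
```

For example, `round_to(21.37, 0.5)` gives 21.5, so readings that wobble by a few hundredths of a degree map to the same published value.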

gdt commented 11 months ago

Another use case, discussed in #2105, is to filter bad data by some recognition of values that are not physically possible (temp changes at super fast timescales).
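
One simple form of this is a rate-of-change check per device. The sketch below assumes a `temperature_C` field and a limit of 2 degrees per minute; both the `RateLimitFilter` class and the threshold are illustrative assumptions, not taken from #2105.

```python
class RateLimitFilter:
    """Rejects readings that change faster than a physically plausible rate."""

    def __init__(self, field="temperature_C", max_per_min=2.0):
        self.field = field
        self.max_per_min = max_per_min
        self.last = {}  # device id -> (time, value) of the last accepted reading

    def accept(self, msg, now):
        """Return True if msg should be passed on; `now` is in seconds."""
        if self.field not in msg:
            return True  # nothing to check
        dev, value = msg.get("id"), msg[self.field]
        if dev in self.last:
            t_prev, v_prev = self.last[dev]
            allowed = self.max_per_min * (now - t_prev) / 60.0
            if abs(value - v_prev) > allowed:
                return False  # keep the previous accepted reading as reference
        self.last[dev] = (now, value)
        return True
```

Because the allowed change grows with elapsed time, a bad first reading does not block the device forever: later good readings are accepted once enough time has passed.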

merbanan / rtl_433

Add stateful processing to clean up device semantics #2635
