Closed timpur closed 7 years ago
Good point. What do you guys @kwave @ThomDietrich @euphi think about this?
I'd prefer an explicit text instead of code numbers. $online
would not be a binary state anymore, so maybe we should change the attribute to $state
or $status
(don't know what's the most correct in english). The values could therefore be online
, offline
(clean) or lost
(unexpected disconnection).
$state
or $status
is good (I'm unsure which one is better, both are ok).
For me, offline
is too generic for a clean disconnect, so maybe disconnected
is better.
There may also be a state sleep
for deepsleep. (However, this is a little bit problematic, because the state will stay in sleep
if the device cannot reconnect for other reasons.
Also, you could send a state after MQTT connection:
reconnect
if only MQTT connection and/or WiFi was lostreset
if device was resettedpoweron
if there was a power failurewakeup
after deep sleep (On ESP8266 this seems to be possible from API using the struct resetInfo
but it seems this is somehow buggy?)Just after this, Homie shall send the online
state. This allows to see a reset/reboot reason by subscribing to $state
.
I like these ideas. Maybe have both?
State
is simply on
or off
?
Status
is more complex, Connected
, disconnected
, lost
, sleep
, etc.
??
This way we have best of both worlds, do not need to change the standard, but also adding more information about the devices current status?
In dress to @euphi
reconnect
if only MQTT connection and/or WiFi was lostreset
if device was resettedpoweron
if there was a power failurewakeup
after deep sleep (On ESP8266 this seems to be possible from API using the struct resetInfo but it seems this is somehow buggy?)
Not sure about these. These seem more like events? Maybe we could add events topic? but this is difficult because some of these events occur when the device is disconnected? Reason to split these up is, State and Status should always be set to the current state or status, so that when ever you subscribe to these topics you receive the current status or state of the device no mater when you subscribe to these topics. An Events topic would describe the the events that occur, but not necessarily the event its currently in (maybe make this not retained pub?). If this make sense.
Anyways thats my thoughts. Tell me what you think.
A simple $state
and a more detailed $status
are not possible, because the "unexpected loss" logic is based on the "last will". MQTT allows for only one last will.
The "reconnect" events are useful to see the reason of a lost connection.
The advantage of having a own state for this, is that it you can see also short connection losses (that does not trigger the last will) by subscribing to $online
.
When you think about integration with home automaton systems, it is really useful to have one property to see reboots of a connected device.
Valid point about last will. It will have to be on one topic.
I agree with your ideas. Just not sure if there should be combined? I think state and events should be kept separate. Reason for this is when an event occurs and publish the message, then do you also publish over with the current status. (publish reconnecting
then connected
?). Also we could also use this events topic for custom publishing of events, and thus do we worry about publishing over these with the current status?
That's why i think they should be separate.
Thus Status
and Events
is my conclusion but its not up to me its up to the community.
If the goal is only to differentiate between "clean" disconnects, and losses due to broken connections: I would use marvin's versions: online
, offline
, disconnected
published on the existing $online
topic.
The additional state info suggested by euphi (reconnect
,reset
,poweron
, wakeup
) are very device specific. As the Homie convention is device independent, I think these should not be a part of it. (e.g. What if my device has no Wifi and has no possibility to detect why it was reset?)
I think it is very important that if a device reconnects it is made sure that it sends on the $online
topic a value other than online
before sending online
again.
Just to say something: I'm undecided 😄
The task of the current $online
topic with it's special property to be the LWT topic is to inform an interacting device/application if the device with it's furthermore published topics is available and in an operating and "interactable" state.
If everyone agrees with this definition, the $online
topic should transport a clear "yes, interactable" or "no, not interactable" message.
@timpur @euphi could you please specify in which constellation a communication partner would need a more specific info than "Device not available for interaction"?
I'm not trying to silence your ambitions. The idea of an extended LWT-linked topic or the addition of an additional one might be meaningful but we need to find a use case for that!
I have a rule in openhab that sends me a telegram message if a device is offline. This rule is only active for the deployed devices, so I also want to know if they restart, because this is a sure sign for a problem.
Currently the rule is based on detecting a negative step in uptime, which is somehow unreliable for restart of openhab or in case of overflow and does not work for a short loss of Wifi connection.
Especially a short loss of Wifi connection was not detected at all. This was quite problematic, because it happened to my irrigation controller and it seems that filling the water tanks to 100% and leaving a ladder (metal) close to the controller just before leaving for holidays worsened the Wifi connection, so sometimes it was lost for more than one day... I now learned that signal quality of 8% seems to be the minimum for connection, so I set up an alert rule in grafana (influxdb) that notifies me if a device as connection quality of < 12%.
(BTW: I have 5 devices running permanently, two of them for more than 1 year, the others for several months, with different versions of Homie-2.0.0 and there was no unexpected reset yet. One is running for more than a year, the other had some resets due to improvement of installation (wiring etc.), in one case the circuit breaker of the 230V supply tripped. In this case the alert rule was also helpful to detect this problem that also affected the fridge, dish washer and oven...)
That's a very valid use case. If we rename $online
to $state
, we keep a offline
LWT message, and we send boot
when the device is "booting" (and that's actually a state, not even an event), ready
when the device is ready, and disconnected
when it is cleanly disconnected. It would solve this, right?
Got my approval. Just not sure about the state names.
Boot (first thing it does when connected is send boot). Finishes booting and sends ready. Then disconnected when clean and lost when not clean disconnect?
Also don't think you have to get rid of $online could have both? Up to you.
Most important now is to come to a decision to use only one topic with fine granulated state or simple states and an extra topic for detailed information.
States should be generic, but we can use Homie-ESP8266 as example what is technically possible.
Thus said, states would be:
Personally I prefer to have all this states above on one $state
topic. "connect" is not necessary then.
"disconnected" may still be useful if there is another reason for disconnect than "update", "sleep" or "poweroff".
BTW: It is intended that the fully connected states start with "on" and the fully disconnected states start with "off". This helps to have simple pattern matching rules.
all i wanted was to know when the device was 'lost' :P, but having more clarity certainly helps and i think its a good idea. That said im not sure what is the right choice in all the options.
I think both @euphi and @marvinroger have the right ideas, so trust you guys to work this out (the fine details) :P.
I wouldn't over-engineer this and take 2 seperate topics. One mandatory LWT-topic with only 3 possibilities and one implementation-specific with as many possibilities as needed. So it is very clear if the device is available or not. If you want to know more: check out the (for example) $implementation/event
topic.
As you can only have one testament, you cannot distinguish between powerloss and crash for example. So you'll lose information anyhow.
PS1 How would you send "booting"? When the device is booting I would think there's no MQTT-connection yet.
PS2 Keep in mind that MQTT allows for delivering messages ordered but it is not by default guaranteed. Although Homie will probably work with the different states that change quickly, it is in theory undefined.
@euphi I'm with @bertmelis and @marvinroger on this one. I'm not sure if we need all these disconnect substates. Also the "reconnect" state is not a state but an event. But without the intention to disregard all your states, could you name those you are interested in specifically and describe why the convention needs to unify them across clients and controllers?
@marvinroger @timpur trick question: If we define the LWT message "offline" and additionally publish a "disconnect" message in certain cases, is it okay to still receive the LWT message? I'd need to check but looking at the implementation that might become tricky. At least the python paho-mqtt package as I know it doesn't offer the option to reset or delete the LWT message while connected.
Edit: It seems to be possible, still the method could get tricky to implement and we should consider that while specifying $state
. @marvinroger I trust you have more low-level experience with that.
@bertmelis I meant "booting the Homie announce" more than an hardware boot.
@ThomDietrich discarding the LWT is indeed only possible by cleanly disconnecting, and for MQTT, this means sending a DISCONNECT
packet, waiting until TCP ACK, and then closing the connection. This is not tricky; all spec-compliant MQTT clients should support this. Even the low-power ESP8266 supports this with the async-mqtt-client client.
Minimum states that are necessary (Textual description):
The last one ist necessary to make auto-discovery reliable (e.g for openhab). Without this state, the controller cannot ensure that it has received all attributes/properties yet.
This could be especially problematic with bad network connection.
To make sure all your requirements are met, you also have to be sure the broker sends all the messages in order. With QOS1, this is only the case when the maximum number of in-flight messages is only 1, which is not by default on mosquitto.
Yes. However, the homie-implementation can wait for the messages to be acknowledged, before sending the next message. This can be achieved with a simple state machine.
The attributes and property announcement can be send at any order, so there would be only 3 phases necessary: "fresh connect" [send $state
-> "connect"], "announcement" [send all device attributes and property announcement in arbitrary order], and "connected" [send $state
-> "online"].
So, do we agree with that:
disconnected
offline
init
(boot
might indeed be confusing)ready
@euphi I think you forget one thing: what if the controller connects after the device announces itself? The ready
$state might indeed be received by the controller before the other messages. So there are two solutions:
Either require every required attributes before considering a device is discovered. Therefore, the controller can determine whether all messages are received. But this is problematic for optional attributes, we can't know if its set or not. A solution to this would be to require a $stats
attribute set to what stats are actually defined ($stats
-> cputemp,cpuload
)
As @ThomDietrich said, you can consider some safe defaults and as you receive the new attributes, you can improve your knowledge about the device
The value of the device statistics are not part of the initial announcement yet. So we need a mandatory $stats
topic anyhow to auto-discover the $stats attributes. :-)
Then, only $implementation
and the property attribute $unit
is optional.
For $implementation
its up to the implementation to make it auto-discoverable.
$unit
then should be made mandatory with an empty string as allowed value.
I was thinking exactly the same thing about $unit
. 😉
Let's do that!
I don't know if an empty string is going to work as everything is sent as retained. An empty payload means the topic will be deleted by the broker. Maybe you should use none
.
Or am I missing something?
Edit: my comment is not valid, since we're sending with QOS1 hence an empty message will be retained.
@bertmelis no, you're right... I thought empty strings were 1 byte long (null terminator), but I was wrong, the null terminator is not included in the payload. Therefore, the message would not be retained. Let's go for none
.
none, NULL, null, void, or NIL? :-)
Done: 0003daace455b835bc64023615c3d10a5fa257d9
We only need to agree with https://github.com/marvinroger/homie/issues/24#issuecomment-321784570 and we should be good 😇
I could live with these 4 states only and their propose names (ready is better than online, init better than boot). Maybe offline
could be renamed to lost
?
However, I still think an extra state for deep sleep would be useful.
The difference between disconnected
and sleeping
is, that disconnected
is a planned shutdown (e.g. closing the Homie-Program), while sleeping
is a state where the device is supposed to reconnect frequently.
Regarding an alert
state: I'm used from my "domain" at work that device have some self-diagnosting abilities. The simplest variant to do this, is to tell the controller in a standardized way, that there is some major problem. Having this in the convention is much easier than to have an implementation specific solution.
Then:
init
when connectedready
when fully announceddisconnected
when cleanly disconnectedsleeping
when cleanly disconnected for deep sleeplost
when "badly" disconnectedHow would you implement your alert state?
alert
would have the meaning "homie is running fine and is connected, but the device is not able to perform its main task", e.g. Sensor is providing no data.
So setting the alert-state is up to the "Application" that makes use of Homie.
The actual homie-implementation (e.g. homie-esp8266) should offer an API call to set the alert-state. Everything else, e.g. provide diagnosstic messages is implementation specific and out of scope of the homie-convention.
I'm aware that this is an extra feature, but I think a standard for this is quite useful for integration. It helps monitoring your devices.
For example you can easily shown them as "yellow" an a Homie "Dashboard": green OK, red lost, blue sleeping, white disconnected, orange: alert.
Alright:
init
when connectedready
when fully announceddisconnected
when cleanly disconnectedsleeping
when cleanly disconnected for deep sleeplost
when "badly" disconnectedalert
when something's wrongIs everyone ok?
I like :)
Hey, I'm back from the weekend and like what I'm reading 👍
I also agree that the "alert" and "sleeping" states are meaningful and can be useful for certain cases.
Regarding the discovery process: I'd still strongly vote to follow the idea of
you can consider some safe defaults and as you receive the new attributes, you can improve your knowledge about the device
Despite a few fundamental topics, all other topics should be optional and hence provided with a default in the convention. A discovering controller (like the openHAB binding) has to make sure to catch all retained messages before further processing the devices/nodes/properties by applying a simple delay/timeout mechanism. That should be easy and other services work similarly. @kwave what do you think?
Back to topic: "I like :)"
@marvinroger Plans moving forward ? Have we concluded on the new structure? Implemented in beta 2 ?
A reference implementation is part of the implementation, so I created an issue at https://github.com/marvinroger/homie-esp8266/issues/390.
Indeed, let's move forward.
Can we guys read the whole thing at https://github.com/marvinroger/homie/tree/redesign and write your final comments/thoughts here?
New standard looks amazing and solves so many of my issues that i solves with just having my own standard like also adding a "units" attribute.
(TODO Clarify device behaviour in extra section for $online (or $state) topic.)
Not sure how to comment on this though ?
Wondering if its better to use the last will to post information about unexpected connection lost
Currently disconnecting correctly and unexpected are the same = false at $online
Wondering if its better to use a number, 1 = online, 0 = offline and -1 = unexpected connection lost ??
Tell me what you think. Tim.