Device states sleeping and alert unnecessary?

bggardner commented 4 years ago

I question the necessity of the sleeping and alert states, and propose removing them.

From the rationale for sleeping from #24 , I understand it may be appropriate for a device to only connect to a broker as needed to save power, resources, etc. However, subscribers to $state will not be able to determine if/when a device should be considered lost (in case something goes bad while sleeping). It seems this would need a timeout to implement properly. Therefore, either a sleeping $timeout attribute is needed, or remove sleeping and require devices to maintain connection to the broker at all times. Am I misunderstanding the relationship between a homie device and a MQTT broker?

My issue with alert is that devices can have many properties, and possibly only one property may be unavailable. Subscribers to a working property do not care about non-working properties, so setting an alert at the device level seems improper. Therefore, use $state on properties as well, or remove the alert state. Also, simply having an alert state isn't very useful by itself. The addition of an $alerts attribute would allow the device to communicate what the alert is about (or simply publish to this when there is an alert instead of changing its state).

I am relatively new to homie, but see its potential. My use cases and implementations may be biasing my thoughts in a direction homie was not intended for, so I would appreciate feedback regarding this issue.

piegamesde commented 4 years ago

My issue with alert is that devices can have many properties, and possibly only one property may be unavailable.

I don't think this assumption holds true in a significant amount of cases. Therefore distinguishing between working and broken properties seems overkill to me. Marking single nodes as broken or not is something worth discussing, but I don't think it's a good idea.

The alert state has the simple semantic of "something is wrong, please contact a human". I'm not sure if controllers should do anything more than reporting the error to the user.

I think providing an error message to the user (via mqtt) is a good idea. But is there anything against simply implementing this through a property instead of an attribute?

bggardner commented 4 years ago

Good points. The error message could be a property, but on which node? It seems if alert is set at the device level, that's where the error message should be. That's why I suggested an $alerts attribute (plural, so that multiple error messages could be written). That's really messy, so maybe it's better to not include error messages as part of the base convention.

The other reason I don't like using alert as a $state is that the other values deal with the state of the connection between the device and the MQTT broker. alert seems to be an internal problem, not related to the connection. In that regard, an alternative could be a boolean $alert attribute.

I'm being picky here, but thought it was worth some discussion, so thanks for replying.

piegamesde commented 4 years ago

I agree that the alert as $state is a bit unfortunate. We could remove it and replace it with an attribute (or remove it without replacement and let devices use a property of their choice if they want). But I'd like to try to rescue it by changing its meaning so that it tells more about the connection to the MQTT broker:

alert: this is the state the device is in when ~~connected to~~ disconnected from the MQTT broker ~~, but something wrong is happening~~ because of an un-recoverable failure. ~~E.g. a sensor is not providing data and~~ needs human intervention. You have to send this message when something is wrong.

With this definition, alert would be less like ready and more like lost, but disconnected due to error (in distinction to network outage by lost).
This a radically more severe kind of error comparing to "it's running, but please look at it". I don't know which is better because I can't tell which version fits more use cases.
If we want to more or less keep the current semantics, a separate attribute is the way to go.
Currently, one could say the current semantic of this is already implied by lost: "when the device has been “badly” disconnected" – this does not specify network errors, even if the name "lost" evokes this to me.

bggardner commented 4 years ago

Very good points (again). To reiterate:

disconnected - Graceful disconnect as an intended operation
alert - Graceful disconnect due to an internal device error (which now needs human intervention to reconnect)
lost - Ungraceful disconnect due to a network error (or uncaught exception/error in the device)

In this case, the word alert doesn't really fit. error may be more appropriate, and changing the name may be of benefit to avoid confusion from its previous usage/definition.

The previous definition of alert ("it's running, but please look at it"), seems to be on the severity level of a "warning" instead of an "error", which leads me to my initial comment about how to detect and communicate this to the user. Therefore, it may be better to leave this function to an extension rather than the base convention.

dougmeredith commented 3 years ago

I agree that the current definition of "alert" is a bit off, as a controller doesn't know if it should treat data from the device as valid or not. Instead of the redefinition of alert above, what about a disconnect reason attribute? This could be ignored by controllers, in general, but could be utilized by monitoring tools.

As far as moving some sort of concept of alerts down to the node, that could be done, although devices already have the option of simply dropping problem nodes from their configuration.

I think I see the most value from dropping alert as a state, and adding an additional device property of "alert" or something like that. I wouldn't attempt to codify this in any way. It would just be a string that when set would indicate a problem with the device. It would be up to the device to decide what information to present. Controllers would ignore it. Monitoring tools would present it to a user by whatever means is appropriate.

homieiot / convention

Device states sleeping and alert unnecessary? #210