Closed a112358132134 closed 2 years ago
This isn't a "race condition" but is a classic remote-observability problem when there are multiple channels of information.
Not a bug.
The primary source of state is the MQTT packets. They are all timestamped, if you think it is valuable to confirm that they have been received in the order sent.
The queries over HTTP are mainly to be able to capture a good-enough approximation of current state to initialize or resynchronize a client.
Edit: Of the proposed mitigation approaches, sending a set of MQTT updates on request was previously considered and rejected. It was not chosen because it complicates all clients, requiring them to keep track of their current sense of state to determine whether a change of state has been reported. Not only is this a burden on client design, but it also introduces yet another point of potential loss of synchronization.
If you can provide a concrete example where this is truly a problem that impacts the ability of a user to accomplish a meaningful use case, please do so. Otherwise I will close this shortly.
> This isn't a "race condition" but is a classic remote-observability problem when there are multiple channels of information.
Apologies for the poor language choice - I've been out of software for a few decades, and struggle to articulate the concepts given how much I've forgotten.
> The primary source of state is the MQTT packets. They are all timestamped, if you think it is valuable to confirm that they have been received in the order sent.
Yes, I do verify the timestamps on all incoming mqtt packets. On one occasion, during rapid and repeated state-change stress testing, I did observe two mqtt packets that were processed out-of-order due to the vagaries of thread scheduling. This was gracefully dealt with by the time check logic; the older data was inserted into the historic record at the appropriate point to flesh out the datasets for graphing, but the app's view of the current state was not changed.
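For readers following along, the time-check logic described above might look roughly like the sketch below. This is not code from pyDE1 or from my app; the `StateTracker` class, its `on_update` method, and the `(event_time, state)` tuple shape are hypothetical names chosen for illustration.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StateTracker:
    """Hypothetical helper: time-ordered history plus a forward-only current state."""

    history: List[Tuple[float, str]] = field(default_factory=list)
    current: Optional[Tuple[float, str]] = None

    def on_update(self, event_time: float, state: str) -> bool:
        """Record an update; return True if it replaced the current entry."""
        # Every update goes into the history at its chronological position,
        # so late arrivals still flesh out the dataset used for graphing.
        bisect.insort(self.history, (event_time, state))
        # Only a strictly newer update may replace the current view.
        if self.current is None or event_time > self.current[0]:
            self.current = (event_time, state)
            return True
        return False


tracker = StateTracker()
tracker.on_update(10.0, "Idle")
tracker.on_update(12.0, "Flush")
# A packet processed late: recorded in history, current view untouched.
changed = tracker.on_update(11.0, "Espresso")
```

After the late packet, `tracker.current` still holds the 12.0 entry, while `tracker.history` contains all three entries in time order.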
> The queries over HTTP are mainly to be able to capture a good-enough approximation of current state to initialize or resynchronize a client. If you can provide a concrete example where this is truly a problem that impacts the ability of a user to accomplish a meaningful use case, please do so. Otherwise I will close this shortly.
The specific case I'm worried about is an unsafe function (water, flush, espresso, steam) being activated physically, but the user being unable to stop it due to a desync. This is not a concern for a DE1 with a Group Head Controller, but I don't have one of those.
As an aside, I have had such misalignments with the official decent app on several occasions over the last couple of years, most recently with a Flush function this morning that I could not stop in the app.
I have seen one state misalignment in my app with pyDE1 through testing (app thought the DE1 was awake, but it was actually asleep and hence couldn't be woken without outside intervention). Unfortunately I was only logging exceptions or odd cases with the incoming mqtt packets, so I can't precisely tell if mqtt vs http packet timing misalignment was the root cause, although it was most likely an error in the state logic in my app. Regardless, it led me to review how I was utilising pyDE1's API which resulted in this concern being raised.
On the possible mitigations I listed:
I'll leave the rest to your judgement.
The DE1 appears to send out a StateUpdate on connection, so pyDE1 should have current information nearly immediately after connection.
Issues with the de1app should be filed with their GitHub repo.
A future commit will enable the checks for sending an Idle request to be overridden through config.
With the current API, I understand there are two methods for obtaining the state of a device:
- StateUpdate for DE1 operational states
- ConnectivityChangeNotification for DE1 and scale connectivity states

In either case, I think there might be a possible race condition (from the consumer's perspective) between the mqtt and http alternatives above. As such I've raised this ticket for discussion.
Example of a problematic situation
Why this race condition is important
Most of the basic app functions depend on being in sync with the DE1 (and scale). If a misalignment were to occur, the app may not even present the functionality the user needs to trigger a subsequent change of state and have everything realign itself, at least not without restarting one or more of the software or hardware components involved.
In the worst case, a dangerous or problematic function (e.g. flush, descale) is in operation, but the consumer app is unable to do anything about it due to the misalignment. Admittedly, this is more of a concern on DE1s without a group head controller.
Mitigation options
- … event time (side note: if you're doing this, it is also an opportunity to include a version)
- … StatusUpdate mqtt message to be published to avoid the misalignment. This should be sent from pyDE1 in order, and even if something else happens after that, the mqtt message has the event time field to allow the app to determine if the message is still relevant.

One or more of these options could be implemented.
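As a rough illustration of how an event time lets a consumer judge relevance, here is a minimal sketch. It assumes both the HTTP response and the MQTT message expose a comparable event time, which the current HTTP API may not do; the `reconcile` function and the `(event_time, state)` tuple shape are hypothetical and not part of pyDE1.

```python
from typing import Optional, Tuple

# Hypothetical shape: (event_time in seconds, state name)
Snapshot = Tuple[float, str]


def reconcile(http_snapshot: Snapshot, latest_mqtt: Optional[Snapshot]) -> Snapshot:
    """Return whichever report describes the more recent event."""
    if latest_mqtt is None:
        # Nothing received over MQTT yet, so the HTTP snapshot is all we have.
        return http_snapshot
    # On a tie, prefer MQTT, since it is treated as the primary channel.
    if latest_mqtt[0] >= http_snapshot[0]:
        return latest_mqtt
    return http_snapshot


# An MQTT update that arrived while the HTTP query was in flight wins
# because its event time is later than the snapshot's.
state = reconcile((100.0, "Idle"), (100.5, "Flush"))
```

With this approach the order in which the two channels are processed stops mattering; only the event times do.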
Thoughts?