toverainc / willow

Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
https://heywillow.io/
Apache License 2.0
2.63k stars 97 forks source link

Screen times out even with a request in flight #349

Closed skorokithakis closed 11 months ago

skorokithakis commented 11 months ago

When the REST endpoint takes too long to respond, Willow times out the screen at the specified interval while the request is still active. The correct behavior would be for the interval to apply after the end of any request (so the screen would time out 2-3 seconds after the request has finished).

Presumably Willow keeps track of its state and knows when it's idle, right? What's tricky about this, as mentioned in #347?

stintel commented 11 months ago

Presumably Willow keeps track of its state and knows when it's idle, right? What's tricky about this, as mentioned in #347?

Willow doesn't really have a state, other than two booleans (recording and restarting), but even if it did, it would still be tricky. If WAS command endpoint mode is enabled, the request is sent to WAS via WebSocket. At this time, the request is done. We cannot guarantee there will ever be a response (MQTT endpoint doesn't support it right now, there might be a connection problem, etc). If we were to keep the display on until a response arrives, we might leave the display on "forever".

What I propose is to reset the display timer when Willow receives a command endpoint response from WAS. The screen could still timeout with a request in flight, depending on the configured display timeout and the time it takes to process the response, but it will reactivate when the response comes in. This would already be an improvement, yet simple to implement.

@kristiankielhofner thoughts?

skorokithakis commented 11 months ago

Ah, I see, it's a fully asynchronous design. I think that's going to lead to issues down the line (as we can already see with this bug), because there's a mismatch between what the user expects (a confirmation to their request) and the behavior of the system (no guaranteed confirmation).

In my opinion, there should be two flows with Willow in general, a synchronous one (request-response) and an asynchronous one (notifications). I don't think there's a case where a user would be satisfied with Willow just ignoring them, the user will always want some sort of confirmation, even if just an "ok". This is also how all assistants work now (they show a LED or otherwise indicate that they're thinking about your request).

I know that this is hard to change now, but my proposal would be a design that is fundamentally a queue of asynchronous requests coming in, and synchronous requests that can override those. If I want to receive a notification asynchronously, I'm probably fine with it sitting in a queue for a few seconds while I'm done talking to my assistant about whatever it is I'm trying to do with them.

Telling the assistant "please add an appointment for tomorrow", having it say "your dinner is ready" and then "I've added your appointment" is surprising. Instead, I'd expect the appointment (the thing I'm actually engaging with the assistant with right now) to be handled first, and the asynchronous notifications to wait until idle time. This would require state, but will be better UX overall.

This is just a recommendation, though.

hamishcunningham commented 11 months ago

My variant says "I heard" + the first part of the speech + "etc., please wait."

Then further results can either be from whatever machine is running wis/was or via a willow notification, or from some other device entirely.

I don't think synchronous dialogue via an ESP32 is a wise direction to travel. It is already a minor miracle what willow is squeezing in there!

On Wed, 20 Dec 2023, 17:49 Stavros Korokithakis, @.***> wrote:

Ah, I see, it's a fully asynchronous design. I think that's going to produce problems down the line (as we can already see with this bug), because there's a mismatch between what the user expects (a confirmation to their request) and the behavior of the system (no guaranteed confirmation).

In my opinion, there should be two flows with Willow in general, a synchronous one (request-response) and an asynchronous one (notifications). I don't think there's a case where a user would be satisfied with Willow just ignoring them, the user will always want some sort of confirmation, even if just an "ok". This is also how all assistants work now (they show a LED or otherwise indicate that they're thinking about your request).

I know that this is hard to change now, but my proposal would be a design that is fundamentally a queue of asynchronous requests coming in, and synchronous requests that can override those. If I want to receive a notification asynchronously, I'm probably fine with it sitting in a queue for a few seconds while I'm done talking to my assistant about whatever it is I'm trying to do with them.

Telling the assistant "please add an appointment for tomorrow", having it say "your dinner is ready" and then "I've added your appointment" is surprising. Instead, I'd expect the appointment (the thing I'm actually engaging with the assistant with right now) to be handled first, and the asynchronous notifications to wait until idle time. This would require state, but will be better UX overall.

This is just a recommendation, though.

— Reply to this email directly, view it on GitHub https://github.com/toverainc/willow/issues/349#issuecomment-1864890790, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBRBFFDXPILS2EJ3F62S73YKMQKRAVCNFSM6AAAAABA4TJASSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRUHA4TANZZGA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

skorokithakis commented 11 months ago

Hm, what do you mean, @hamishcunningham? Synchronous dialog wouldn't require anything more than keeping a connection open, would it? I understand that it's hard to retrofit reliability over an async connection, but that's an implementation issue, rather than a design issue.

hamishcunningham commented 11 months ago

Effectively you have only one thread to play with. Checking for incoming on a connection means not doing something else. There's a reason they're called microcontrollers not microprocessors :)

It is likely not impossible, but the audio job it is already doing is not far off impossible, so I think the priority is to keep that running smoothly and offload as much as possible elsewhere.

On Wed, 20 Dec 2023, 18:38 Stavros Korokithakis, @.***> wrote:

Hm, what do you mean, @hamishcunningham https://github.com/hamishcunningham? Synchronous dialog wouldn't require anything more than keeping a connection open, would it? I understand that it's hard to retrofit reliability over an async connection, but that's an implementation issue, rather than a design issue.

— Reply to this email directly, view it on GitHub https://github.com/toverainc/willow/issues/349#issuecomment-1864959948, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBRBFGXKOCYH6H4IPWC7OTYKMWBJAVCNFSM6AAAAABA4TJASSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRUHE2TSOJUHA . You are receiving this because you were mentioned.Message ID: @.***>

skorokithakis commented 11 months ago

Ah, hm, true, I forgot about that. In that case, your alternative with the reactivation of the screen sounds good to me, thanks!

kristiankielhofner commented 11 months ago

Generally speaking we heavily emphasize "voice interface". Support for the display is implemented because it's there and it's useful for testing, demos, etc.

That said, I like to say "If the screen matters you're doing it wrong". Display support for a voice interface is kind of a crutch; a good voice interface should provide quality UX via audio alone and that's what we emphasize.

skorokithakis commented 11 months ago

True, but it's not about the screen, it's about feedback that it's still processing your request vs that it's somehow failed without returning anything.

stintel commented 11 months ago

Synchronous dialog wouldn't require anything more than keeping a connection open, would it?

That's the thing, Willow sends a WebSocket message to WAS. To my knowledge the WebSocket protocol doesn't have synchronous messages, so it's not that simple. It would be possible to implement some kind of tracking, but this is definitely not trivial, and currently not a priority for us.

As for MQTT not responding at all, we added support for MQTT because some users expressed interest in sending commands via MQTT. We're not handling responses there because users of the MQTT command endpoint could have any kind of custom thing handling message from Willow, with any kind of custom response, so we can't implement a generic response handler. Responding with OK just when sending the message on MQTT succeeded would be weird, if MQTT endpoint wasn't able to understand the request.

For responses arriving out of order, I would personally prefer an alarm set for 13:00 to arrive as close to 13:00 as possible, and it not being delayed by a potential other request still in flight.

I'll look into adding some code to reactivate the screen / reset the screen timer when a response from the WAS command endpoint arrives.

kristiankielhofner commented 11 months ago

Good point, but we're also a little thrown off with response times that exceed even our default screen timeout (we measure performance in milliseconds). Whatever works for you, I guess, but going back to the Alexa issue that UX is unacceptable in terms of the project overall.

That said, a potential approach here could be some kind of audio feedback while waiting > X seconds for response but that's another development effort that will be low priority because as I said that's a generally terrible user experience.