Open proppy opened 5 years ago
Copying from https://github.com/googlesamples/assistant-sdk-python/issues/356
I would also consider supporting custom event callbacks/hooks.
Ideally, I see two main usage patterns for the assistant: a synchronous one, where a generator approach works the best:
```python
with Assistant(**opts) as assistant:
    for event in assistant:
        print(event)
```
and an asynchronous one, where the handlers might be executed in other threads; in that kind of approach the callback pattern can really help keep the code clean:
```python
with Assistant(on_conversation_start=on_conversation_start,
               on_speech_recognized=on_speech_recognized,
               ..., **opts) as assistant:
    # Do other stuff
```
> where the handlers might be executed in other threads
Wouldn't the handlers need to dispatch the call to the thread anyway, using something like https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.submit?
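For illustration, a minimal sketch of what such dispatching could look like (the handler registry and event names here are hypothetical, not part of the SDK):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: submit each registered callback to a worker
# thread so that slow handlers don't block the assistant's event loop.
executor = ThreadPoolExecutor(max_workers=4)

def dispatch(handlers, event_type, event):
    """Submit every handler registered for event_type to the executor."""
    futures = []
    for handler in handlers.get(event_type, []):
        futures.append(executor.submit(handler, event))
    return futures

# Usage with a toy handler registry:
results = []
handlers = {'CONVERSATION_START': [lambda e: results.append(e)]}
for f in dispatch(handlers, 'CONVERSATION_START', {'id': 1}):
    f.result()  # wait for the handler to finish
```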
@BlackLight curious about your thoughts on callbacks vs inheritance + override?
Another possibility could be to have a @decorator model, similar to what the https://github.com/pallets/flask or https://github.com/pallets/click projects do, something like:
```python
from google.assistant.sdk import assistant

@assistant.handler('CONVERSATION_START')
def handle_conversation():
    pass
```
That could also enable us to unify the event handler with device actions and custom device actions:
```python
from google.assistant.sdk import assistant

@assistant.handler('action.devices.commands.OnOff')
def handle_onoff():
    pass

@assistant.handler('com.example.commands.BlinkLight')
def handle_blink():
    pass
```
What do you think?
In my opinion the callback mechanism allows handling interactions in a more flexible way than a synchronous yield generator. What I have set up so far in my project is something like:
```python
import threading

class Assistant(threading.Thread):
    def __init__(self, event_handler=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.event_handler = event_handler

    def run(self):
        while True:
            event = self._poll_event()
            if self.event_handler:
                self.event_handler(self, event)

class MyClient:
    def __init__(self, *args, **kwargs):
        self.assistant = Assistant(event_handler=self.event_handler())
        self.assistant.start()

    def event_handler(self):
        def wrapped_handler(assistant, event):
            print(event)
        return wrapped_handler
```
With this pattern I can execute the event handler within the assistant thread, but the handler itself is defined within MyClient, so it can access all the attributes of my object as well. It's not strictly like dispatching the call to another thread (that would indeed require some degree of black magic with concurrent.futures), but it's flexible enough for me to decide where and how a handler should be executed. One could even have the handler run the custom logic in a new thread, so that in case of failure it won't abort the assistant thread.
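As a sketch of that last point, a handler can be wrapped so each invocation runs on its own thread (the `threaded` decorator name is made up for illustration):

```python
import threading

def threaded(handler):
    """Wrap a handler so each invocation runs in its own daemon thread;
    an exception in the handler then cannot abort the assistant thread."""
    def wrapper(assistant, event):
        t = threading.Thread(target=handler, args=(assistant, event),
                             daemon=True)
        t.start()
        return t
    return wrapper

seen = []

@threaded
def on_event(assistant, event):
    seen.append(event)

# Each call spawns a thread; join only to make the example deterministic.
on_event(None, 'ON_CONVERSATION_TURN_STARTED').join()
```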
Thinking of it, however, I agree that in Python it's possible to implement better patterns than a callback approach borrowed from JS.
I like the inheritance+override approach, I've also initially proposed it in https://github.com/googlesamples/assistant-sdk-python/issues/356. It feels a bit Java-ish to me though, as it would require the developer to define a new class and override all the needed handlers. Not that it's a bad thing, but it would make a quick-start a bit more verbose.
Thinking of it, I like the decorator approach the most, and it would indeed make things consistent with what's done with the device_handler as well. It's probably also the most "pythonic" way to implement things, even though I'd still offer developers the option to opt either for the generator approach (if they have a script that needs to access events synchronously anyway) or the decorator approach (if they have more complex business logic and want to decouple it from the assistant). I'd also consider the option of an empty decorator, something like @assistant.handler(), to offer the possibility to capture all the events in one function without needing to set up a handler for each type of event.
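A rough sketch of how such a decorator registry could work, including the empty catch-all form (all names here are hypothetical, not actual SDK API):

```python
# Hypothetical handler registry: handler('EVENT') registers for one
# event type; the empty form handler() registers a catch-all (key None).
_handlers = {}

def handler(event_type=None):
    def register(func):
        _handlers.setdefault(event_type, []).append(func)
        return func
    return register

def fire(event_type, event):
    # Specific handlers first, then catch-all handlers.
    for func in _handlers.get(event_type, []) + _handlers.get(None, []):
        func(event)

log = []

@handler('CONVERSATION_START')
def on_start(event):
    log.append(('start', event))

@handler()  # empty decorator: receives every event
def on_any(event):
    log.append(('any', event))

fire('CONVERSATION_START', 1)
fire('SPEECH_RECOGNIZED', 2)
```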
It would also be nice to start auditing projects that use the google-assistant-grpc binding directly, to see how they could benefit from a higher level interface.
A few I'm aware of:
Filed https://github.com/googlesamples/assistant-sdk-python/issues/381 as it seems useful to audit further reverse dependencies of the project.
Something else to consider would be a drastically higher level interface, with top-level module functions that allow for a REPL-like experience:
```python
>>> from google import assistant
>>> assistant.assist("What's the weather in Tokyo?")
'Clear'
>>> with open('input.wav', 'rb') as f:
...     assistant.assist(f)
<_io.BytesIO object at 0x7f8e0575b350>
>>> import sounddevice as sd
>>> sd.play(assistant.assist(sd.rec(10)))
```
I like the idea of using sounddevice to handle the audio; it also decouples audio management from the SDK. If we only use one method for all the interactions, however, I'm not sure what the return type should be - maybe an AssistantResult object that contains query_text, response_text and error?
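For illustration, the AssistantResult discussed above could be a simple dataclass along these lines (the field names come from the comment; everything else is an assumption):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shape for the proposed return object: the three fields
# (query_text, response_text, error) are the ones named in the thread.
@dataclass
class AssistantResult:
    query_text: Optional[str] = None
    response_text: Optional[str] = None
    error: Optional[Exception] = None

    def __bool__(self):
        # Truthy when the interaction succeeded.
        return self.error is None

result = AssistantResult(query_text="What's the weather in Tokyo?",
                         response_text='Clear')
```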
It'd also still maintain the ability to optionally subscribe to assistant events, as in some applications it's neater to have some kind of event-based interface:
```python
from google import Assistant
import sounddevice as sd

def handler(assistant, event):
    print('Received assistant event: {}'.format(event))

assistant = Assistant(event_handler=handler)
while True:
    try:
        assistant.assist(sd.rec(10))
    except InterruptedError:
        break
```
> I like the idea of using sounddevice to handle the audio, it also decouples audio management from the SDK.
Yep, we currently use sounddevice in audio_helpers, but the idea here would be to simply accept and return numpy arrays and handle the sounddevice plumbing in the calling code.
> If we only use one method for all the interactions however I'm not sure of what the return type should be - maybe an AssistantResult object that contains query_text, response_text and error?
Python itself should be fine with a single method that returns different types, and the convention should be easy to understand: if you give a T, it returns a T.
| input | output |
|---|---|
| string | string |
| file-like object | file-like object |
| numpy array of audio samples | numpy array of audio samples |
| AssistRequest iterable | AssistResponse iterable |
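One way to sketch this "give a T, get a T" convention is functools.singledispatch; the bodies below are placeholders, not the real gRPC plumbing:

```python
from functools import singledispatch
import io

# Sketch: dispatch assist() on the type of its input, mirroring the
# table above. The echo-style bodies stand in for real assistant calls.
@singledispatch
def assist(query):
    raise TypeError('unsupported input: {!r}'.format(type(query)))

@assist.register(str)
def _(query):
    return 'response to: ' + query  # placeholder text response

@assist.register(io.IOBase)
def _(stream):
    return io.BytesIO(stream.read())  # placeholder file-like response

print(assist('hello'))
```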
> It'd also still maintain the ability to optionally subscribe to assistant events
Would the ability of passing raw AssistRequest/AssistResponse be enough, or do you need the ability to handle events at the same time as passing one of the other types?
> Would the ability of passing raw AssistRequest/AssistResponse be enough, or do you need the ability to handle events at the same time as passing one of the other types?
I'm thinking of a use case where you initialize the assistant "in the background", do other business logic (listen on a message queue, provide a web service, etc.), and you only get notified through a custom event handler when something happens - where something can be a hotword event, a request event, a response event, a device event, an end of conversation event, etc.
Of course, wrapping the interface you proposed into a separate thread and implementing a basic pub/sub mechanism in Python would also be possible within a couple of lines of code, but it'd be nice if the new SDK also provided an event-based interface, so that developers who prefer this pattern don't have to reinvent the wheel (just my two cents).
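For what it's worth, a minimal pub/sub wrapper of the kind described might look like this (purely illustrative; real events would come from the gRPC stream):

```python
import queue
import threading

# Toy pub/sub sketch: a background thread drains an event queue and
# notifies subscribers, keeping the caller's thread free for other work.
class EventBus:
    def __init__(self):
        self._queue = queue.Queue()
        self._subscribers = []
        threading.Thread(target=self._run, daemon=True).start()

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        self._queue.put(event)

    def _run(self):
        while True:
            event = self._queue.get()
            for callback in self._subscribers:
                callback(event)

bus = EventBus()
received = []
done = threading.Event()
bus.subscribe(lambda e: (received.append(e), done.set()))
bus.publish('ON_CONVERSATION_TURN_STARTED')
done.wait(timeout=2)  # wait for the background thread to deliver
```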
> I'm thinking of a use case where you initialize the assistant "in the background"
I wonder if this could compose well with the synchronous style proposed in https://github.com/googlesamples/assistant-sdk-python/issues/358#issuecomment-552608649, where the caller would be responsible for triggering the assistant and handling the response as the return value.
Maybe by combining it with the decorator approach proposed in https://github.com/googlesamples/assistant-sdk-python/issues/358#issuecomment-512249889?
```python
from google import assistant

@assistant.on_end_of_utterance()
def end_of_query():
    print("end of query")

@assistant.on_speech_recognition_result()
def in_progress_query(result):
    print("in progress query:", result)

assistant.assist("What's the weather in Tokyo?")
```
As shown by our documentation https://developers.google.com/assistant/sdk/guides/service/integrate#implement_a_basic_conversation_dialog_with_the_assistant and the rather complicated https://github.com/googlesamples/assistant-sdk-python/blob/84995692f35be8e085de8dfa7032039a13ae3fab/google-assistant-sdk/googlesamples/assistant/grpc/pushtotalk.py example, creating a functional assistant on top of the generated gRPC bindings is not trivial; it currently requires developers to:
- construct and stream AssistRequest messages
- handle the END_OF_CONVERSATION event to stop audio capture
- process each AssistResponse message and play back incoming assistant audio samples

Simpler examples like https://github.com/googlesamples/assistant-sdk-python/blob/84995692f35be8e085de8dfa7032039a13ae3fab/google-assistant-sdk/googlesamples/assistant/grpc/audiofileinput.py do a better job at showing low-level gRPC integration into a Python application, but they only handle a simpler conversation turn.
As suggested in #356, developers could benefit from an event-based interface on top of google-assistant-grpc; such a wrapper could offer a higher level experience closer to the google-assistant-library.
One (pythonic) way to implement an event interface could be to do something similar to the google-assistant-library, where events are surfaced using a Python generator. That would have the added value of providing some level of familiarity to former users of the google-assistant-library.
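For illustration, a toy sketch of such a generator-based surface, loosely modeled on the google-assistant-library's start() method (the event plumbing below is a stand-in, not the real gRPC stream):

```python
import queue

class Assistant:
    """Toy stand-in: events are surfaced through a generator, in the
    spirit of the google-assistant-library's start() method."""
    def __init__(self):
        self._events = queue.Queue()

    def _emit(self, event):
        # In a real wrapper this would be fed by the gRPC response stream.
        self._events.put(event)

    def start(self):
        # Yield events until the assistant is stopped.
        while True:
            event = self._events.get()
            if event == 'ON_ASSISTANT_STOPPED':
                return
            yield event

assistant = Assistant()
assistant._emit('ON_CONVERSATION_TURN_STARTED')
assistant._emit('ON_ASSISTANT_STOPPED')
events = [event for event in assistant.start()]
```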