Open proppy opened 5 years ago
Copying from https://github.com/googlesamples/assistant-sdk-python/issues/356
I would also consider supporting custom event callbacks/hooks.
Ideally, I see two main usage patterns for the assistant: a synchronous one, where a generator approach works the best:
```python
with Assistant(**opts) as assistant:
    for event in assistant:
        print(event)
```
and an asynchronous one, where the handlers might be executed in other threads; in that kind of approach the callback pattern can really help keep the code clean:
```python
with Assistant(on_conversation_start=on_conversation_start,
               on_speech_recognized=on_speech_recognized,
               ..., **opts) as assistant:
    # Do other stuff
```
> where the handlers might be executed in other threads
Wouldn't the handlers need to dispatch the call to the thread anyway, using something like https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.submit?
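For illustration, a minimal sketch of what such dispatching could look like (the handler registry and event names here are hypothetical, not part of the SDK):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: submit each registered callback to a worker
# thread so that slow handlers don't block the assistant's event loop.
executor = ThreadPoolExecutor(max_workers=4)

def dispatch(handlers, event_type, event):
    """Submit every handler registered for event_type to the executor."""
    futures = []
    for handler in handlers.get(event_type, []):
        futures.append(executor.submit(handler, event))
    return futures

# Usage with a toy handler registry:
results = []
handlers = {'CONVERSATION_START': [lambda e: results.append(e)]}
for f in dispatch(handlers, 'CONVERSATION_START', {'id': 1}):
    f.result()  # wait for the handler to finish
```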
@BlackLight curious about your thoughts on callbacks vs inheritance + override?
Another possibility could be to have a @decorator model, similar to what the https://github.com/pallets/flask or https://github.com/pallets/click projects do, something like:
```python
from google.assistant.sdk import assistant

@assistant.handler('CONVERSATION_START')
def handle_conversation():
    pass
```
That could also enable us to unify the event handler with device actions and custom device actions:
```python
from google.assistant.sdk import assistant

@assistant.handler('action.devices.commands.OnOff')
def handle_onoff():
    pass

@assistant.handler('com.example.commands.BlinkLight')
def handle_blink():
    pass
```
What do you think?
In my opinion the callback mechanism allows handling interactions in a more flexible way than a synchronous yield generator. What I have set up so far in my project is something like:
```python
import threading

class Assistant(threading.Thread):
    def __init__(self, event_handler=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.event_handler = event_handler

    def run(self):
        while True:
            event = self._poll_event()
            if self.event_handler:
                self.event_handler(self, event)

class MyClient:
    def __init__(self, *args, **kwargs):
        self.assistant = Assistant(event_handler=self.event_handler())
        self.assistant.start()

    def event_handler(self):
        def wrapped_handler(assistant, event):
            print(event)
        return wrapped_handler
```
With this pattern I can execute the event handler within the assistant thread, but the handler itself is defined within MyClient, so it can access all the attributes of my object as well. It's not strictly like dispatching the call to another thread (that would indeed require some degree of black magic with concurrent.futures), but it's flexible enough for me to decide where and how a handler should be executed. One could even have the handler run the custom logic in a new thread, so that in case of failure it won't abort the assistant thread.
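As a sketch of that last point, a handler can be wrapped so each invocation runs on its own thread (the `threaded` decorator name is made up for illustration):

```python
import threading

def threaded(handler):
    """Wrap a handler so each invocation runs in its own daemon thread;
    an exception in the handler then cannot abort the assistant thread."""
    def wrapper(assistant, event):
        t = threading.Thread(target=handler, args=(assistant, event),
                             daemon=True)
        t.start()
        return t
    return wrapper

seen = []

@threaded
def on_event(assistant, event):
    seen.append(event)

# Each call spawns a thread; join only to make the example deterministic.
on_event(None, 'ON_CONVERSATION_TURN_STARTED').join()
```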
Thinking of it, however, I agree that in Python it's possible to implement better patterns than a callback approach borrowed from JS.
I like the inheritance+override approach, I've also initially proposed it in https://github.com/googlesamples/assistant-sdk-python/issues/356. It feels a bit Java-ish to me though, as it would require the developer to define a new class and override all the needed handlers. Not that it's a bad thing, but it would make a quick-start a bit more verbose.
Thinking of it, I like the decorator approach the most, and it would indeed make things consistent with what's done with the device_handler as well. It's probably also the most "pythonic" way to implement things, even though I'd still offer developers the option to opt either for the generator approach (if they have a script that needs to access events synchronously anyway) or the decorator approach (if they have more complex business logic and want to decouple it from the assistant). I'd also consider the option of an empty decorator, something like @assistant.handler(), to offer the possibility to capture all the events in one function without needing to set up a handler for each type of event.
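A rough sketch of how such a decorator registry could work, including the empty catch-all form (all names here are hypothetical, not actual SDK API):

```python
# Hypothetical handler registry: handler('EVENT') registers for one
# event type; the empty form handler() registers a catch-all (key None).
_handlers = {}

def handler(event_type=None):
    def register(func):
        _handlers.setdefault(event_type, []).append(func)
        return func
    return register

def fire(event_type, event):
    # Specific handlers first, then catch-all handlers.
    for func in _handlers.get(event_type, []) + _handlers.get(None, []):
        func(event)

log = []

@handler('CONVERSATION_START')
def on_start(event):
    log.append(('start', event))

@handler()  # empty decorator: receives every event
def on_any(event):
    log.append(('any', event))

fire('CONVERSATION_START', 1)
fire('SPEECH_RECOGNIZED', 2)
```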
It would also be nice to start auditing projects that use the google-assistant-grpc binding directly, to see how they could benefit from a higher level interface.
A few I'm aware of:
Filed https://github.com/googlesamples/assistant-sdk-python/issues/381 as it seems useful to audit further reverse dependencies of the project.
Something else to consider would be a drastically higher level interface, with top-level module functions that allow for a REPL-like experience:
```python
>>> from google import assistant
>>> assistant.assist("What's the weather in Tokyo?")
'Clear'
>>> with open('input.wav', 'rb') as f:
...     assistant.assist(f)
<_io.BytesIO object at 0x7f8e0575b350>
>>> import sounddevice as sd
>>> sd.play(assistant.assist(sd.rec(10)))
```
I like the idea of using sounddevice to handle the audio; it also decouples audio management from the SDK. If we only use one method for all the interactions, however, I'm not sure what the return type should be - maybe an AssistantResult object that contains query_text, response_text and error?
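For illustration, the AssistantResult discussed above could be a simple dataclass along these lines (the field names come from the comment; everything else is an assumption):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shape for the proposed return object: the three fields
# (query_text, response_text, error) are the ones named in the thread.
@dataclass
class AssistantResult:
    query_text: Optional[str] = None
    response_text: Optional[str] = None
    error: Optional[Exception] = None

    def __bool__(self):
        # Truthy when the interaction succeeded.
        return self.error is None

result = AssistantResult(query_text="What's the weather in Tokyo?",
                         response_text='Clear')
```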
It'd also still maintain the ability to optionally subscribe to assistant events, as in some applications it's neater to have some kind of event-based interface:
```python
from google import Assistant
import sounddevice as sd

def handler(assistant, event):
    print('Received assistant event: {}'.format(event))

assistant = Assistant(event_handler=handler)
while True:
    try:
        assistant.assist(sd.rec(10))
    except InterruptedError:
        break
```
> I like the idea of using sounddevice to handle the audio, it also decouples audio management from the SDK.
Yep, we currently use sounddevice in audio_helpers, but the idea here would be to simply accept and return numpy arrays and handle the sounddevice plumbing in the calling code.
> If we only use one method for all the interactions however I'm not sure of what the return type should be - maybe an AssistantResult object that contains query_text, response_text and error?
Python itself should be fine with a single method that returns different types, and the convention should be easy to understand: if you give a T, it returns a T.
| input | output |
|---|---|
| string | string |
| file-like object | file-like object |
| numpy array of audio samples | numpy array of audio samples |
| AssistRequest iterable | AssistResponse iterable |
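One way to sketch this "give a T, get a T" convention is functools.singledispatch; the bodies below are placeholders, not the real gRPC plumbing:

```python
from functools import singledispatch
import io

# Sketch: dispatch assist() on the type of its input, mirroring the
# table above. The echo-style bodies stand in for real assistant calls.
@singledispatch
def assist(query):
    raise TypeError('unsupported input: {!r}'.format(type(query)))

@assist.register(str)
def _(query):
    return 'response to: ' + query  # placeholder text response

@assist.register(io.IOBase)
def _(stream):
    return io.BytesIO(stream.read())  # placeholder file-like response

print(assist('hello'))
```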
> It'd also still maintain the ability to optionally subscribe to assistant events
Would the ability of passing raw AssistRequest/AssistResponse be enough, or do you need the ability to handle events at the same time as passing one of the other types?
> Would the ability of passing raw AssistRequest/AssistResponse be enough, or do you need the ability to handle events at the same time as passing one of the other types?
I'm thinking of a use case where you initialize the assistant "in the background", do other business logic (listen on a message queue, provide a web service, etc.), and you only get notified through a custom event handler when something happens - where something can be a hotword event, a request event, a response event, a device event, an end of conversation event, etc.
Of course, wrapping the interface you proposed into a separate thread and implementing a basic pub/sub mechanism in Python would also be possible within a couple of lines of code, but it'd be nice if the new SDK also provided an event-based interface, so that developers who prefer this pattern don't have to reinvent the wheel (just my two cents).
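For what it's worth, a minimal pub/sub wrapper of the kind described might look like this (purely illustrative; real events would come from the gRPC stream):

```python
import queue
import threading

# Toy pub/sub sketch: a background thread drains an event queue and
# notifies subscribers, keeping the caller's thread free for other work.
class EventBus:
    def __init__(self):
        self._queue = queue.Queue()
        self._subscribers = []
        threading.Thread(target=self._run, daemon=True).start()

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        self._queue.put(event)

    def _run(self):
        while True:
            event = self._queue.get()
            for callback in self._subscribers:
                callback(event)

bus = EventBus()
received = []
done = threading.Event()
bus.subscribe(lambda e: (received.append(e), done.set()))
bus.publish('ON_CONVERSATION_TURN_STARTED')
done.wait(timeout=2)  # wait for the background thread to deliver
```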
> I'm thinking of a use case where you initialize the assistant "in the background"
I wonder if this could compose well with the synchronous style proposed in https://github.com/googlesamples/assistant-sdk-python/issues/358#issuecomment-552608649, where the caller would be responsible for triggering the assistant and handling the response as the return value.
Maybe by combining it with the decorator approach proposed in https://github.com/googlesamples/assistant-sdk-python/issues/358#issuecomment-512249889?
```python
from google import assistant

@assistant.on_end_of_utterance()
def end_of_query():
    print("end of query")

@assistant.on_speech_recognition_result()
def in_progress_query(result):
    print("in progress query:", result)

assistant.assist("What's the weather in Tokyo?")
```
As shown by our documentation https://developers.google.com/assistant/sdk/guides/service/integrate#implement_a_basic_conversation_dialog_with_the_assistant and the rather complicated https://github.com/googlesamples/assistant-sdk-python/blob/84995692f35be8e085de8dfa7032039a13ae3fab/google-assistant-sdk/googlesamples/assistant/grpc/pushtotalk.py example, creating a functional assistant on top of the generated gRPC bindings is not trivial; it currently requires developers to:
- construct and stream AssistRequest messages
- handle the END_OF_CONVERSATION event to stop audio capture
- process each AssistResponse message and play back incoming assistant audio samples

Simpler examples like https://github.com/googlesamples/assistant-sdk-python/blob/84995692f35be8e085de8dfa7032039a13ae3fab/google-assistant-sdk/googlesamples/assistant/grpc/audiofileinput.py do a better job at showing low-level gRPC integration into a Python application, but they only handle a simpler conversation turn.
As suggested in #356, developers could benefit from an event-based interface on top of google-assistant-grpc; such a wrapper could offer a higher level experience closer to the google-assistant-library.
One (pythonic) way to implement an event interface could be to do something similar to the google-assistant-library, where events are surfaced using a Python generator. That would have the added value of providing some level of familiarity to former users of the google-assistant-library.
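For illustration, a toy sketch of such a generator-based surface, loosely modeled on the google-assistant-library's start() method (the event plumbing below is a stand-in, not the real gRPC stream):

```python
import queue

class Assistant:
    """Toy stand-in: events are surfaced through a generator, in the
    spirit of the google-assistant-library's start() method."""
    def __init__(self):
        self._events = queue.Queue()

    def _emit(self, event):
        # In a real wrapper this would be fed by the gRPC response stream.
        self._events.put(event)

    def start(self):
        # Yield events until the assistant is stopped.
        while True:
            event = self._events.get()
            if event == 'ON_ASSISTANT_STOPPED':
                return
            yield event

assistant = Assistant()
assistant._emit('ON_CONVERSATION_TURN_STARTED')
assistant._emit('ON_ASSISTANT_STOPPED')
events = [event for event in assistant.start()]
```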