microsoft / BotBuilder-Samples

Welcome to the Bot Framework samples repository. Here you will find task-focused samples in C#, JavaScript/TypeScript, and Python to help you get started with the Bot Framework SDK!
https://github.com/Microsoft/botframework
MIT License

experimental nodejs for Direct Line Speech does not respond to streaming audio #1731

Closed cindyloo closed 5 years ago

cindyloo commented 5 years ago

I implemented the experimental code here (I understand it is experimental, but I'd expect it to at least work end to end, correct?)

https://github.com/microsoft/BotBuilder-Samples/tree/master/experimental/directline-speech/javascript_nodejs/02.echo-bot

I attach to the WebSocketConnector via a Python websocket using PyAudio. Python code:

# Excerpt from the full script below (p, s, and frames are set up there).
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
for i in range(0, (RATE // CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK, exception_on_overflow=False)
    frames.append(data)
    # Pickles and re-sends the entire accumulated frame list every iteration.
    s.send(pickle.dumps(frames), opcode=websocket.ABNF.OPCODE_BINARY)

Logging output from the nodejs code:

restify listening to http://[::]:3978

Get Bot Framework Emulator: https://aka.ms/botframework-emulator

To talk to your bot, open the emulator select "Open Bot"
websocket
Creating socket for WebSocket connection.
Creating server for WebSocket connection.
Listening on WebSocket server.

the "Creating socket... " happens when I open up the stream and send the audio to the websocket. nothing happens beyond this - the bot onMessage call is not activated via the audio stream. What should the user do additionally?

CoHealer commented 5 years ago

Customer replied in the linked issue.

jwiley84 commented 5 years ago

Hi @cindyloo! The experimental samples don't always work the way they're supposed to or the way we'd expect. I'll go ahead and do some digging into this, and see if I can either get it working or reproduce your error.

cindyloo commented 5 years ago

Thanks Jessica - here is my calling code in Python, which obtains the audio stream from my client:

import pyaudio
import pickle
import websocket
# recording parameters
CHUNK = 512
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 10

HOST = ''                # The remote host (unused)
PORT = 3978              # The same port as used by the server (unused)
WEBSOCKET_HOST = 'ws://127.0.0.1:3978/api/messages'

p = pyaudio.PyAudio()
s = websocket.create_connection(WEBSOCKET_HOST)

s.send('Hii')

for i in range(0, p.get_device_count()):
    print(i, p.get_device_info_by_index(i)['name'])

print("open stream...")

frames = []

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print((RATE / CHUNK) * RECORD_SECONDS)

for i in range(0, (RATE // CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK, exception_on_overflow=False)
    frames.append(data)
    #r = requests.get(url = WEBSOCKET_HOST, stream=True)
    # Note: this pickles and re-sends the entire accumulated frame list
    # on every iteration, not just the newest chunk.
    s.send(pickle.dumps(frames), opcode=websocket.ABNF.OPCODE_BINARY)

print("---done recording---")

# Optional: dump the captured audio to a wav file for debugging
# (would need `import wave` and a WAVE_OUTPUT_FILENAME defined).
#waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
#waveFile.setnchannels(CHANNELS)
#waveFile.setsampwidth(p.get_sample_size(FORMAT))
#waveFile.setframerate(RATE)
#waveFile.writeframes(b''.join(frames))
#waveFile.close()

stream.stop_stream()
stream.close()
p.terminate()
s.close()

print("*closed")

The JavaScript code is just the example code from this repo.

cindyloo commented 5 years ago

Any updates here?

mdrichardson commented 5 years ago

@cindyloo I had a support ticket similar to this and this was my response (I'm not sure how much it applies to your issue):

I've got a workaround for this, but please inform the customer that:

  1. Both the sample and the JS Speech packages are in the pre-preview stages and not meant for public use, especially not for production at this point

  2. There will likely be a LOT of bugs with any JS speech implementations at this time

I was able to debug this by viewing "https://_yourDeployedBot_.scm.azurewebsites.net/dev/wwwroot/:vs.output" while attempting to message it. It presented an error that "logger" was undefined in index.js.

The workaround is, funny enough, to change "logger" to "undefined" in index.js, so that last section looks like this:

const { WebSocketConnector } = require('microsoft-bot-protocol-streamingextensions');

server.get('/api/messages', function upgradeRoute(req, res) {
    // Pass the literal undefined in place of the undeclared `logger` variable.
    let wsc = new WebSocketConnector(bot, undefined);
    wsc.processAsync(req, res, {
        appId: process.env.MicrosoftAppId,
        appPassword: process.env.MicrosoftAppPassword,
        channelService: process.env.ChannelService
    });
});

Note: Trying to use a custom logger or "console" didn't seem to work. I had to use "undefined".

The customer will then need to re-deploy their bot. I've noticed that it can then take upwards of 10 minutes for the bot to properly start and be ready to connect to (if it starts at all). I'm not sure if this is speech/websockets-related or not, but this is the first time I've experienced this.

cindyloo commented 5 years ago

Hi Michael! Thanks for your response. I ended up passing undefined to the WebSocketConnector also. However, it never seemed to connect. How can you tell if it does anything? What behavior should I expect? Thanks!

mdrichardson commented 5 years ago

@cindyloo When using the Direct Line Speech Client, your bot should be able to actually connect and use speech-to-text. I haven't used that sample for anything else.

cindyloo commented 5 years ago

How does the bot consume the audio stream? It seemed to me that there was a step missing in the sample code. My log said it was listening (inside the WebSocketConnector, at line 111: this.logger.log('Listening on WebSocket server.');), but then nothing happened...

cindyloo commented 5 years ago

Oh - right. Well, I'm on a Mac, so I can't use the client :(

mdrichardson commented 5 years ago

@cindyloo I think your real issue is that you're trying to send audio directly to the bot instead of to Direct Line Speech, which then interprets the audio and sends it to the bot. See the image at the top of this page for more info. You'll likely also find the View client source code section informative.

That being said, I don't know much more about your issue than the title. I believe @jwiley84 has dug into this a bit more and can likely provide an answer for you this week.

cindyloo commented 5 years ago

no, I"m not trying to send it directly to the bot. I'm using the javascript/nodejs Direct Line Speech example which I was taking to mean that I can send an audio stream from a client (in this case, python) with a websocket GET connecting to the restify server in the nodejs code. I was expecting that the WebSocketConnecter would then somehow send that stream on to the Bot endpoint

DDEfromOR commented 5 years ago

I'm pretty late to this thread, but FWIW @mdrichardson is correct that the audio stream needs to be sent to the Direct Line Speech channel which then converts the audio to text and sends the result to the bot's endpoint. @cindyloo is correct that the websocket connection to the bot technically allows a bot to receive streamed audio, but the current samples don't support this and the intent with Direct Line Speech is for bots to continue to operate on text.

All of that being said, the bigger issue is that the experimental samples are outdated and need to be fixed. There is now a public preview build of the Node.js library that can be added to bots to enable the web socket connection used by Direct Line Speech.

TLDR: The updated sample will look something like this:

Modify the call that creates the Restify server so that it also handles websocket upgrades:

let server = restify.createServer({ handleUpgrades: true });

Then add the following code to connect an incoming web socket request with the "streaming" adapter:

server.get('/api/messages', function upgradeRoute(req, res) {
    // Wrap the bot in the streaming adapter and upgrade the incoming
    // request to a websocket connection.
    const adapter = new BotFrameworkStreamingAdapter(bot);
    adapter.connectWebSocket(req, res, {
        appId: process.env.MicrosoftAppId,
        appPassword: process.env.MicrosoftAppPassword,
        channelService: process.env.ChannelService
    });
});

From here the bot endpoint is able to accept web socket GET requests from the channel and establish the "streaming" connection.
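
Putting the two pieces together, an updated index.js would look roughly like the sketch below. Treat it as a sketch only: the require path for BotFrameworkStreamingAdapter is an assumed placeholder (it depends on which preview build you install), and EchoBot stands in for whatever bot class the sample exports.

const restify = require('restify');
// Assumed require path -- the adapter ships in a preview package, so the
// module name here is a placeholder, not a published name.
const { BotFrameworkStreamingAdapter } = require('botframework-streaming-extensions');
const { EchoBot } = require('./bot');

const bot = new EchoBot();

// handleUpgrades lets restify pass websocket upgrade requests to the route.
const server = restify.createServer({ handleUpgrades: true });
server.listen(process.env.port || process.env.PORT || 3978, () => {
    console.log(`${server.name} listening to ${server.url}`);
});

server.get('/api/messages', function upgradeRoute(req, res) {
    const adapter = new BotFrameworkStreamingAdapter(bot);
    adapter.connectWebSocket(req, res, {
        appId: process.env.MicrosoftAppId,
        appPassword: process.env.MicrosoftAppPassword,
        channelService: process.env.ChannelService
    });
});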

One of the main sources of confusion we're trying to clear up with better documentation and samples is around the term 'streaming'.

The bot upgrade for speaking with Direct Line Speech and similar channels is "streaming" in the sense that only one connection is established and it is used for all communication between the channel and the bot. This eliminates the overhead of the normal REST setup, with the goal of reducing latency so the bot can respond to user interactions without pauses. The main driving factor was allowing existing bots built around text interactions to seamlessly upgrade to working with Direct Line Speech.

Meanwhile, the client connection to the channel is "streaming" in the sense that it really does send audio (though AFAIK the audio is actually sent in clips and not a live stream here, either).
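
For concreteness, here is roughly what that client-side connection looks like using the JavaScript Speech SDK (microsoft-cognitiveservices-speech-sdk) instead of a raw websocket to the bot. This is only a sketch: the subscription key, region, and wav file name are placeholders, and the exact API surface may vary by SDK version.

const fs = require('fs');
const sdk = require('microsoft-cognitiveservices-speech-sdk');

// Configure the connection to the Direct Line Speech channel (not the bot).
// Key, region, and file name below are placeholders.
const botConfig = sdk.BotFrameworkConfig.fromSubscription('<speech-key>', '<region>');
const audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync('utterance.wav'));

const connector = new sdk.DialogServiceConnector(botConfig, audioConfig);

// Activities the bot sends back arrive here as JSON.
connector.activityReceived = (sender, e) => {
    console.log('Activity from bot:', e.activity);
};

// Send one utterance; the channel performs speech-to-text and forwards
// the recognized text to the bot as a normal message activity.
connector.listenOnceAsync(
    result => console.log('Recognized:', result.text),
    err => console.error(err)
);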

In a nutshell, the protocol used by the bot hasn't changed; we've only added support for new transports and opened the door to developing a real streaming protocol (or, much more likely, adopting an existing one) in the future.

Thank you for trying out our experimental tech! You have no idea how much we appreciate developers willing to get their hands dirty and put up with all of the pain points involved with works in progress. I'll get in touch with the samples team about getting the experimental bot designs updated.

jwiley84 commented 5 years ago

@cindyloo, my apologies for not getting back to you faster. It looks like @mdrichardson and @DDEfromOR have covered the topic completely. I'm going to go ahead and close this. Keep an eye out going forward for this to potentially shift from experimental to a fully-realized sample!

cindyloo commented 5 years ago

OK, thanks for the information and clarification, @DDEfromOR. I think the term "streaming" can indeed be misleading: it blurs whether one is streaming audio from an application to the Node.js example echo bot, or separately setting up the DLS channel and sending the audio that way. I'd appreciate clearer examples that differentiate the two. I'm not sure it makes sense to close this, because I would think you'd want to make sure the docs are updated before doing so...

theconvchamp commented 5 years ago

@DDEfromOR, do we have a similar API in the .NET SDK? I am trying to achieve the same thing.