Closed: cindyloo closed this issue 5 years ago
Hi @cindyloo! The experimental samples don't always work the way they're supposed to or the way we'd expect. I'll go ahead and do some digging into this, and see if I can either get it working or reproduce your error.
thanks Jessica - here is my calling code in python which obtains the audio stream from my client
import pyaudio
import pickle
import websocket

# record
CHUNK = 512
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 10
HOST = ''    # The remote host
PORT = 3978  # The same port as used by the server
WEBSOCKET_HOST = 'ws://127.0.0.1:3978/api/messages'

p = pyaudio.PyAudio()
s = websocket.create_connection(WEBSOCKET_HOST)
s.send('Hii')

for i in range(0, p.get_device_count()):
    print(i, p.get_device_info_by_index(i)['name'])

print("open stream...")
frames = []
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print((RATE / CHUNK) * RECORD_SECONDS)
for i in range(0, (RATE // CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK, exception_on_overflow=False)
    frames.append(data)

# r = requests.get(url=WEBSOCKET_HOST, stream=True)
s.send(pickle.dumps(frames), opcode=websocket.ABNF.OPCODE_BINARY)
print("---done recording---")

###
# waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
# waveFile.setnchannels(CHANNELS)
# waveFile.setsampwidth(p.get_sample_size(FORMAT))
# waveFile.setframerate(RATE)
# waveFile.writeframes(b''.join(frames))
# waveFile.close()
####

stream.stop_stream()
stream.close()
p.terminate()
s.close()
print("*closed")
the javascript code is just the example code from this repo
any updates here?
@cindyloo I had a support ticket similar to this and this was my response (I'm not sure how much it applies to your issue):
I've got a workaround for this, but please inform the customer that:
- Both the sample and the JS Speech packages are in the pre-preview stages and not meant for public use, especially not for production at this point
- There will likely be a LOT of bugs with any JS speech implementations at this time
I was able to debug this by viewing "https://_yourDeployedBot_.scm.azurewebsites.net/dev/wwwroot/:vs.output" while attempting to message it. It presented an error that "logger" was undefined in index.js.
The workaround is, funny enough, to change "logger" to "undefined" in index.js, so that last section looks like this:
const { WebSocketConnector } = require('microsoft-bot-protocol-streamingextensions');

server.get('/api/messages', function upgradeRoute(req, res) {
    let wsc = new WebSocketConnector(bot, undefined);
    wsc.processAsync(req, res, {
        appId: process.env.MicrosoftAppId,
        appPassword: process.env.MicrosoftAppPassword,
        channelService: process.env.ChannelService
    });
});
Note: Trying to use a custom logger or "console" didn't seem to work. I had to use "undefined".
The customer will then need to re-deploy their bot. I've noticed that it can then take upwards of 10 minutes for the bot to properly start and be ready to connect to (if it starts at all). I'm not sure if this is speech/websockets-related or not, but this is the first time I've experienced this.
hi Michael! thanks for your response. I ended up passing undefined to the WebSocketConnector also. However, it never seemed to connect - how can you tell if it does anything? what behavior should I expect? thanks!
@cindyloo When using the Directline Speech Client, your bot should be able to actually connect and use speech-to-text. I haven't used that sample for anything else.
how does the bot consume the audio stream? it seemed to me that there was a step missing in the sample code. My log said it was listening (inside the WebSocketConnector at line 111: this.logger.log('Listening on WebSocket server.');) but then nothing happened...
oh - right, well, I'm on a Mac so I can't use the client :(
@cindyloo I think your real issue is that you're trying to send audio directly to the bot instead of to Direct Line Speech, which then interprets the audio and sends it to the bot. See the image at the top of this page for more info. You will likely find the View client source code section informative as well.
That being said, I don't know much more about your issue than the title. I believe @jwiley84 has dug into this a bit more and can likely provide an answer for you this week.
no, I"m not trying to send it directly to the bot. I'm using the javascript/nodejs Direct Line Speech example which I was taking to mean that I can send an audio stream from a client (in this case, python) with a websocket GET connecting to the restify server in the nodejs code. I was expecting that the WebSocketConnecter
would then somehow send that stream on to the Bot endpoint
I'm pretty late to this thread, but FWIW @mdrichardson is correct that the audio stream needs to be sent to the Direct Line Speech channel which then converts the audio to text and sends the result to the bot's endpoint. @cindyloo is correct that the websocket connection to the bot technically allows a bot to receive streamed audio, but the current samples don't support this and the intent with Direct Line Speech is for bots to continue to operate on text.
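For illustration, here is a minimal client-side sketch of that flow, assuming the azure-cognitiveservices-speech Python package and a bot already registered with the Direct Line Speech channel; the subscription key and region below are placeholders, not values from this thread:

import azure.cognitiveservices.speech as speechsdk

# The channel (not the bot) performs speech-to-text; the bot only ever sees activities.
# '<speech-key>' and '<speech-region>' are placeholders for a real Speech resource.
config = speechsdk.dialog.BotFrameworkConfig.from_subscription('<speech-key>', '<speech-region>')
audio = speechsdk.audio.AudioConfig(use_default_microphone=True)
connector = speechsdk.dialog.DialogServiceConnector(config, audio)

# Activities coming back from the bot arrive as events on the connector.
connector.activity_received.connect(lambda evt: print('activity from bot:', evt.activity))

# Capture one utterance from the default microphone and send it through the channel.
result = connector.listen_once_async().get()
print('recognized:', result.text)

In this setup the Python client never touches the bot's websocket endpoint; the channel handles the audio and the bot continues to receive ordinary text activities.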
All of that being said, the bigger issue is the experimental samples are outdated and need to be fixed. There is now a public preview build of the Node.js library that can be added to bots to enable the web socket connection used by Direct Line Speech.
Modify the call to create a Restify server to have it also handle websocket upgrades.
let server = restify.createServer({ handleUpgrades: true });
Then add the following code to connect an incoming web socket request with the "streaming" adapter:
server.get('/api/messages', function upgradeRoute(req, res) {
    const adapter = new BotFrameworkStreamingAdapter(bot);
    adapter.connectWebSocket(req, res, {
        appId: process.env.MicrosoftAppId,
        appPassword: process.env.MicrosoftAppPassword,
        channelService: process.env.ChannelService,
    });
});
From here the bot endpoint is able to accept web socket GET requests from the channel and establish the "streaming" connection.
One of the main sources of confusion we're trying to clear up with better documentation and samples is around the term 'streaming'.
The bot upgrade to speak with Direct Line Speech and similar channels is "streaming" in the sense that only one connection is established and used for all communication between the channel and the bot. This eliminates the overhead of the normal REST setup, with the goal of reducing latency so the bot can respond to user interactions without pauses. The main driving factor behind this was allowing existing bots built around text interactions to seamlessly upgrade to working with Direct Line Speech.
Meanwhile, the client connection to the channel is "streaming" in the sense that it really does send audio (though AFAIK the audio is actually sent in clips and not a live stream here, either).
In a nutshell, the protocol used by the bot hasn't changed, we've only added support for new transports and opened the door to develop a real streaming protocol (or much more likely, adopt an existing one) in the future.
Thank you for trying out our experimental tech! You have no idea how much we appreciate developers willing to get their hands dirty and put up with all of the pain points involved with works in progress. I'll get in touch with the samples team about getting the experimental bot designs updated.
@cindyloo, my apologies for not getting back to you faster. It looks like @mdrichardson and @DDEfromOR have covered the topic completely. I'm going to go ahead and close this. Keep an eye out going forward for this to potentially shift from experimental to fully-realized sample!
ok, thanks for the information and clarification @DDEfromOR. I think the term 'streaming' can indeed be misleading - it isn't obvious whether one streams audio from an application to the Node.js example echo bot, or sets up the DLS channel separately and sends the audio that way. I'd appreciate clearer examples that spell out the difference. I'm not sure it makes sense to close this, b/c I would think you want to make sure the docs are updated before you do so...
@DDEfromOR, do we have a similar API in the .NET SDK? I am trying to achieve the same.
I implemented the experimental code here (I understand it is experimental, but I think it should at least work fully, correct?)
https://github.com/microsoft/BotBuilder-Samples/tree/master/experimental/directline-speech/javascript_nodejs/02.echo-bot
I attach to the WebSocketConnector via a Python websocket using pyaudio. Python code:
Logging output from the nodejs code:
the "Creating socket... " happens when I open up the stream and send the audio to the websocket. nothing happens beyond this - the bot onMessage call is not activated via the audio stream. What should the user do additionally?