Add support of speech recognition for microphone node

node-red / node-red-ui-nodes

Additional nodes for Node-RED Dashboard

Apache License 2.0


Add support of speech recognition for microphone node #52

Closed HiroyasuNishiyama closed 3 years ago

HiroyasuNishiyama commented 3 years ago

[ ] Bugfix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)

Proposed changes

This PR adds a speech recognition feature to the ui_microphone node using the SpeechRecognition interface of the Web Speech API.
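As a rough illustration of what using the SpeechRecognition interface involves (a minimal sketch, not the code in this PR): the helper name `createRecognizer`, its `options` object, and the `onTranscript` callback are hypothetical, while `lang`, `interimResults`, `continuous`, `onresult`, `resultIndex`, and `isFinal` are part of the Web Speech API.

```javascript
// Hypothetical helper: builds a configured recognizer from a constructor.
// In a browser, pass window.SpeechRecognition || window.webkitSpeechRecognition.
function createRecognizer(RecognitionCtor, options, onTranscript) {
    const recognizer = new RecognitionCtor();
    recognizer.lang = options.lang || "en-US";
    recognizer.interimResults = Boolean(options.interimResults);
    recognizer.continuous = Boolean(options.continuous);
    recognizer.onresult = function (event) {
        // event.results is list-like; report entries from resultIndex onward.
        for (let i = event.resultIndex; i < event.results.length; i++) {
            const result = event.results[i];
            // result[0] is the top alternative; isFinal marks a settled phrase.
            onTranscript(result[0].transcript, result.isFinal);
        }
    };
    return recognizer;
}
```

In a browser that supports the API, calling `start()` on the returned object begins recognition and `stop()` ends it.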

Checklist

[x] I have read the contribution guidelines
[x] For non-bugfix PRs, I have discussed this change on the forum/slack team.
[ ] I have run grunt to verify the unit tests pass
[ ] I have added suitable unit tests to cover the new/changed functionality

dceejay commented 3 years ago

Is it also worth looking at adding a flag (under speech mode only) for interim results operation (https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition/interimResults)? Or maybe that should possibly be the default when in click-to-start, click-to-stop mode, i.e. so you get streamed results while in that mode?

Thoughts ?

HiroyasuNishiyama commented 3 years ago

Is it also worth looking at adding a flag (under speech mode only) for interim results operation (https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition/interimResults)? Or maybe that should possibly be the default when in click-to-start, click-to-stop mode, i.e. so you get streamed results while in that mode?

In our original implementation, specifying the interim results option would display the intermediate recognition results on the dashboard. I think it can be used for such purposes. However, if the node outputs interim results, it needs a way to distinguish final results from intermediate ones. I suggest that the node send interim results to a second output port when the interim results option is specified.
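The two-port idea can be sketched as follows. This is only an illustration under assumptions: `routeResult` is a hypothetical name, and the array return value follows the Node-RED convention in which each array position maps to an output port and `null` means nothing is sent on that port.

```javascript
// Hypothetical sketch: final results go to output 1, interim results to output 2.
function routeResult(transcript, isFinal) {
    const msg = { payload: transcript };
    // Position 0 is the first output port, position 1 the second.
    return isFinal ? [msg, null] : [null, msg];
}
```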

HiroyasuNishiyama commented 3 years ago

Hi. I updated the speech recognition mode to use the same button mode as audio input, along with the fixes you pointed out. I also added example flows for both modes.

dceejay commented 3 years ago

Yes, I think showing intermediate results on the dashboard is one use case, but that could be done via the backend, i.e. don't implement it in the node itself; if the user wants it, they could feed the info back to another widget. I think there are other use cases where you may want ongoing data fed back. I don't think the node / reco engine will handle both modes at once, so I don't know how a second output could work. Though once the reco has stopped, presumably we can send a done, so you know the microphone is no longer expected to send more?

HiroyasuNishiyama commented 3 years ago

Though once the reco has stopped, presumably we can send a done, so you know the microphone is no longer expected to send more?

I changed it to set the done property to true, instead of sending a message to a second port, when the interim flag is set.
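A minimal sketch of this single-output approach, assuming `done` is attached to the message that carries the final result. The thread does not spell out the exact message layout, so that placement is an assumption, and `toMessage` and its parameters are hypothetical names.

```javascript
// Hypothetical sketch: one output port, with done flagging the last message.
function toMessage(transcript, isFinal, interimEnabled) {
    const msg = { payload: transcript };
    // Assumption: done is only set when interim results are enabled
    // and the recognizer reports this result as final.
    if (interimEnabled && isFinal) {
        msg.done = true;
    }
    return msg;
}
```

A downstream node could then check `msg.done === true` to tell when the microphone is not expected to send more.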

dceejay commented 3 years ago

Looks good now. Using done works nicely, thank you. The only wrinkle I can see is that if I drag on a new node and set it to reco mode, the default time is 0 seconds. It should default to some useful value (maybe 5, to be consistent with the audio mode?). I think leaving it as 0 doesn't make sense unless you have intermediate results on by default... which we don't want (I think).

dceejay commented 3 years ago

I'm going to merge this as-is and we can work on it from there in master. I won't publish it just yet.

HiroyasuNishiyama commented 3 years ago

I think leaving it as 0 doesn't make sense unless you have intermediate results on by default... which we don't want (I think).

The SpeechRecognition interface outputs a final result for each phrase (at least for Japanese recognition). An interim result can keep changing until it is fixed as the final result.
For this reason, I intentionally set the default value to 0 in speech recognition mode.