speechly / speech-recognition-polyfill

Polyfill for the SpeechRecognition standard on web, using Speechly as the underlying API.
MIT License

Browser Support not clarified at all #28

Closed jabcreations closed 2 years ago

jabcreations commented 2 years ago

While discussing whether speech recognition was a thing and confirming that it is, I also searched for a polyfill because, of course, Mozilla dropped another ball.

Unfortunately, while there is a Browser Support header in the README, it provides no meaningful information. What underlying APIs does this polyfill need in order to work? That is what determines browser support.

I am not working with speech recognition, nor planning to (at least not any time soon), so please do not interpret this as a pressing need. I am purely curious how far back each of the browser engines (Blink/Chrome, Gecko/Firefox, Trident/Internet Explorer, WebKit/Safari) supports this polyfill. It would also be very useful for anyone else who cares about supporting browsers that give people choices versus the mainstream destruction of browsers.

JamesBrill commented 2 years ago

Hi @jabcreations, a reference to the underlying APIs would clarify browser support for consumers of this library. It's a reasonable suggestion, and I'll update the README with it.

The polyfill is a thin wrapper around the Speechly browser client. Broadly speaking, the two main browser APIs that the client depends on are MediaDevices and AudioContext. Judging by caniuse, the fields and methods the client uses on those interfaces are supported by browsers that cover roughly 95% of Internet users. Data on support in less mainstream browsers is harder to come by, but I'd expect it to be pretty good, given that these APIs have been supported by mainstream browsers since around 2016.
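A minimal sketch of what that dependency means for a page: a coarse feature check for the two interfaces named above. It only tests that navigator.mediaDevices.getUserMedia and AudioContext exist, not every field or method the Speechly client actually uses, and the helper name supportsSpeechlyPolyfill is hypothetical, not part of the library. The global object is passed in as a parameter so the check can also be exercised outside a browser.

```javascript
// Hypothetical helper, not part of @speechly/speech-recognition-polyfill.
// Coarse check for the two browser APIs the Speechly client depends on.
function supportsSpeechlyPolyfill(globalObj) {
  var nav = globalObj.navigator;

  // MediaDevices.getUserMedia is the standard way to capture microphone audio.
  var hasGetUserMedia = !!(
    nav &&
    nav.mediaDevices &&
    typeof nav.mediaDevices.getUserMedia === "function"
  );

  // AudioContext is the Web Audio entry point for processing that audio.
  var hasAudioContext = typeof globalObj.AudioContext === "function";

  return hasGetUserMedia && hasAudioContext;
}
```

In a page this would be called as supportsSpeechlyPolyfill(window), for example to hide a microphone button in browsers that lack either API. A browser that passes this check may still miss some newer method the client relies on, so it is a first filter rather than a guarantee.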

Thanks for the suggestion!

jabcreations commented 2 years ago

Thanks for your prompt reply, James! A rough translation of your reply suggests the minimal browser support is:

This won't work in Otter (WebKit/Safari 10.0) or in Presto (Opera), which was discontinued ("Opera" is now just a Chrome clone, so Opera 12.1 is considered the final release).

This will likely work in Pale Moon and Waterfox Classic. All other browsers are typically Chrome clones.

The main takeaway is that with Speechly, all versions of Firefox and its forks from 33.0 onward should suddenly gain speech recognition support. I'm sure that within a year or two at most I'll end up coming back here to use Speechly. The fact that there were no open issues suggests it was written pretty well.

I would highly recommend not relying solely on a Linux install command. I run Windows for local development (LAMP for the live server), and that immediately irritates me as a developer because Linux is not yet usable for production.

JamesBrill commented 2 years ago

Thanks for this, most educational! I wasn't aware of some of these browser engines.

I would highly recommend not relying solely on a Linux install command.

Are you referring to npm install --save @speechly/speech-recognition-polyfill? What's the Windows equivalent? Is it just a case of escaping the symbols?

jabcreations commented 2 years ago

I rarely use third-party software, though for something as specific and (in a sense) convoluted as this, I'd set up an empty directory in Windows, drop in the files I need (or "appear" to need), create an index.xhtml (I only use the XML parser; the HTML parser is weak and pointlessly tolerant of garbage code) and start figuring out how things work. The less I rely on other people's systems, the more it forces me to comprehend what I'm dealing with, so when I learn something new I'm able to do it with minimal code. My web platform serves around 211KB to the Administrator after compression, while a lot of websites serve well over 10 megabytes to a guest, even after compression!

My recommendation is to have a stand-alone test case with a very basic example that immediately demonstrates that the polyfill works. If I can extract the files from a ZIP, open the XHTML file and have JavaScript's onload event just work, that builds much more confidence that I'm not wasting my time. Of course, this is speech recognition, so asking for microphone permission and giving at least a basic error response if the user does not grant it is important. Because speech recognition is a bit more involved than just JavaScript's onload event handler, I'd recommend an ol list element with directions on what to do. Then output the recognized speech into either a textarea's .value or a paragraph's .textContent property.

Anyway, a demonstration/ directory in the main ZIP folder would be a clear example of what I'd look for. If this were compiled software, I'd make it blatantly clear that the binaries are included. I don't understand the notion of "everyone wants to code!" No, we need to USE IT! Some people might poke around in the code.
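The microphone-permission step with a basic error response could look roughly like the following sketch. It requests an audio stream with getUserMedia, releases it straight away (only the permission matters here), and writes either a ready message or a readable error into any object with a textContent property, such as the paragraph suggested above. The helper names and message strings are hypothetical; NotAllowedError, NotFoundError and NotReadableError are error names the Media Capture and Streams specification defines for getUserMedia failures. Whether the browser then remembers the grant for the polyfill's own later microphone request varies by browser.

```javascript
// Hypothetical helpers for a demo page, not part of the polyfill.

// Turns a getUserMedia rejection into a short message for the user.
function describeMicrophoneError(error) {
  var name = error ? error.name : undefined;
  switch (name) {
    case "NotAllowedError":
      return "Microphone permission was denied. Allow it in the browser's site settings and reload the page.";
    case "NotFoundError":
      return "No microphone was found.";
    case "NotReadableError":
      return "The microphone could not be opened. It may be in use by another application.";
    default:
      return "Could not access the microphone.";
  }
}

// Asks for microphone access and reports the outcome in output.textContent.
// Resolves to true on success and false on any failure, including a
// missing mediaDevices object (the TypeError lands in the catch below).
function requestMicrophone(mediaDevices, output) {
  return Promise.resolve()
    .then(function () {
      return mediaDevices.getUserMedia({ audio: true });
    })
    .then(function (stream) {
      // Only the permission is needed here, so release the device again.
      stream.getTracks().forEach(function (track) {
        track.stop();
      });
      output.textContent = "Microphone ready.";
      return true;
    })
    .catch(function (error) {
      output.textContent = describeMicrophoneError(error);
      return false;
    });
}
```

On a real page the call would be requestMicrophone(navigator.mediaDevices, statusParagraph), where statusParagraph is whatever element holds the status text.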

Another reason I don't rely on systems to handle this kind of stuff is "crust". A lot of Windows users lost gigabytes of personal and work files in one large update, and then there are the directories dumped into Program Files. Linux has its own "crust" that needs cleaning up too. Being organized is a huge prerequisite to being successful.

Here is an XHTML+HTML5 template file that I use to test new things out:

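<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Base Test File</title>
</head>
<body>
<h1>Base Test File</h1>
</body>
</html>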

JamesBrill commented 2 years ago

Fascinating stuff; that's a very DIY setup! I think you might be an outlier: most consumers of this repo will just want to paste in a Node Package Manager (npm) command and consume the polyfill as a module. Such is the ubiquity of npm these days. However, I can see the merit in keeping your web page and development environment bloat-free.

What if this repo included a minified JS file you could drop into your directory and then import via a <script> tag? It'd probably be the output of the TypeScript compiler with some extra logic to add createSpeechlySpeechRecognition to the global scope.

So your test file would look something like:

<title>Base Test File</title>
<script type="application/javascript" src="path_to_polyfill_file"></script>
<script type="application/javascript">
window.onload = function(event)
{
  const appId = "<your_speechly_app_id>";
  const SpeechlySpeechRecognition = createSpeechlySpeechRecognition(appId);
  const speechRecognition = new SpeechlySpeechRecognition();
  // Play with speech recognition
}
</script>
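One way to fill in the "Play with speech recognition" placeholder, following the earlier suggestion of writing recognized speech into a textarea, is sketched below. It assumes the polyfilled object follows the standard SpeechRecognition interface, where onresult receives an event whose results list holds alternatives with a transcript property. The helper names are hypothetical. The code is ES5 and contains no less-than or ampersand characters, so it could sit in an inline script of an XHTML page without a CDATA section.

```javascript
// Hypothetical helpers, not part of the polyfill. ES5 only, for older engines.

// Joins the top alternative of each result in event.results into one string.
// Array.prototype.map.call works on the array-like SpeechRecognitionResultList
// as well as on a plain array of arrays.
function transcriptFromResults(results) {
  return Array.prototype.map
    .call(results, function (result) {
      return result[0].transcript.trim();
    })
    .join(" ");
}

// Writes the running transcript into output.value (e.g. a textarea element)
// every time the recognizer reports results.
function attachTranscriptOutput(recognizer, output) {
  recognizer.onresult = function (event) {
    output.value = transcriptFromResults(event.results);
  };
}
```

In the test file, this would be attachTranscriptOutput(speechRecognition, someTextarea) inside the onload handler. Calling speechRecognition.start() from a button's click handler rather than directly in onload is probably safer, since browsers commonly keep audio processing suspended until the user interacts with the page.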

I can probably tweak the TypeScript configuration in this repo to produce an alternative like that for those who want to go npm-free.
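The "extra logic to add createSpeechlySpeechRecognition to the global scope" could be as small as an immediately invoked function wrapped around the compiled code. In this sketch the inner createSpeechlySpeechRecognition is a hypothetical stand-in that only records the app ID; in a real build it would be the TypeScript compiler's output. The wrapper attaches it to globalThis where available and to window in older browsers, which is what lets a plain script tag reach it without npm or a module loader.

```javascript
// Sketch of a drop-in build wrapper. The body of createSpeechlySpeechRecognition
// is a hypothetical stand-in for the compiled polyfill.
(function (root) {
  "use strict";

  function createSpeechlySpeechRecognition(appId) {
    // Stand-in: the real polyfill returns a SpeechRecognition-compatible
    // constructor; this one only remembers the app ID it was given.
    function SpeechlySpeechRecognition() {
      this.appId = appId;
    }
    return SpeechlySpeechRecognition;
  }

  // Expose the entry point on the global object so page scripts can call
  // window.createSpeechlySpeechRecognition after a plain script tag loads.
  root.createSpeechlySpeechRecognition = createSpeechlySpeechRecognition;
})(typeof globalThis !== "undefined" ? globalThis : window);
```

Bundlers can generate this kind of wrapper too; for example, Rollup's iife output format with an output name exposes the bundle's exports under that global variable.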