wintercg / proposal-common-minimum-api

https://common-min-api.proposal.wintercg.org/

Speech synthesis and speech recognition shipped in the browser and support SSML input #35

Closed guest271314 closed 1 year ago

guest271314 commented 1 year ago

Right now, speech recognition in Chrome and Firefox captures the user's voice and sends the recording to remote servers for processing.

When Google voices are used in Chrome, a network request is made to remote servers to process the text.

Neither Firefox nor Chrome supports SSML input to the speech synthesis engine. Speech Dispatcher maintainers are on board for supporting SSML input https://github.com/WICG/speech-api/issues/10#issuecomment-619589378.

Google supports SSML input in its own APIs, but not in its Web Speech API implementation, which also cuts off input text at an arbitrary length.
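To make "SSML input" concrete, here is a minimal sketch of the kind of document a speech synthesis engine would receive. The helper names `escapeXml` and `buildSsml` are hypothetical and exist only for illustration; the `<speak>` root and `<break>` element come from the W3C SSML 1.1 specification.

```typescript
// Hypothetical helpers for illustration; not part of any browser or Google API.

// Escape characters that would otherwise break the XML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap plain text in an SSML 1.1 <speak> document, optionally
// followed by a pause expressed with the <break> element.
function buildSsml(text: string, lang: string = "en-US", pauseMs: number = 0): string {
  const pause = pauseMs > 0 ? `<break time="${pauseMs}ms"/>` : "";
  return (
    `<?xml version="1.0"?>` +
    `<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `${escapeXml(text)}${pause}` +
    `</speak>`
  );
}

const ssml = buildSsml("Hello & welcome", "en-US", 500);
console.log(ssml);
```

In a browser the resulting string could be passed to `new SpeechSynthesisUtterance(ssml)`. As described above, Chrome and Firefox would not currently interpret the markup as SSML.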

ljharb commented 1 year ago

It doesn't sound like there's a universal web standard here for WinterCG to adopt.

guest271314 commented 1 year ago

Yes, there is a standard. I implemented SSML parsing by hand using JavaScript here https://github.com/guest271314/SSMLParser.

I can do this https://github.com/guest271314/GoogleNetworkSpeechSynthesis/blob/main/test.js at the console on this Web page because Google supports SSML input in its own APIs. However, Google fails to support SSML processing in the Web Speech API. This is another long-standing issue that could easily be fixed, given the will to do so.

ljharb commented 1 year ago

What speech would a server capture? The sound of fans in the data center?

guest271314 commented 1 year ago

> What speech would a server capture? The sound of fans in the data center?

I'm not biting on bait. You are just looking for a reason to ban me from here under the auspices of being "attacked".

Ask WPT how they use a headless browser to test.

ljharb commented 1 year ago

I apologize if that sounded like baiting (and I have no authority to ban here anyway). I mean that WinterCG covers lots of environments where there is no user sitting at the machine, and as such, there's no sound input or screen visualization to capture.

guest271314 commented 1 year ago

Chromium and Firefox headless can each capture screenshots and play and capture audio. Obviously the user is not looking at the screen of a headless browser. I suggest looking at how WPT tests navigator.mediaDevices.getDisplayMedia(). For that matter, just look at WPT as a whole and how that body tests.

ljharb commented 1 year ago

WinterCG environments are largely not browsers at all, although they may be (like electron).

guest271314 commented 1 year ago

@ljharb Look, somebody referred me to this repository on Reddit, responding to my question about a single server that can be imported into QuickJS, Bun, Deno, and Node.js. For various reasons:

So my idea was to import Deno's HTTPS server into QuickJS. Then, hell, why not create a server that can be imported into all JavaScript runtimes? That seems right up the alley of this body.
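As a rough sketch of what a server "importable into all JavaScript runtimes" could look like, the handler below depends only on the Fetch API's `Request` and `Response`. It assumes the runtime provides those globals (Deno, Bun, and Node.js 18+ do; QuickJS would need a polyfill). The function name `handle` and the `/health` route are hypothetical.

```typescript
// Runtime-neutral request handler: no runtime-specific imports, only the
// standard Request, Response, and URL globals. Names here are illustrative.
function handle(request: Request): Response {
  const { pathname } = new URL(request.url);

  if (pathname === "/health") {
    return new Response("ok", {
      status: 200,
      headers: { "content-type": "text/plain" },
    });
  }

  return new Response("not found", {
    status: 404,
    headers: { "content-type": "text/plain" },
  });
}

// Wiring differs per runtime and is left as comments:
//   Deno: Deno.serve(handle);
//   Bun:  Bun.serve({ fetch: handle });
//   Node.js: needs a small adapter from node:http to Request/Response.

const response = handle(new Request("http://localhost/health"));
console.log(response.status);
```

Only the wiring layer is runtime-specific; the handler itself stays the same across runtimes.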

> WinterCG environments are largely not browsers at all, although they may be (like electron).

Well, if you are going to cite Electron, Electron supports capturing screens.

ljharb commented 1 year ago

Yes, but it wouldn't make sense for, e.g., Node to support that, which makes it out of scope for this repo.

guest271314 commented 1 year ago

The term "web APIs" is in the mission statement on your website. I suggest removing that term if your only real interest is Node, not web API interoperability.

ljharb commented 1 year ago

"web APIs" does not imply "all web APIs".

guest271314 commented 1 year ago

Yes, it does. And you also claimed your interest is interoperability among web APIs. You need to update your website if you don't mean what you say.

guest271314 commented 1 year ago

Developers in the field are very interested in web API interoperability. You make that claim, then qualify it when developers in the field file issues on that basis.

If what you really mean is only the web APIs you are interested in, you need to list them to the exclusion of all others.

You folks have dozens of groups about your interests.