gregsadetsky / sagittarius

A GPT-4/Gemini Voice/Video Exploration Tool
http://sagittarius.greg.technology/
685 stars, 94 forks

webkitSpeechRecognition not working well on Android #13

Open: tanu360 opened this issue 10 months ago

tanu360 commented 10 months ago

Browser: Chrome for Android 120xxx

The Speech API is supported, and other similar code works, but yours does not.

gregsadetsky commented 10 months ago

hey, thanks for reporting this problem

are you able to access the devtools? do you see specifically where the problem is happening?

also, does this page work in this browser? https://www.google.com/intl/en/chrome/demos/speech.html

thanks!

tanu360 commented 10 months ago

Hi, the URL above works well. In fact, another repo with code similar to yours also works like a charm. I have no idea why yours doesn't. I modified your code and added logging, but didn't see any error. I also tried some other repos, and they worked well.

Link 1: https://santa.talking-gpt.com/ (code: https://github.com/unconv/santa-gpt/blob/master/static/script.js)

This works fine in Samsung Browser, Chrome for Android, and other WebKit browsers on Android.

tanu360 commented 10 months ago

Something is seriously wrong: I used exactly the code from the Google demo page and from the other repo, but when I run it in your repo, it doesn't detect anything.

There is no error; it simply starts the audio service and ends immediately, with nothing in between.
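To narrow down where the session stops, one option is to log every lifecycle event the recognizer fires, in order. The sketch below is a hypothetical helper: `traceRecognizer` and the `RecognizerLike` interface are names made up for this example, declared locally on the assumption that the project's TypeScript DOM typings may not include `SpeechRecognition`. The event names themselves are the standard Web Speech API events.

```typescript
// Only the members this sketch touches (assumption: declared locally,
// since SpeechRecognition may be missing from the TS DOM typings).
interface RecognizerLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  addEventListener(type: string, listener: (ev: any) => void): void;
  start(): void;
  stop(): void;
}

// Standard Web Speech API lifecycle events, roughly in the order they fire.
const LIFECYCLE_EVENTS: readonly string[] = [
  "start", "audiostart", "soundstart", "speechstart",
  "result", "nomatch", "error",
  "speechend", "soundend", "audioend", "end",
];

// Hypothetical helper: attach a listener to every lifecycle event and record
// the order in which they fire. Error events also record their error code.
function traceRecognizer(rec: RecognizerLike, log: string[] = []): string[] {
  for (const type of LIFECYCLE_EVENTS) {
    rec.addEventListener(type, (ev: any) => {
      const detail = type === "error" && ev?.error ? `:${ev.error}` : "";
      log.push(type + detail);
      console.log(`[speech] ${type}${detail}`);
    });
  }
  return log;
}
```

In a browser, you would pass it an instance created with `new (window.SpeechRecognition || window.webkitSpeechRecognition)()`, call `start()`, and speak. A log that goes from `start`/`audiostart` straight to `end`, with no `soundstart`, `speechstart` or `error` in between, would match the symptom described here: the session ends without ever reporting sound, speech, or an error.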

gregsadetsky commented 10 months ago

Unfortunately, I don't have ready access to an Android device, so debugging this will be a bit difficult for me.

If you'd be able to send a screenshot of the devtools open when this error happens, that would help tremendously to resolve this. Thank you!

tanu360 commented 10 months ago

Hello @gregsadetsky, I connected my Android phone via USB debugging and recorded the debugging session using DevTools. I am speaking to it the whole time, but it simply ignores me and fires the onend event. I don't see any error because there is no error at all. If it couldn't hear any voice, it should have raised a "no-speech" error, but it just moves straight to onend and that's it.

https://drive.google.com/file/d/131BK3E2C7dOHu8Z4mnFptnfdaFu_gGJC/view

gregsadetsky commented 10 months ago

thanks for recording a debugging session, I really appreciate it! I can't access the file on google drive; I just requested access. thank you!

tanu360 commented 10 months ago

Sorry for the blunder; I forgot to make it public. I have now granted you access and made the file public as well.

Chrome: 121.xxx for Android
Device: Samsung S23 Ultra

tanu360 commented 10 months ago

Hi, were you able to access and watch the video? I still can't understand why it hits the onend event as soon as recognition starts, even though the browser supports the SpeechRecognition API.

gregsadetsky commented 10 months ago

hey, thanks for providing the file. unfortunately, it's hard for me to see what's going on. generally speaking, the web speech api can 'decide' to end the speech recognition session for any number of reasons, and it definitely can and will end a session if it doesn't detect any speech: any long enough silence and the engine will shut down
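Because the engine may end a session on silence, a common pattern is to set `continuous = true` and restart recognition from the `end` handler while the app still wants to listen. Below is a minimal sketch under that assumption; `KeepListening`, `maxRestarts` and `restartCount` are hypothetical names, not part of the repo or the Web Speech API. The cap matters: if every session ends immediately, as reported in this issue, an unbounded restart loop would just spin, so this is a workaround for silence-based shutdowns rather than a fix for the underlying problem.

```typescript
// Only the members this sketch touches (assumption: declared locally).
interface RecognizerLike {
  continuous: boolean;
  addEventListener(type: string, listener: (ev: any) => void): void;
  start(): void;
  stop(): void;
}

// Hypothetical wrapper: restarts recognition when the engine ends the session
// on its own, until stop() is called or too many empty sessions occur in a row.
class KeepListening {
  private rec: RecognizerLike;
  private maxRestarts: number;
  private wanted = false;
  private restarts = 0;

  constructor(rec: RecognizerLike, maxRestarts = 5) {
    this.rec = rec;
    this.maxRestarts = maxRestarts;
    // Ask the engine not to stop after the first final result.
    rec.continuous = true;
    // A session that produced a result resets the consecutive-restart counter.
    rec.addEventListener("result", () => {
      this.restarts = 0;
    });
    rec.addEventListener("end", () => this.onEnd());
  }

  get restartCount(): number {
    return this.restarts;
  }

  start(): void {
    this.wanted = true;
    this.restarts = 0;
    this.rec.start();
  }

  stop(): void {
    // Clear the flag first so the "end" event that follows is not restarted.
    this.wanted = false;
    this.rec.stop();
  }

  private onEnd(): void {
    if (!this.wanted) return;
    if (this.restarts >= this.maxRestarts) {
      // Give up instead of spinning if every session ends straight away.
      this.wanted = false;
      console.warn("speech recognition keeps ending without results; giving up");
      return;
    }
    this.restarts += 1;
    this.rec.start();
  }
}
```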

I'm wondering if the number of breakpoints is making it difficult to see whether the engine has actually started or not

if you're able, I'd recommend running the code locally on your machine so you can work with non-minified code. this will make it a lot simpler to see what's going on when stepping through the code. probably the best approach would be to create a minimal reproducible example, i.e. run only the speech recognition code (extracted from the codebase) on your device and check whether the problem still occurs
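A minimal reproducible example along these lines might isolate a single recognition call, as in the sketch below. It is an illustration under stated assumptions, not code from the repo: `recognizeOnce` and `findRecognitionCtor` are hypothetical names, and the constructor is passed in as a parameter so the function can also be exercised outside a browser. It resolves with the first transcript, rejects with the Web Speech API error code (such as `no-speech` or `not-allowed`), or rejects with a made-up `ended-without-result` reason when `end` fires with neither, which is the symptom reported here.

```typescript
// Only the members this sketch touches (assumption: declared locally).
interface RecognizerLike {
  lang: string;
  interimResults: boolean;
  maxAlternatives: number;
  addEventListener(type: string, listener: (ev: any) => void): void;
  start(): void;
}
type RecognizerCtor = new () => RecognizerLike;

// Prefer the standard name, fall back to Chrome's prefixed constructor.
function findRecognitionCtor(): RecognizerCtor | undefined {
  const g = globalThis as any;
  return g.SpeechRecognition ?? g.webkitSpeechRecognition;
}

// Hypothetical helper: run one recognition session and resolve with the first
// transcript, or reject with the reason the session stopped.
function recognizeOnce(lang = "en-US", ctor = findRecognitionCtor()): Promise<string> {
  return new Promise((resolve, reject) => {
    if (!ctor) {
      reject(new Error("unsupported"));
      return;
    }
    const rec = new ctor();
    rec.lang = lang;
    rec.interimResults = false;
    rec.maxAlternatives = 1;

    let settled = false;
    rec.addEventListener("result", (ev: any) => {
      settled = true;
      resolve(ev.results[ev.resultIndex][0].transcript);
    });
    rec.addEventListener("error", (ev: any) => {
      settled = true;
      reject(new Error(ev.error)); // e.g. "no-speech", "not-allowed"
    });
    rec.addEventListener("end", () => {
      // "end" with neither a result nor an error.
      if (!settled) reject(new Error("ended-without-result"));
    });
    rec.start();
  });
}
```

In the browser, something like `recognizeOnce().then((t) => console.log(t), (e) => console.error(e.message))` would print either the transcript or the reason the session stopped.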

sorry I can't be of more help; remote debugging this way is very difficult. it would be a different story if I had the same device as you do

tanu360 commented 10 months ago

I tried. If I port the code to plain JS and run it in a mobile browser, it works, but the TS build sadly does not. I don't think there's a solution, since the problem is so weird; you might try running your site on any phone you have. I also tried on an iPhone 15 Pro Max, in both Safari and Chrome, and it didn't work in either.