introlab / odas_web

A desktop visualization GUI for the ODAS library

Google API Keyfile #52

Closed nacho-bodega closed 3 years ago

nacho-bodega commented 3 years ago

Hello!

I'm trying to use Google Speech Voice Recognition but I'm having trouble setting it up.

I created the API key and got the JSON file from Google Cloud. When I try to specify the path to the JSON file in the "Google API Keyfile" box in Configure, it says "Please enter a URL" and prevents me from saving the file path.

Is there any step I need to take after getting the JSON file from Google? Or does it have to be placed in a certain location? This is my first time using a Google API keyfile, so I must have missed some elementary step.

I would appreciate any advice.

GodCed commented 3 years ago

Hi, normally when you click on the "Google API Keyfile" text box, a file browser dialog should open to let you select your key file. Is this dialog somehow missing?

An alternative is to name the file google-api-key.json, place it directly in the odas_web directory, and use the default settings.

nacho-bodega commented 3 years ago

Yes, the file browser dialog pops up, so I think the file path was set, even though it still says "Please enter a URL".

In the transcript box in the Recording window, when I hover over the WAV file, it says "Couldn't process".

It seems to affect the client connections too; they eventually all close. If I re-launch ODAS Studio, port 10000 (separated) and port 10010 (postfiltered) fail to connect. If I turn off the Google API keyfile in the Configure window, then it can read all ports normally.

I think my keyfile is not being read properly somewhere. I tried different computers/networks/OSes to see if it made a difference, but the results were the same. I tried your suggestion as well (renamed the file to google-api-key.json).

I need the Google Speech-to-Text API, right? To get the keyfile, I clicked "create key", chose the "JSON" format, and it was downloaded automatically. Are those the same steps as yours?

Are you using Ubuntu? Do I need to make any network changes?

Thanks for your help.

GodCed commented 3 years ago

It seems that the speech recognition indeed can't connect, which blocks the app, prevents the audio sink from working, and closes ODAS, disconnecting all the client connections. Do you have the terminal output of ODAS Studio?

I am indeed using Ubuntu, but it has been years since I made this app and I don't remember the steps to get the key. However, it is the Google Speech-to-Text API.

No network change that I know of, except of course being able to reach Google.

nacho-bodega commented 3 years ago

Yes, I see the terminal saying things like:

"Recorder 0 started Recorder 0 was false active Recorder 0 ended" (Silent data)

"Registering header on recorder 0 Registered header on recorder 0 Recorder 0 undefined" (Voice data)

The validation error message "Please enter a URL" seems to come from the HTML5 form validator. The audio ports are interrupted as soon as I hit the save button with "Use Google Speech Voice Recognition" checked.

What do you think about this? How could I add something like this? https://stackoverflow.com/questions/6536381/input-type-url-says-please-enter-a-url-if-http-is-not-included

I'm not good at HTML/CSS/JavaScript at all, but I want to troubleshoot the configure.html section.

Thanks

nacho-bodega commented 3 years ago

"As soon as I hit the save button..." was not true. After save the path and close, the connections disrupt when the recording is on.

In the terminal, it says:

"Write stream 0 is full Holding samples..."

Then it loses the connections.

After that, I get the following messages.

Recorder 0 ended
Registering header on recorder 0
Registered header on recorder 0
Recorder 0 undefined
[ { alternatives: [ [Object] ],
    isFinal: false,
    stability: 0.009999999776482582,
    resultEndTime: { seconds: '0', nanos: 500000000 },
    channelTag: 0,
    languageCode: '' } ]
... and so on

GodCed commented 3 years ago

Look at the elements with ids configure-form and api-keyfile-input in configure.html to experiment with the URL thing.
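
For reference, a minimal sketch of the kind of workaround that Stack Overflow answer points at, assuming the keyfile field in configure.html is declared as an input with type="url" (the exact markup may differ):

```js
// Hypothetical renderer-side workaround, assuming configure.html contains
// <form id="configure-form"> ... <input type="url" id="api-keyfile-input">.
// Chromium's built-in validation for type="url" rejects plain file paths,
// which would explain the "Please enter a URL" message.
const keyfileInput = document.getElementById('api-keyfile-input');
keyfileInput.type = 'text'; // treat the selected path as plain text, not a URL

// Alternatively, keep the input as-is and skip native form validation entirely:
document.getElementById('configure-form').setAttribute('novalidate', '');
```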

Are you trying to make configuration changes while ODAS is running? I suggest you don't; finish the configuration before starting ODAS. What sample rate are you using? Are you on good hardware with a reliable internet connection? The blockage looks like a performance problem where the app can't process/upload the audio streams fast enough.

nacho-bodega commented 3 years ago

OK, I will try the URL thing.

I'm using a 16 kHz sample rate and my computers are indeed old: an OptiPlex 380 and 2010 Macs (Ubuntu on the Macs). I tried with an Ethernet cable but had the same issue. My mic array is a Matrix Creator on a Pi, remotely controlled by any of the above systems. The Pi is fine running odaslive unless I stop talking for a while.

What system do you use or suggest? Maybe I can try getting something similar.

GodCed commented 3 years ago

Okay, 16 kHz should do it.

If I remember right, the machine I used while coding this had a quad core clocked around 2.5 GHz and 8 GB of RAM, but that's just off the top of my head. As I said, try not to change the configuration while ODAS is running; this is an untested use case and may lead to problems.

Another possibility is that Google changed something in their API and it broke the app, as it has not been worked on for a while.

nacho-bodega commented 3 years ago

Thanks for the information.

All of my systems have at least 8 GB of RAM and more than 2.5 GHz, but I should get rid of the dual-core Macs; they are too old. Google says you have to set an environment variable (GOOGLE_APPLICATION_CREDENTIALS) pointing to the API key path, so I did, and I also avoided changing the config while ODAS was running as you said, but the results were the same.

I think the possibilities are narrowed down to API changes on Google's side and the URL thing. I will keep troubleshooting and focus on those.

Thanks

nacho-bodega commented 3 years ago

Quick question: why do you have 4 audio recorders for the 8-channel TCP audio data from odaslive? Or am I misunderstanding?

It looks like odaslive is sending 8 channels of 2-byte samples. When you receive the byte stream on the odas_web side, you slice the data into 2 bytes per recorder, but doesn't that conflict, e.g. recorder[0] mixing ch0 and ch4, recorder[1] mixing ch1 and ch5, and so on?

My recording is working and I don't notice any strange sound in those WAV files, so I just want to know how the data comes in and what your intention was for the number of recorders slicing it.

GodCed commented 3 years ago

@nacho-bodega there are actually two sets of 4 recorders. 4 channels of separated audio come in on one socket, processed by one node process with 4 recorders, and 4 channels of post-filtered audio come in on another socket, processed by a separate process with 4 other recorders.

If you look at the code, main.js requires record.js, which spawns two node processes running recordings.js, each passed a different WAV file suffix and port number.
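
A rough sketch of that slicing, in case it helps; this is not the actual recordings.js code, and the interleaved frame layout and port numbers are assumptions based on this discussion:

```js
// De-interleave one 4-channel, 16-bit little-endian stream into one sample
// array per recorder. Since each recorder process only ever sees 4 channels,
// recorder[0] cannot end up mixing ch0 and ch4.
const net = require('net');

const N_CHANNELS = 4;        // one recorder per separated (or postfiltered) source
const BYTES_PER_SAMPLE = 2;  // 16-bit PCM
const FRAME_BYTES = N_CHANNELS * BYTES_PER_SAMPLE;
const recorders = Array.from({ length: N_CHANNELS }, () => []);

net.createServer(socket => {
  socket.on('data', chunk => {
    // Each frame holds one sample per channel: [ch0, ch1, ch2, ch3], 2 bytes each.
    // (A real implementation would also buffer partial frames across chunks.)
    for (let off = 0; off + FRAME_BYTES <= chunk.length; off += FRAME_BYTES) {
      for (let ch = 0; ch < N_CHANNELS; ch++) {
        recorders[ch].push(chunk.readInt16LE(off + ch * BYTES_PER_SAMPLE));
      }
    }
  });
}).listen(10000); // separated audio; the postfiltered process would use 10010
```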

nacho-bodega commented 3 years ago

Ah, that's right, I was talking about 'sp' only. I confirmed that "msg_hops_config->nChannels" is actually 4; I thought it was 8 channels. Now I wonder where the initial 8-channel config setting changes to 4 channels during the "aobjects_construct(...)" process in main.c.

nacho-bodega commented 3 years ago

N_inactive in the SST section of the configuration file sets nChannels, which is 4. I guess this is the computation-saving method used in the main algorithm.

Thanks, it's getting clearer now.

GodCed commented 3 years ago

In the SSL section of the config file, the nPots entry specifies the number of potential sound sources that can be tracked by ODAS. Each potential source is either active or inactive. Each source gets its own channel in both the separated and postfiltered audio, as well as its own tracker. The corresponding recorders are started and stopped according to the activity state of the sources as computed by the tracker.

If I remember right, I didn't build ODAS Studio to handle any nPots setting other than 4, so I advise against changing it if you want to use the GUI.
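
For reference, the relevant entry looks roughly like this in the ODAS configuration file (a hedged excerpt; the surrounding syntax in your own config may differ):

```
ssl:
{
    # Number of potential sound sources ODAS can track.
    # ODAS Studio expects this to stay at 4.
    nPots = 4;
}
```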

nacho-bodega commented 3 years ago

I see, indeed you only need the few active audio channels. Node is a little tricky for me, so I'm imitating the server in Python right now.

I'm having difficulty converting the TCP bytes into WAV; no success so far. Right now, if I record sound on the Pi client side, I can receive the WAV file on the local PC (just a WAV file transfer). Or, without recording, I can do VoIP-style streaming, sending the raw mic-array sound to the local PC and listening to it directly from the PC's audio output.

I'm trying to test "tcp->WAV->google-api" first because I know "WAV->google-api" works, and then I will try "tcp->google->WAV" next. If I understand correctly, it seems like odas_web is doing "tcp->google-api->WAV".

GodCed commented 3 years ago

Not quite. odas_web receives audio data through TCP, and then it is pushed to Google and saved to a WAV file in parallel. Data is saved here and pushed to Google here.
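
A minimal sketch of that parallel push, using the @google-cloud/speech Node client; this is not the actual odas_web code, and the file names, port, and audio parameters are assumptions based on this thread:

```js
const fs = require('fs');
const net = require('net');
const speech = require('@google-cloud/speech');

// Authenticate with the key file discussed above.
const client = new speech.SpeechClient({ keyFilename: 'google-api-key.json' });

// Streaming recognition request; interim results are the live, still-uncertain ones.
const recognizeStream = client
  .streamingRecognize({
    config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode: 'en-US' },
    interimResults: true,
  })
  .on('data', response => console.log(response.results));

// Body of the WAV file; the header is handled separately (see below).
const wavBody = fs.createWriteStream('recorder0.raw');

net.createServer(socket => {
  socket.on('data', chunk => {
    recognizeStream.write(chunk); // pushed to Google...
    wavBody.write(chunk);         // ...and saved to disk in parallel
  });
}).listen(10000);
```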

A WAV file is just raw data preceded by a header describing the data structure (sample rate, bit depth, channel count, etc.). Google gets that information from a JSON config passed with the request to the API, then it receives the same data that is written to the body of the WAV file. Note that the WAV file header must be updated when the recording finishes because it contains the file length.
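
To illustrate that last point, here is a rough sketch (again, not the odas_web code) of a 44-byte PCM WAV header and of patching its two length fields once the recording stops; the default parameter values are just examples:

```js
const fs = require('fs');

// Build a 44-byte header for 16-bit PCM audio. The two length fields depend
// on the amount of audio, so they must be rewritten when the recording ends.
function wavHeader(dataLength, sampleRate = 16000, channels = 1) {
  const blockAlign = channels * 2; // 2 bytes per 16-bit sample
  const h = Buffer.alloc(44);
  h.write('RIFF', 0);
  h.writeUInt32LE(36 + dataLength, 4);          // RIFF chunk size (needs patching)
  h.write('WAVEfmt ', 8);
  h.writeUInt32LE(16, 16);                      // fmt sub-chunk size
  h.writeUInt16LE(1, 20);                       // audio format: PCM
  h.writeUInt16LE(channels, 22);
  h.writeUInt32LE(sampleRate, 24);
  h.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  h.writeUInt16LE(blockAlign, 32);
  h.writeUInt16LE(16, 34);                      // bits per sample
  h.write('data', 36);
  h.writeUInt32LE(dataLength, 40);              // data chunk size (needs patching)
  return h;
}

// Once the final data length is known, rewrite both length fields in place.
function patchWavLength(path, dataLength) {
  const fd = fs.openSync(path, 'r+');
  const riffSize = Buffer.alloc(4);
  riffSize.writeUInt32LE(36 + dataLength, 0);
  fs.writeSync(fd, riffSize, 0, 4, 4);          // offset 4: RIFF chunk size
  const dataSize = Buffer.alloc(4);
  dataSize.writeUInt32LE(dataLength, 0);
  fs.writeSync(fd, dataSize, 0, 4, 40);         // offset 40: data chunk size
  fs.closeSync(fd);
}
```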

nacho-bodega commented 3 years ago

It is slower, but I finally got a "tcp-->WAV-->google" server working in Python, and it is stable for what I need. I may go with Python for now and hook it up with some ML models.

Thanks for your advice.

GodCed commented 3 years ago

Just to clarify, "fuzzy" transcriptions are transcriptions that are processed live but that the recognition algorithm is still unsure about. They are useful for a GUI that displays the live transcription and then saves the non-fuzzy one, but I would suggest you disable them if you want to feed the data to a model. Please note that fuzzy results may contain the same portion of speech multiple times.
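
If it helps, filtering the fuzzy results out with the Node client boils down to checking the isFinal flag that showed up in your log output earlier (assuming a recognizeStream like in the earlier sketch):

```js
// Keep only final (non-"fuzzy") transcripts; interim results arrive with
// isFinal === false and may repeat the same portion of speech.
recognizeStream.on('data', response => {
  for (const result of response.results) {
    if (result.isFinal && result.alternatives.length > 0) {
      console.log(result.alternatives[0].transcript);
    }
  }
});
```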