jongfeelkim-VIRNECT / STT-Hololens

Unity translation app using Google Cloud Speech-to-Text on Hololens

MIT License

Audio encoding #2

Closed by jongfeelkim-VIRNECT 4 years ago

jongfeelkim-VIRNECT commented 4 years ago

The list of supported audio encodings does not include WAV: https://cloud.google.com/speech-to-text/docs/encoding

Use FLAC or MP3 (optional config): https://cloud.google.com/speech-to-text/docs/quickstart-client-libraries

Converting audio data: https://www.magellanic-clouds.com/blocks/en/guide/cloud-speech-api-audio-encoding/

In Unity:

- Record from the microphone to a WAV file
- Convert the WAV file to FLAC via SoX
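
The WAV-to-FLAC step could be sketched roughly as below. This is only an illustration, not code from this repository: the `SoxConverter` class and its method names are hypothetical, it assumes a `sox` executable is on the PATH, and it assumes the platform allows launching external processes (which may hold in the Unity Editor on desktop but may not hold on the HoloLens device itself). SoX picks the input and output formats from the file extensions.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: wraps the command line "sox <input.wav> <output.flac>".
public static class SoxConverter
{
    // Describes the sox invocation without running it.
    public static ProcessStartInfo BuildStartInfo(string wavPath, string flacPath)
    {
        return new ProcessStartInfo
        {
            FileName = "sox", // assumption: sox is installed and on PATH
            Arguments = $"\"{wavPath}\" \"{flacPath}\"",
            UseShellExecute = false,
            RedirectStandardError = true,
            CreateNoWindow = true,
        };
    }

    // Runs sox and throws if it reports a failure.
    public static void Convert(string wavPath, string flacPath)
    {
        using (var process = Process.Start(BuildStartInfo(wavPath, flacPath)))
        {
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException("sox failed: " + errors);
            }
        }
    }
}
```

Usage would be something like `SoxConverter.Convert("record.wav", "record.flac");`, where both file names are placeholders.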

jongfeelkim-VIRNECT commented 4 years ago

Test using the client library

Reference: .NET QuickStart
https://github.com/GoogleCloudPlatform/dotnet-docs-samples/blob/master/speech/api/QuickStart/QuickStart.cs
Code edits:
- DEMO_FILE is a mono WAV file
- Commented out RecognitionConfig.Encoding and SampleRateHertz
It works! (The result is about 80% complete.)

using System;
using Google.Cloud.Speech.V1;

public class QuickStart
{
    // The name of the local audio file to transcribe (a mono WAV file)
    public static string DEMO_FILE = "../resources/commercial_mono.wav";

    public static void Main(string[] args)
    {
        var speech = SpeechClient.Create();
        // Encoding and SampleRateHertz are left commented out: for a WAV file
        // the service can read both from the file header.
        var response = speech.Recognize(new RecognitionConfig()
        {
            //Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
            //SampleRateHertz = 16000,
            LanguageCode = "en",
        }, RecognitionAudio.FromFile(DEMO_FILE));
        // Print every transcript alternative of every result
        foreach (var result in response.Results)
        {
            foreach (var alternative in result.Alternatives)
            {
                Console.WriteLine(alternative.Transcript);
            }
        }
    }
}

jongfeelkim-VIRNECT commented 4 years ago

So the audio encoding plan in Unity is:

- Record from the microphone to a WAV file (at a 16000 Hz sampling rate)
- Send it with SpeechClient and get the response!

jongfeelkim-VIRNECT commented 4 years ago

Start microphone recording with the sampling rate set to 16000 Hz. Record into the AudioSource's AudioClip, then save it to a file in WAV format.

audioSource.clip = Microphone.Start(microphoneDevice, true, 20, 16000);

To save the file, use the open-source SavWav script from this gist: https://gist.github.com/darktable/2317063

It converts an AudioClip to a WAV file.

QuickStart demo test with the recorded WAV file: the recorded speech is "고려은단 비타민C" (Korea Eundan Vitamin C), and it is converted to Korean text.
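
For reference, the kind of conversion such a script performs can be sketched as follows. This is not the SavWav code itself, only a minimal illustration of writing 16-bit PCM WAV data; the `WavWriter` class is a hypothetical name. It takes a plain `float[]` so that it runs outside Unity; in Unity the samples would presumably come from `AudioClip.GetData`, with the sample rate and channel count taken from the clip.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: encodes float samples as a 16-bit PCM WAV byte array.
public static class WavWriter
{
    public static byte[] Encode(float[] samples, int sampleRate, int channels)
    {
        const int bitsPerSample = 16;
        int dataSize = samples.Length * (bitsPerSample / 8);

        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream)) // BinaryWriter is little-endian
        {
            // RIFF header: total size is 36 header bytes plus the sample data
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + dataSize);
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));

            // "fmt " chunk describing uncompressed PCM audio
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                 // fmt chunk size
            writer.Write((short)1);           // audio format 1 = PCM
            writer.Write((short)channels);
            writer.Write(sampleRate);
            writer.Write(sampleRate * channels * bitsPerSample / 8); // byte rate
            writer.Write((short)(channels * bitsPerSample / 8));     // block align
            writer.Write((short)bitsPerSample);

            // "data" chunk: samples clamped to [-1, 1] and scaled to 16-bit
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataSize);
            foreach (float sample in samples)
            {
                float clamped = Math.Max(-1f, Math.Min(1f, sample));
                writer.Write((short)(clamped * short.MaxValue));
            }

            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

The resulting bytes can be written with `File.WriteAllBytes` to produce a file that is a 44-byte header followed by the samples.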

jongfeelkim-VIRNECT commented 4 years ago

Open the .wav file and read it into a byte array, then convert the byte array to a Base64-encoded string. https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionAudio?hl=ko

C# Convert class https://docs.microsoft.com/en-us/dotnet/api/system.convert.tobase64string?view=netframework-4.8
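
A minimal sketch of that step, assuming the goal is to place the Base64 string in the `content` field of a REST recognize request body. The `RecognizeRequestBuilder` class and its method names are hypothetical and not taken from this repository. The JSON is concatenated by hand to keep the example dependency-free, which assumes the language code is a simple tag such as "ko-KR" that needs no escaping.

```csharp
using System;
using System.IO;

// Hypothetical helper: builds a JSON request body carrying Base64 audio.
public static class RecognizeRequestBuilder
{
    // Encodes raw audio bytes and wraps them in a request body of the form
    // {"config": {...}, "audio": {"content": "<base64>"}}.
    public static string BuildJson(byte[] audioBytes, string languageCode)
    {
        string content = Convert.ToBase64String(audioBytes);
        return "{\"config\":{\"languageCode\":\"" + languageCode + "\"},"
             + "\"audio\":{\"content\":\"" + content + "\"}}";
    }

    // Reads the whole file into a byte array first, as described above.
    public static string BuildJsonFromFile(string wavPath, string languageCode)
    {
        return BuildJson(File.ReadAllBytes(wavPath), languageCode);
    }
}
```

For example, `RecognizeRequestBuilder.BuildJsonFromFile("record.wav", "ko-KR")` (placeholder file name) would return a string ready to send as the request body.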

jongfeelkim-VIRNECT commented 4 years ago

Base64Decode method https://github.com/jongfeelkim-VIRNECT/STT-Hololens/blob/2270ee9c1ccf8fe820008bac5449d37a96e879ff/Assets/Scripts/MicrophoneController.cs#L87