watson-developer-cloud / go-sdk

:mouse: go SDK for the IBM Watson services.
Apache License 2.0

raw audio does not work through websockets but works through http interface #109

Closed ryan-netizen closed 3 years ago

ryan-netizen commented 3 years ago

Overview: I am trying to send raw audio from a file over the websocket interface using the Go SDK, but I get the error below.

{
   "error": "unable to transcode data stream application/octet-stream -> audio/x-float-array "
}
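For context, the Speech to Text WebSocket interface opens a recognition request with a JSON text message whose `action` is `start` and which can carry a `content-type`. Headerless formats such as `audio/mulaw` cannot be auto-detected, so one plausible reading of the error above is that the content type never reached the service and the stream was treated as `application/octet-stream`. The thread does not confirm this root cause, so treat it as an assumption. The sketch below only shows what such a start message could look like; `startMessage` and `buildStartMessage` are hypothetical names, not part of the go-sdk.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// startMessage models a subset of the JSON text frame that opens a
// recognition request on the WebSocket interface (illustrative only).
type startMessage struct {
	Action      string `json:"action"`
	ContentType string `json:"content-type,omitempty"`
	Timestamps  bool   `json:"timestamps,omitempty"`
}

// buildStartMessage is a hypothetical helper, not an SDK function.
// It refuses to build a message without a content type, because
// headerless audio cannot be identified by the service on its own.
func buildStartMessage(contentType string) ([]byte, error) {
	if contentType == "" {
		return nil, fmt.Errorf("content type is required for headerless audio")
	}
	return json.Marshal(startMessage{
		Action:      "start",
		ContentType: contentType,
		Timestamps:  true,
	})
}

func main() {
	msg, err := buildStartMessage("audio/mulaw;rate=8000;channels=1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
}
```

Comparing a frame like this with what the SDK actually sends (for example via a debugging proxy) would be one way to check whether the content type is being dropped.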

Expected behavior: The same audio file works if I use the HTTP interface instead of websockets, so sending it over the websocket interface should not be a problem either. The websocket interface should return a transcript message.

Actual behavior

Recognized audio: :
{}

How to reproduce: Run the default speechtotext websocket example with a raw audio file.


SDK Version

github.com/watson-developer-cloud/go-sdk/v2 v2.0.3

Additional information:

jeff-arn commented 3 years ago

@ryan-netizen thanks for taking the time to file this! We'll look into it and may reach out with requests for additional information once we are able to get into it and debug.

ryan-netizen commented 3 years ago

@repjarms, superb! I can provide a code snippet and the raw sound file if that saves you some time. I am willing to help in any way I can.

Ryan.

ryan-netizen commented 3 years ago

@repjarms I wanted to check whether you have any updates for me. Thanks for your time.

ryan-netizen commented 3 years ago

@repjarms any update on this one?

Ryan

jeff-arn commented 3 years ago

Hi @ryan-netizen - we just released the latest version of the SDK. Can you try the code again and see if it works on this version? If not, I'll look into releasing a bugfix version that addresses the issue.

Thanks for your patience!

ryan-netizen commented 3 years ago

Hey @repjarms, sorry for the late reply. I ran my tests on the latest versions.

github.com/IBM/go-sdk-core/v5 v5.5.1 // indirect
github.com/watson-developer-cloud/go-sdk/v2 v2.1.0 // indirect

But it looks like I am still getting the same error as before.


I am attaching my code snippet along with the raw audio file I was using, to help troubleshoot the issue.

package main

import (
    "encoding/json"
    "fmt"
    "os"

    "github.com/IBM/go-sdk-core/v5/core"
    "github.com/watson-developer-cloud/go-sdk/v2/speechtotextv1"
)

func main() {
    // Instantiate the Watson Speech To Text service
    authenticator := &core.IamAuthenticator{
        ApiKey: "YOUR API KEY",
    }
    service, serviceErr := speechtotextv1.
        NewSpeechToTextV1(&speechtotextv1.SpeechToTextV1Options{
            URL:           "YOUR SERVICE URL",
            Authenticator: authenticator,
        })

    // Check successful instantiation
    if serviceErr != nil {
        panic(serviceErr)
    }

    // Open the raw audio file to recognize
    audio, audioErr := os.Open("/opt/audioRaw.raw")
    if audioErr != nil {
        panic(audioErr)
    }
    // The callback can have `OnOpen`, `OnData`, `OnClose` and `OnError` functions
    callback := myCallBack{}

    recognizeUsingWebsocketOptions := service.
        NewRecognizeUsingWebsocketOptions(audio, "audio/mulaw;rate=8000;channels=1")

    recognizeUsingWebsocketOptions.
        SetModel("en-US_NarrowbandModel").
        SetWordConfidence(true).
        SetSpeakerLabels(true).
        SetTimestamps(true)

    service.RecognizeUsingWebsocket(recognizeUsingWebsocketOptions, callback)
}

type myCallBack struct{}

func (cb myCallBack) OnOpen() {
    fmt.Println("Handshake successful")
}

func (cb myCallBack) OnClose() {
    fmt.Println("Closing connection")
}

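func (cb myCallBack) OnData(resp *core.DetailedResponse) {
    var speechResults speechtotextv1.SpeechRecognitionResults
    // The websocket result is expected to arrive as raw JSON bytes
    result, ok := resp.GetResult().([]byte)
    if !ok {
        fmt.Println("unexpected result type")
        return
    }
    if err := json.Unmarshal(result, &speechResults); err != nil {
        fmt.Println("failed to decode result:", err)
        return
    }
    core.PrettyPrint(speechResults, "Recognized audio: ")
}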

func (cb myCallBack) OnError(err error) {
    panic(err)
}
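A note on the empty `{}` shown under "Actual behavior": the `OnData` callback above decodes every message into `SpeechRecognitionResults`. If the service sends an error payload such as `{"error": "..."}`, none of its keys are likely to match that struct, so the decoded value stays empty and prints as `{}`. The sketch below, using only the standard library, shows one way to surface such an error before treating a message as a transcript. `serviceMessage` and `parseMessage` are hypothetical names, and the envelope shape is an assumption based on the error JSON quoted in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceMessage is a hypothetical envelope covering the two shapes seen
// in this issue: an error payload and a payload carrying results.
type serviceMessage struct {
	Error   string            `json:"error"`
	Results []json.RawMessage `json:"results"`
}

// parseMessage decodes a raw WebSocket payload and returns an error when
// the payload is malformed or carries a non-empty "error" field.
func parseMessage(raw []byte) (serviceMessage, error) {
	var msg serviceMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return msg, fmt.Errorf("decoding message: %w", err)
	}
	if msg.Error != "" {
		return msg, fmt.Errorf("service error: %s", msg.Error)
	}
	return msg, nil
}

func main() {
	raw := []byte(`{"error": "unable to transcode data stream application/octet-stream -> audio/x-float-array "}`)
	if _, err := parseMessage(raw); err != nil {
		fmt.Println(err)
	}
}
```

A check like this could run inside `OnData` on the `[]byte` result before unmarshalling into the SDK type, so that a service-side failure is reported instead of an empty transcript.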

audioRaw.zip

I hope this helps you in figuring out the issue.

Thank you very much for your time.

Ryan

ryan-netizen commented 3 years ago

@repjarms do you have some updates on this?

Please, let me know if you need more information.

ryan-netizen commented 3 years ago

@repjarms does the raw audio file work from the websocket interface?

taimoor99 commented 3 years ago

@repjarms I'm facing the same issue.

jeff-arn commented 3 years ago

@ryan-netizen @taimoor99 this is something I'm looking into this week. I hadn't been able to check on it since the last release, but I appreciate your patience as we juggle a few different things. I'll update this issue later this week with my findings and a possible fix release.

ryan-netizen commented 3 years ago

@repjarms superb. This is really good news. Thank you so much.

jeff-arn commented 3 years ago

Just wanted to update this thread to say that this is being actively investigated. I will provide more details in the coming days.

ryan-netizen commented 3 years ago

@repjarms you are really great. Thank you very much!

nan2iz commented 3 years ago

Hi @ryan-netizen

Just wanted to give an update on this thread. I have worked with Jeff and we were able to identify the problem. We are working on the fix right now. I will update you again once we release the fix.

ryan-netizen commented 3 years ago

@nan2iz this is super, super good. I am really excited to try the fix once it is released. Thank you very much.

jeff-arn commented 3 years ago

@ryan-netizen version 2.2.0 was just released which contains the fixes for this issue. Give the new version a try and let us know if it's working.

jeff-arn commented 3 years ago

Closing this issue as resolved by the new release. If problems persist, feel free to re-open or open a new issue.