googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0

texttospeech: snippets have TODOs instead of actual code #10955

Open chrbsg opened 1 month ago

chrbsg commented 1 month ago

Streaming text-to-speech synthesis needs to be documented properly

  1. The text-to-speech example at https://cloud.google.com/text-to-speech/docs/samples/tts-quickstart contains working Go code, which is good, but it does not include streaming synthesis. Streaming synthesis should be documented there to the same standard as the non-streaming example.

  2. A developer shouldn't have to read and understand automatically generated protobuf source code in order to figure out how to use streaming synthesis. It should be documented properly with working example code. There is example code in this repo (text_to_speech_client_example_test.go) but it is not sufficient to write a working client:

    • ExampleClient_StreamingSynthesize has a TODO instead of actual code:
                reqs := []*texttospeechpb.StreamingSynthesizeRequest{
                        // TODO: Create requests.
                }

    It's not obvious what these requests should be, since constructing them requires using specific Protobuf oneof wrapper types (like texttospeechpb.StreamingSynthesizeRequest_StreamingConfig).

    cloud_tts.pb.go includes some hints:

// Request message for the `StreamingSynthesize` method. Multiple
// `StreamingSynthesizeRequest` messages are sent in one call.
// The first message must contain a `streaming_config` that
// fully specifies the request configuration and must not contain `input`. All
// subsequent messages must only have `input` set.
type StreamingSynthesizeRequest struct {
        state         protoimpl.MessageState
        sizeCache     protoimpl.SizeCache
        unknownFields protoimpl.UnknownFields

        // The request to be sent, either a StreamingSynthesizeConfig or
        // StreamingSynthesisInput.
        //
        // Types that are assignable to StreamingRequest:
        //
        //      *StreamingSynthesizeRequest_StreamingConfig
        //      *StreamingSynthesizeRequest_Input
        StreamingRequest isStreamingSynthesizeRequest_StreamingRequest `protobuf_oneof:"streaming_request"`

and

// `StreamingSynthesizeResponse` is the only message returned to the
// client by `StreamingSynthesize` method. A series of zero or more
// `StreamingSynthesizeResponse` messages are streamed back to the client.
type StreamingSynthesizeResponse struct {
        state         protoimpl.MessageState
        sizeCache     protoimpl.SizeCache
        unknownFields protoimpl.UnknownFields

        // The audio data bytes encoded as specified in the request. This is
        // headerless LINEAR16 audio with a sample rate of 24000.
        AudioContent []byte `protobuf:"bytes,1,opt,name=audio_content,json=audioContent,proto3" json:"audio_content,omitempty"`
}

So the audio is always uncompressed 24kHz? This should be documented. And, unlike the non-streaming API, there's no way to configure this to return 48kHz Opus?
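
For comparison, the non-streaming SynthesizeSpeech API does let the caller pick the codec and sample rate via AudioConfig. A rough sketch of that, using field names from the v1 protos (I haven't found any equivalent knob on the streaming path):

        // Non-streaming synthesis: the caller chooses the encoding and sample rate.
        resp, err := client.SynthesizeSpeech(ctx, &texttospeechpb.SynthesizeSpeechRequest{
                Input: &texttospeechpb.SynthesisInput{
                        InputSource: &texttospeechpb.SynthesisInput_Text{Text: "hello world"},
                },
                Voice: &texttospeechpb.VoiceSelectionParams{
                        LanguageCode: lang,
                        Name:         ttsVoice,
                },
                AudioConfig: &texttospeechpb.AudioConfig{
                        AudioEncoding:   texttospeechpb.AudioEncoding_OGG_OPUS, // e.g. Opus instead of LINEAR16
                        SampleRateHertz: 48000,
                },
        })
        if err != nil {
                // handle error
        }
        _ = resp.AudioContent // encoded as requested above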

So I came up with:

        req := &texttospeechpb.StreamingSynthesizeRequest{
                StreamingRequest: &texttospeechpb.StreamingSynthesizeRequest_StreamingConfig{
                        StreamingConfig: &texttospeechpb.StreamingSynthesizeConfig{
                                Voice: &texttospeechpb.VoiceSelectionParams{
                                        LanguageCode: lang,
                                        Name:         ttsVoice,
                                },
                        },
                },
        }
        req2 := &texttospeechpb.StreamingSynthesizeRequest{
                StreamingRequest: &texttospeechpb.StreamingSynthesizeRequest_Input{
                        Input: &texttospeechpb.StreamingSynthesisInput{
                                InputSource: &texttospeechpb.StreamingSynthesisInput_Text{
                                        Text: "hello world",
                                },
                        },
                },
        }
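
The full call flow I ended up with looks roughly like the sketch below (streaming_config message first, then input messages, then read audio until io.EOF). It continues from the req/req2 values above, assumes client is a *texttospeech.Client and "io" is imported, and is a sketch rather than polished code; for longer input the sends should run in their own goroutine, as the generated example does:

        stream, err := client.StreamingSynthesize(ctx)
        if err != nil {
                // handle error
        }

        // First message: streaming_config only.
        if err := stream.Send(req); err != nil {
                // handle error
        }
        // Subsequent messages: input only.
        if err := stream.Send(req2); err != nil {
                // handle error
        }
        if err := stream.CloseSend(); err != nil {
                // handle error
        }

        // Read back the synthesized audio (headerless LINEAR16 at 24000Hz).
        var audio []byte
        for {
                resp, err := stream.Recv()
                if err == io.EOF {
                        break
                }
                if err != nil {
                        // handle error
                        break
                }
                audio = append(audio, resp.GetAudioContent()...)
        }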

Sending the outgoing requests worked, but the stream then returned the error "Language code en-AU is not currently supported for streaming synthesis." So streaming doesn't work with all voices? Which voices/languages are supported, then? This should be documented. Is "en-US" the only locale that works? How can I find out which voices are supported by streaming?

So I tried an "en-US" voice and got the error "Currently, only Journey voices are supported for streaming synthesis." This should be documented too.
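
The closest I can get to answering this myself is to list the voices and filter by name, on the (undocumented) guess that "Journey" in the voice name is the relevant marker. A sketch, assuming "strings" and "fmt" are imported:

        // Heuristic only: guess streaming-capable voices by their name.
        voices, err := client.ListVoices(ctx, &texttospeechpb.ListVoicesRequest{LanguageCode: "en-US"})
        if err != nil {
                // handle error
        }
        for _, v := range voices.GetVoices() {
                if strings.Contains(v.GetName(), "Journey") {
                        fmt.Printf("candidate streaming voice: %s %v\n", v.GetName(), v.GetLanguageCodes())
                }
        }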

So to sum up this request: streaming synthesis needs working example code (not TODOs) and proper documentation of its audio output format and its voice/language restrictions.

quartzmo commented 1 month ago

@chrbsg Thank you for this comprehensive assessment of shortcomings in the documentation for the text-to-speech client.

  1. For the example linked in 1. above, can you open an issue on https://github.com/GoogleCloudPlatform/golang-samples? I am assuming you found that repo, which is linked from each Go sample on cloud.google.com. There are other streaming samples there, although I don't know if they provide what you're looking for. (Example: texttospeech/synthesize_speech/audio_profile.go).
  2. This issue should serve to track some of 2. above. This repo is the right place to report your observation about the TODOs and missing code. This is a shortcoming of the automation used to produce the samples.
  3. Regarding the request parameter documentation ("Document that only en-US language is supported", etc.): the client libraries in this repo and their documentation are based on interface descriptions published by the product teams. We are only able to assist with issues that pertain to the behavior of the libraries themselves. Because the issue you're experiencing is due to the design of the product API, the best way to report it is through the support page, which reaches product engineers. To see if this feature has already been requested, you can check the Cloud issue trackers at https://cloud.google.com/support/docs/issue-trackers.

quartzmo commented 1 month ago

@chrbsg I renamed this issue for better SEO regarding the concern that is actionable for this repo. (Well, it's actually a wider problem that can't be solved just for this repo alone, but at least it is actionable for our team.) I moved your original title down into the issue description.

chrbsg commented 1 month ago

Thanks for the info. I opened issues https://github.com/GoogleCloudPlatform/golang-samples/issues/4426 for the example code and https://issuetracker.google.com/issues/372169457 for the voice limits documentation.

There are other streaming samples there, although I don't know if they provide what you're looking for. (Example: texttospeech/synthesize_speech/audio_profile.go).

There are no examples of streaming synthesis (i.e. client.StreamingSynthesize) there as far as I can see; audio_profile.go uses the regular client.SynthesizeSpeech.

quartzmo commented 1 month ago

@chrbsg Thank you for opening those issues. I think you should move this sentence in golang-samples #4426 to issuetracker 372169457:

Code should document that there is no way to configure audio output. Only LINEAR16 audio with a sample rate of 24000Hz is available.

The engineers working on golang-samples have no ability to fix the product documentation; only the product team can take action on that. (The golang-samples engineers could add comments in the samples with the information you want, but probably won't, since those comments could easily fall out of date if/when the product team makes changes.) Sorry to burden you with this fine-grained routing of your very legitimate complaints, but realistically this is the best way to get the message through.