chrbsg opened this issue 1 month ago
@chrbsg Thank you for this comprehensive assessment of shortcomings in the documentation for the text-to-speech client.
@chrbsg I renamed this issue for better SEO regarding the concern that is actionable for this repo. (Well, it's actually a wider problem that can't be solved just for this repo alone, but at least it is actionable for our team.) I moved your original title down into the issue description.
Thanks for the info. I opened issues https://github.com/GoogleCloudPlatform/golang-samples/issues/4426 for the example code and https://issuetracker.google.com/issues/372169457 for the voice limits documentation.
There are other streaming samples there, although I don't know if they provide what you're looking for. (Example: texttospeech/synthesize_speech/audio_profile.go).
There are no examples of streaming synthesis (i.e. client.StreamingSynthesize) as far as I can see; audio_profile.go uses the regular client.SynthesizeSpeech.
@chrbsg Thank you for opening those issues. I think you should move this sentence in golang-samples #4426 to issuetracker 372169457:
Code should document that there is no way to configure audio output. Only LINEAR16 audio with a sample rate of 24000Hz is available.
The engineers working on golang-samples have no ability to fix the product documentation, only the product team can take action on that. (The golang-samples engineers could add comments in the samples with the information you want, but probably won't since those comments could easily fall out of date if/when the product team makes changes.) Sorry to burden you with this fine-grained routing of your very legitimate complaints, but realistically this is the best way to get the message through.
Streaming text-to-speech synthesis needs to be documented properly
The text-to-speech example at https://cloud.google.com/text-to-speech/docs/samples/tts-quickstart contains working Go code, which is good, but it does not include streaming synthesis. Streaming synthesis should be documented there to the same standard as the non-streaming example.
A developer shouldn't have to read and understand automatically generated protobuf source code in order to figure out how to use streaming synthesis. It should be documented properly with working example code. There is example code in this repo (text_to_speech_client_example_test.go) but it is not sufficient to write a working client:
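(For reference, that example is roughly the following; paraphrased rather than quoted verbatim, so the repo copy may differ in details:)

```go
import (
	"context"
	"io"

	texttospeech "cloud.google.com/go/texttospeech/apiv1"
	"cloud.google.com/go/texttospeech/apiv1/texttospeechpb"
)

// ... inside some function:

ctx := context.Background()
c, err := texttospeech.NewClient(ctx)
if err != nil {
	// TODO: Handle error.
}
defer c.Close()

stream, err := c.StreamingSynthesize(ctx)
if err != nil {
	// TODO: Handle error.
}
go func() {
	reqs := []*texttospeechpb.StreamingSynthesizeRequest{
		// TODO: Create requests.
	}
	for _, req := range reqs {
		if err := stream.Send(req); err != nil {
			// TODO: Handle error.
		}
	}
	stream.CloseSend()
}()
for {
	resp, err := stream.Recv()
	if err == io.EOF {
		break
	}
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
```

Note the "TODO: Create requests" placeholder.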
It's not obvious what these requests should be, as it requires using the specific Protobuf types (like texttospeechpb.StreamingSynthesizeRequest_StreamingConfig). cloud_tts.pb.go includes some hints:
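(Paraphrased, abridged sketch of the relevant generated types; the real file wraps the oneof fields in interface types and carries the full generated comments:)

```go
// Abridged, paraphrased sketch of the relevant types in cloud_tts.pb.go,
// not the verbatim generated code.

// StreamingSynthesizeRequest: the first message sent on the stream must
// contain the StreamingConfig; all later messages must contain Input.
type StreamingSynthesizeRequest struct {
	// oneof StreamingRequest, assignable wrapper types:
	//   *StreamingSynthesizeRequest_StreamingConfig (wraps *StreamingSynthesizeConfig)
	//   *StreamingSynthesizeRequest_Input           (wraps *StreamingSynthesisInput)
}

// StreamingSynthesizeConfig selects the voice for the whole stream.
// Note there is no AudioConfig field, unlike the non-streaming request.
type StreamingSynthesizeConfig struct {
	Voice *VoiceSelectionParams
}

// StreamingSynthesisInput carries a chunk of text to synthesize.
type StreamingSynthesisInput struct {
	// oneof InputSource, with a single wrapper type:
	//   *StreamingSynthesisInput_Text (wraps a plain string)
}

// StreamingSynthesizeResponse carries a chunk of synthesized audio. The
// generated comment describes it as headerless LINEAR16 audio with a
// sample rate of 24000Hz.
type StreamingSynthesizeResponse struct {
	AudioContent []byte
}
```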
So the audio is always uncompressed 24kHz? This should be documented. And, unlike the non-streaming API, there's no way to configure this to return 48kHz Opus?
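(For comparison, a sketch of how the non-streaming API exposes this, assuming a texttospeech.Client named client and a context ctx; error handling omitted:)

```go
// Non-streaming SynthesizeSpeech lets you choose the encoding and sample
// rate via AudioConfig, e.g. 48kHz Ogg Opus.
resp, err := client.SynthesizeSpeech(ctx, &texttospeechpb.SynthesizeSpeechRequest{
	Input: &texttospeechpb.SynthesisInput{
		InputSource: &texttospeechpb.SynthesisInput_Text{Text: "Hello"},
	},
	Voice: &texttospeechpb.VoiceSelectionParams{LanguageCode: "en-US"},
	AudioConfig: &texttospeechpb.AudioConfig{
		AudioEncoding:   texttospeechpb.AudioEncoding_OGG_OPUS,
		SampleRateHertz: 48000,
	},
})
// StreamingSynthesizeConfig has no equivalent AudioConfig field.
_ = resp
_ = err
```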
So I came up with:
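(A reconstructed sketch of that code, assuming the generated texttospeechpb types; the specific voice name is only an illustration:)

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"

	texttospeech "cloud.google.com/go/texttospeech/apiv1"
	"cloud.google.com/go/texttospeech/apiv1/texttospeechpb"
)

func main() {
	ctx := context.Background()
	client, err := texttospeech.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	stream, err := client.StreamingSynthesize(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// First message: the streaming config with the voice selection.
	err = stream.Send(&texttospeechpb.StreamingSynthesizeRequest{
		StreamingRequest: &texttospeechpb.StreamingSynthesizeRequest_StreamingConfig{
			StreamingConfig: &texttospeechpb.StreamingSynthesizeConfig{
				Voice: &texttospeechpb.VoiceSelectionParams{
					LanguageCode: "en-AU",
					Name:         "en-AU-Neural2-A", // illustrative voice name
				},
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Subsequent messages: the text to synthesize.
	err = stream.Send(&texttospeechpb.StreamingSynthesizeRequest{
		StreamingRequest: &texttospeechpb.StreamingSynthesizeRequest_Input{
			Input: &texttospeechpb.StreamingSynthesisInput{
				InputSource: &texttospeechpb.StreamingSynthesisInput_Text{
					Text: "Hello from streaming synthesis.",
				},
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	stream.CloseSend()

	// Read audio chunks until the server closes the stream.
	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			// The "Language code en-AU is not currently supported for
			// streaming synthesis" error came back here.
			log.Fatal(err)
		}
		fmt.Printf("got %d bytes of LINEAR16 audio\n", len(resp.GetAudioContent()))
	}
}
```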
Which worked for sending outgoing requests. But there was a problem: the call returned the error "Language code en-AU is not currently supported for streaming synthesis." OK, so streaming doesn't work with all voices? Then which voices/languages are supported? This should be documented. Is "en-US" the only locale that works? How can I find out which voices are supported by streaming?

So I tried an "en-US" voice and got the error "Currently, only Journey voices are supported for streaming synthesis." This should be documented too.

So to sum up this request:

- Document streaming synthesis (client.StreamingSynthesize) on cloud.google.com/text-to-speech with working example code, to the same standard as the non-streaming quickstart.
- Document the streaming audio output limitation: only LINEAR16 at 24000Hz is available, with no way to configure the output format.
- Document which voices and languages are supported for streaming synthesis (apparently only Journey voices, and not every locale).