ken107 / read-aloud

An awesome browser extension that reads aloud webpage content with one click
https://readaloud.app
MIT License
1.31k stars 226 forks

No listed options for Google Studio voices #304

Closed DavidMetcalfe closed 1 year ago

DavidMetcalfe commented 1 year ago

It appears a new option has been added to Google's premium Text-To-Speech voices called Studio voices.

Per Google's description: "The Text-to-Speech API provides Studio voices. This voice type is designed specifically for use with long-form texts such as narration, news reading, and so on."

Please add support for these as they seem to be right in the area of what Read Aloud is best suited for.

ken107 commented 1 year ago

We're hiding these voices because they're quite expensive (160 USD per million characters). These are really intended for commercial purposes, for now.

We're waiting for AI-based text-to-speech technology to become more ubiquitous and more affordable. That day should come very soon, so please stay tuned.

DavidMetcalfe commented 1 year ago

$160 USD per million bytes, not characters, and you still get 100K bytes free each month. I've been using Neural2 voices for months and haven't paid anything due to the free tier. Studio voices are, per Google's own words, "designed specifically for use with long-form texts such as narration, news reading, and so on." Nothing about commercial purposes, but precisely the use case that Read Aloud serves for me.

I don't think there's a good argument against supporting this if the end user wishes to use it. It's their own API key and their own wallet. Just add an additional warning for the user about the potential cost if you're worried about folks running up a bill.
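
To make the suggested cost warning concrete, here is a minimal sketch of how an estimate could be computed before sending text to a Studio voice. It assumes the figures quoted in this thread (160 USD per million bytes, 100K free bytes per month), which may not match current Google pricing, and assumes billing counts UTF-8 bytes. The names `utf8ByteLength`, `estimateStudioCostUsd`, and `studioCostWarning` are hypothetical and are not part of Read Aloud.

```javascript
// Sketch only: both figures are taken from the comments in this thread and may be out of date.
const STUDIO_USD_PER_MILLION_BYTES = 160;
const FREE_BYTES_PER_MONTH = 100000;

// Count UTF-8 bytes (assumption: billing is per byte of UTF-8 input, not per character).
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Estimate the USD cost of synthesizing `text`, given the bytes already used this month.
function estimateStudioCostUsd(text, bytesUsedThisMonth) {
  const used = bytesUsedThisMonth || 0;
  const total = used + utf8ByteLength(text);
  // Only bytes beyond the monthly free allowance are billable.
  const billableBefore = Math.max(0, used - FREE_BYTES_PER_MONTH);
  const billableAfter = Math.max(0, total - FREE_BYTES_PER_MONTH);
  return (billableAfter - billableBefore) * STUDIO_USD_PER_MILLION_BYTES / 1e6;
}

// Return a warning string, or null when the request fits inside the free allowance.
function studioCostWarning(text, bytesUsedThisMonth) {
  const cost = estimateStudioCostUsd(text, bytesUsedThisMonth);
  if (cost === 0) return null;
  return "Reading this with a Studio voice may cost about $" + cost.toFixed(2) + " USD.";
}
```

Under these assumptions, a one-million-byte text with nothing used yet that month leaves 900,000 billable bytes, or about 144 USD, while a short page stays inside the free allowance and produces no warning.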

franciscoabenza commented 8 months ago

I want them there too, so much that I asked ChatGPT to implement them. I'm too lazy to open a PR, though.

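    // The function header was cut off in the original paste; the name below is a placeholder.
    function getAudioUrl(text, voice, pitch) {
        assert(text && voice);
        var voiceName, endpoint;
        if (voice.type === "Studio") {
            // Studio voices. Assumption: the name ends in the gender. Google's Studio
            // voice names end in a letter code (e.g. en-US-Studio-O), so this mapping
            // needs checking against the actual voice list.
            voiceName = voice.lang + "-Studio-" + voice.gender;
            endpoint = "texttospeech.googleapis.com"; // Assuming Studio voices use this endpoint
        } else {
            // Other voices (existing logic): derive the API voice name from the display name
            var matches = voice.voiceName.match(/^Google(\w+) .* \((\w+)\)$/);
            voiceName = voice.lang + "-" + matches[1] + "-" + matches[2][0];
            endpoint = matches[1] === "Neural2" ? "us-central1-texttospeech.googleapis.com" : "texttospeech.googleapis.com";
        }

        return getSettings(["gcpCreds", "gcpToken"])
            .then(function(settings) {
                var postData = {
                    input: {
                        text: text
                    },
                    voice: {
                        languageCode: voice.lang,
                        name: voiceName
                    },
                    audioConfig: {
                        audioEncoding: "OGG_OPUS",
                        // Map a pitch multiplier (1 = normal) to the API's semitone offset
                        pitch: ((pitch || 1) - 1) * 20
                    }
                };
                // Rest of the function remains the same...
            });
    }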
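
The voice-name and endpoint selection in the snippet above can be pulled into a small standalone function so it can be checked in isolation. This is only a sketch: `resolveGoogleVoice` is a hypothetical name, the sample display name used below is made up for illustration, and the Studio branch simply mirrors the snippet's unverified assumption about how Studio voice names are formed.

```javascript
// Hypothetical helper, not part of Read Aloud: maps a voice object to an API voice name and endpoint.
function resolveGoogleVoice(voice) {
  const defaultEndpoint = "texttospeech.googleapis.com";
  if (voice.type === "Studio") {
    // Mirrors the snippet's assumption; the real Studio voice names may not follow this pattern.
    return { name: voice.lang + "-Studio-" + voice.gender, endpoint: defaultEndpoint };
  }
  // Same pattern as the snippet: "Google<Family> <anything> (<Name>)"
  const matches = /^Google(\w+) .* \((\w+)\)$/.exec(voice.voiceName);
  if (!matches) {
    // Fail with a clear message instead of a TypeError on matches[1]
    throw new Error("Unrecognized voice name: " + voice.voiceName);
  }
  const family = matches[1];
  return {
    // The API name uses the first letter of the name in parentheses
    name: voice.lang + "-" + family + "-" + matches[2][0],
    endpoint: family === "Neural2" ? "us-central1-texttospeech.googleapis.com" : defaultEndpoint
  };
}
```

With the made-up display name `"GoogleNeural2 English (Anna)"` and `lang` set to `"en-US"`, this returns the name `en-US-Neural2-A` and the `us-central1` endpoint. A name that does not match the pattern throws a descriptive error.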