mandnyc / ssml-builder

Apache License 2.0

'5000 characters limit exceeded' Using SSML vs. Text Input: Google Text-to-Speech (TTS) #24

Open adam-hurwitz opened 5 years ago

adam-hurwitz commented 5 years ago

Issue

Following the documentation for Creating Voice Audio Files with Google Cloud Platform's Text-to-Speech API, the error below occurs when the input is sent as Speech Synthesis Markup Language (SSML). The same content sent as standard text produces no error.

This is the error returned for the SSML input. It appears to be inaccurate, since the SSML is 2979 characters long, well below the 5000-character limit:

Error: 3 INVALID_ARGUMENT: 5000 characters limit exceeded.
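
To confirm what is actually being sent, the length of the request input can be measured right before the call. The following is a minimal sketch, assuming the limit applies to the raw length of the `text` or `ssml` string (how the API itself counts characters is not confirmed here). `checkInputLength` and `MAX_INPUT_CHARS` are hypothetical names, not part of either library.

```javascript
// Assumed limit, taken from the error message above.
const MAX_INPUT_CHARS = 5000;

// Hypothetical helper: returns the length of the text or ssml input and
// throws if it exceeds the assumed limit.
function checkInputLength(input) {
  const value = input.ssml !== undefined ? input.ssml : input.text;
  if (typeof value !== 'string') {
    throw new TypeError('input must have a string "text" or "ssml" field');
  }
  if (value.length > MAX_INPUT_CHARS) {
    throw new RangeError(
      'input is ' + value.length + ' characters, limit is ' + MAX_INPUT_CHARS
    );
  }
  return value.length;
}

console.log(checkInputLength({ ssml: '<speak>Hello</speak>' }));
```

Passing the exact object that goes into the `input` field of the request (rather than a separately built string) is the point of the check: it measures the payload itself. Note that `String.prototype.length` counts UTF-16 code units, which may differ from the service's own count for some characters.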

Node.js Setup

const Speech = require('ssml-builder');
const textToSpeech = require('@google-cloud/text-to-speech');

...

const client = new textToSpeech.TextToSpeechClient();
const speech = new Speech();

...

Standard Text Input

console.log('Convert Article ' + data.id + ': ' + data.text);

return client.synthesizeSpeech({
  input: { text: data.text },
  voice: {
    languageCode: '[language-code]',
    name: '[language-option]',
  },
  audioConfig: {
    audioEncoding: '[encoding-type]',
    pitch: "[pitch]",
    speakingRate: "[speaking-rate]"
  },
})

SSML Input

Using the ssml-builder package.

console.log('Convert Article ' + data.id + ': ' + speech.say(data.text).ssml());

return client.synthesizeSpeech({
  input: { ssml: speech.say(data.text).ssml() },
  voice: {
    languageCode: '[language-code]',
    name: '[language-option]',
  },
  audioConfig: {
    audioEncoding: '[encoding-type]',
    pitch: "[pitch]",
    speakingRate: "[speaking-rate]"
  },
})

Input

Article: Reports Of Bitcoin's Demise Have Been 'Greatly Exaggerated'

Standard Text - Working As Expected

Character count: 2904

The current bitcoin bear market, labeled crypto winter for its debilitating effect on the broader market and industry, has seen more than $700 billion wiped from the total value of all cryptocurrencies so far this year, some 80% of its value since its all-time high.

Bitcoin has seen similar price percentage declines before, however, and has managed to recover from them. Now, researchers from the University of Cambridge Judge Business School have found the bitcoin industry will "likely" bounce back again.

"Statements proclaiming the death of the crypto-asset industry have been made after every global ecosystem bubble," researchers wrote in the second Global Cryptoasset Benchmarking Study. "While it is true that the 2017 bubble was the largest in bitcoin's history, the market capitalization of both bitcoin and the crypto-asset ecosystem still exceeds its January 2017 levels-prior to the start of the bubble.

"The speculation of the death of the market and ecosystem has been greatly exaggerated, and so it seems likely that the future expansion plans of industry participants will, at most, be delayed."

While the bitcoin industry still has many supporters despite the price collapse, others have been quick to brand bitcoin as dead, something that's happened more than 300 times according to the loosely-updated tracking website 99bitcoins.

Elsewhere, bitcoin bulls, such former Goldman Sachs partner and founder of cryptocurrency merchant bank Galaxy Digital Holdings Mike Novogratz, have sobered up since the giddy highs of late 2017.

Researchers also found that millions of new users have entered the ecosystem over the last 12 months, though most are passive -- buying bitcoin or other cryptocurrencies with newly created wallets and then not moving or using them.

Total user accounts at service providers now exceed 139 million with at least 35 million identity-verified users, the latter growing nearly four-fold in 2017 and doubling again in the first three quarters of 2018, according to the report.

Only 38% of all users can be considered active, although definitions and criteria of activity levels vary significantly across service providers.

Meanwhile, the study found that the top six proof-of-work cryptocurrencies (including bitcoin and ethereum) collectively consume between 52 TWh and 111 TWh of electricity per year: the mid-point of the estimate (82 TWh) is the equivalent of the total energy consumed by the entire country of Belgium -- but also constitutes less than 0.01% of the world's global energy production per year.

A "notable" share of the energy consumed by these facilities is supplied by renewable energy sources in regions with excess capacity, the researchers revealed.

The report also found that cryptocurrency mining appears to be less concentrated geographically, in hashing power ownership, and in manufacturer options, than is widely thought.

SSML - Error

Character count: 2979

<speak>The current bitcoin bear market, labeled crypto winter for its debilitating effect on the broader market and industry, has seen more than $700 billion wiped from the total value of all cryptocurrencies so far this year, some 80% of its value since its all-time high.

Bitcoin has seen similar price percentage declines before, however, and has managed to recover from them. Now, researchers from the University of Cambridge Judge Business School have found the bitcoin industry will &quot;likely&quot; bounce back again.

&quot;Statements proclaiming the death of the crypto-asset industry have been made after every global ecosystem bubble,&quot; researchers wrote in the second Global Cryptoasset Benchmarking Study. &quot;While it is true that the 2017 bubble was the largest in bitcoin&apos;s history, the market capitalization of both bitcoin and the crypto-asset ecosystem still exceeds its January 2017 levels-prior to the start of the bubble.

&quot;The speculation of the death of the market and ecosystem has been greatly exaggerated, and so it seems likely that the future expansion plans of industry participants will, at most, be delayed.&quot;

While the bitcoin industry still has many supporters despite the price collapse, others have been quick to brand bitcoin as dead, something that&apos;s happened more than 300 times according to the loosely-updated tracking website 99bitcoins.

Elsewhere, bitcoin bulls, such former Goldman Sachs partner and founder of cryptocurrency merchant bank Galaxy Digital Holdings Mike Novogratz, have sobered up since the giddy highs of late 2017.

Researchers also found that millions of new users have entered the ecosystem over the last 12 months, though most are passive -- buying bitcoin or other cryptocurrencies with newly created wallets and then not moving or using them.

Total user accounts at service providers now exceed 139 million with at least 35 million identity-verified users, the latter growing nearly four-fold in 2017 and doubling again in the first three quarters of 2018, according to the report.

Only 38% of all users can be considered active, although definitions and criteria of activity levels vary significantly across service providers.

Meanwhile, the study found that the top six proof-of-work cryptocurrencies (including bitcoin and ethereum) collectively consume between 52 TWh and 111 TWh of electricity per year: the mid-point of the estimate (82 TWh) is the equivalent of the total energy consumed by the entire country of Belgium -- but also constitutes less than 0.01% of the world&apos;s global energy production per year.

A &quot;notable&quot; share of the energy consumed by these facilities is supplied by renewable energy sources in regions with excess capacity, the researchers revealed.

The report also found that cryptocurrency mining appears to be less concentrated geographically, in hashing power ownership, and in manufacturer options, than is widely thought.</speak>
adam-hurwitz commented 5 years ago

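I've documented the solution in a StackOverflow post. The issue was caused by using the same Speech object twice, which pushed the input over the limit. `speech.say(data.text)` is called once in the `console.log` and again in the request, and each call appears to append the text to the same object, so the SSML actually sent was roughly twice as long as the 2979 characters logged.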
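
A minimal sketch of why reusing one builder object overflows the limit, assuming that `say()` appends to the object's internal state and `ssml()` renders everything said so far. `AccumulatingSpeech` is a hypothetical stand-in for the ssml-builder `Speech` class so the snippet runs without dependencies; it is not the library's code and it skips the character escaping the real builder performs.

```javascript
// Hypothetical stand-in for ssml-builder's Speech class (assumption: say()
// appends to internal state and ssml() renders everything said so far).
class AccumulatingSpeech {
  constructor() {
    this.elements = [];
  }

  say(text) {
    this.elements.push(text);
    return this; // chainable, like the builder used in the issue
  }

  ssml() {
    return '<speak>' + this.elements.join('') + '</speak>';
  }
}

// Placeholder with the same length as the article text in the issue.
const articleText = 'x'.repeat(2904);

// Problem pattern: one shared object, say() called for the log and again
// for the request.
const shared = new AccumulatingSpeech();
const logged = shared.say(articleText).ssml();
const sent = shared.say(articleText).ssml();

// Fix: build the SSML once from a fresh object and reuse the string.
const fixed = new AccumulatingSpeech().say(articleText).ssml();

console.log(logged.length, sent.length, fixed.length);
// prints: 2919 5823 2919
```

The logged string stays under 5000 characters while the string sent in the request does not, which matches the symptom described in the issue. Building the SSML once per article, from a new object, and passing that same string to both the log and the request avoids the doubling.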