Disclaimer: This is not an official Google product.
Voice Builder is an open-source text-to-speech (TTS) voice building tool that focuses on simplicity, flexibility, and collaboration. It allows anyone with basic computer skills to run voice training experiments and listen to the resulting synthesized voice.
We hope that this tool will lower the barrier to creating new voices and accelerate TTS research by making experimentation faster and interdisciplinary collaboration easier. We believe it can be especially helpful for low-resource languages, where more experimentation is often needed to get the most out of limited data.
Publication - https://ai.google/research/pubs/pub46977
Create a project on Google Cloud Platform (GCP).
If you don't have an account yet, please create one for yourself.
Enable billing and request more quota for your project
Install Docker
Go to firebase.com and import the project into the Firebase platform
If you don't have an account yet, please create one for yourself.
Install the gcloud command-line tool by installing the Cloud SDK
Install Node.js
Install the firebase command-line tool
Enable all the following GCP services:
Use this URL to enable them all at once.
Enabling the APIs usually takes a few minutes. GCP will then take you to another page to set up credentials for them. Skip and close that page, since no new credentials are needed.
[Optional] Set up your own custom data exporter
If you have not completed all prerequisites, please do so before continuing with the following steps.
Clone this project to your current directory by:
git clone https://github.com/google/voice-builder.git && cd voice-builder
If you haven't logged in to your account via gcloud yet, please log in by:
gcloud auth login
Also, if you haven't logged in to your account via firebase, please log in by:
firebase login --no-localhost
Open deploy.sh and edit the following variables:
Create GCS buckets for Voice Builder to store each job's data
./deploy.sh initial_setup
Deploy cloud functions component
./deploy.sh cloud_functions
Deploy the UI component
./deploy.sh ui create
After the deployment, the command-line output should include an external IP address (EXTERNAL_IP). You can access your instance of Voice Builder by visiting http://EXTERNAL_IP:3389 in your browser.
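The deployment order above can be sketched as a small Python wrapper. This is only an illustration, not part of Voice Builder: the function names `run_deploy_steps` and `ui_url` are hypothetical, and the sketch assumes it is run from the repository root where deploy.sh lives. By default it only prints the commands.

```python
import shlex
import subprocess

# Deployment steps, in the order the instructions list them.
DEPLOY_STEPS = [
    "./deploy.sh initial_setup",
    "./deploy.sh cloud_functions",
    "./deploy.sh ui create",
]


def run_deploy_steps(steps=DEPLOY_STEPS, dry_run=True):
    """Process each step in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    Returns the list of commands that were processed.
    """
    done = []
    for step in steps:
        print(f"$ {step}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run after a failed one.
            subprocess.run(shlex.split(step), check=True)
        done.append(step)
    return done


def ui_url(external_ip, port=3389):
    """Build the UI address from the EXTERNAL_IP reported by the last step."""
    return f"http://{external_ip}:{port}"
```

Calling `run_deploy_steps(dry_run=False)` would execute the three commands for real. Stopping at the first failure is a deliberate choice here, since each step depends on the resources created by the previous one.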
At this point, all components should be in place and the UI should be accessible at http://EXTERNAL_IP:3389. Voice Builder initially provides two example TTS engines (Festival and Merlin) and public data from the language resources repo.
You can test whether everything is working correctly by creating a new voice with the provided Festival engine:
Data Exporter is an additional component you can add to the system. Voice Builder works without it, in which case it simply uses the input files as they are.
However, in some cases you want to apply some conversion to your input files before feeding them into TTS algorithms. For example:
Voice Builder gives you the flexibility to add your own data exporter, which you can use to manipulate data before the actual TTS algorithm runs. Your custom data exporter receives a Voice Specification containing the file locations, the chosen TTS algorithm, tuning parameters, etc. You can use this information to manipulate or convert your data. In the end, your data exporter should put all necessary files into the designated job folder, which triggers the actual TTS algorithm to run.
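As a rough sketch of that flow, a data exporter could first work out which input files it has to process and where the results should land. The field names below follow the Voice Specification described later in this document. The function name `plan_export` and the choice to keep each file's base name in the job folder are assumptions made for illustration, and the actual conversion and upload steps are left out.

```python
import posixpath

# Specification fields that point at input files.
INPUT_FIELDS = ("lexicon_path", "phonology_path", "wavs_path", "wavs_info_path")


def plan_export(spec):
    """Return (source, destination) pairs for every input file in the spec.

    Destinations are placed directly under the job folder and keep the
    source file's base name (an assumption, not a Voice Builder rule).
    """
    job_folder = spec["job_folder"].rstrip("/")
    plan = []
    for field_name in INPUT_FIELDS:
        source = spec[field_name]["path"]
        destination = f"{job_folder}/{posixpath.basename(source)}"
        plan.append((source, destination))
    return plan
```

A real exporter would convert each source file as needed and then write it to its destination with a GCS client of your choice.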
Firstly, you need to give your data exporter access to GCS buckets.
Open /deploy.sh and edit the following variables:
Run the following command to give DATA_EXPORTER_SERVICE_ACCOUNT ACL access to the GCS buckets
./deploy.sh acl_for_data_exporter
Secondly, you need to set your data exporter's URL in config.js so that Voice Builder knows where to send the Voice Specification.
Open /config.js and add DATA_EXPORTER_API to the config as follows:
DATA_EXPORTER_API: {
BASE_URL: '<DATA_EXPORTER_URL>',
API_KEY: '<DATA_EXPORTER_API_KEY>',
}
where BASE_URL is your data exporter's URL and API_KEY is its API key.
Redeploy the Voice Builder UI instance so that it picks up the new config and knows where to send the Voice Specification
./deploy.sh ui update
Try to create a new job! Voice Builder should now send a request to your DATA_EXPORTER_URL with the created job's Voice Specification.
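To see what arrives at your DATA_EXPORTER_URL, a minimal receiving endpoint can be sketched with Python's standard library. This sketch assumes the request is an HTTP POST with a JSON body. The names `ExporterHandler`, `make_server`, and `received_specs`, as well as the response format, are hypothetical. How the API key is transmitted is not described here, so the sketch does not check it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Specifications seen so far, kept in memory for inspection.
received_specs = []


class ExporterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            spec = json.loads(self.rfile.read(length))
        except ValueError:
            spec = None
        if not isinstance(spec, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        received_specs.append(spec)
        body = json.dumps({"received_job": spec.get("id")}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence the default per-request logging.


def make_server(host="127.0.0.1", port=0):
    """Create the server; port=0 lets the OS pick a free port."""
    return HTTPServer((host, port), ExporterHandler)
```

Running `make_server(port=8080).serve_forever()` (the port is an arbitrary choice) keeps the endpoint listening. A production exporter would also need authentication and would do the actual file processing after receiving the specification.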
VoiceBuildingSpecification is a JSON definition of the voice specification. This specification is created by the Voice Builder backend when a user triggers a voice building request from the UI. It can be used by the data exporter (passed to the data exporter via its API) to convert files and by the TTS engine for its training parameters.
{
"id": int,
"voice_name": string,
"created_by": string,
"job_folder": string,
"lexicon_path": object(Path),
"phonology_path": object(Path),
"wavs_path": object(Path),
"wavs_info_path": object(Path),
"sample_rate": int,
"tts_engine": string,
"engine_params": [object(EngineParam)]
}
Fields | Description |
---|---|
id | Unique global job id. |
voice_name | User friendly voice name (e.g. multi speaker voice). |
created_by | The name of the user who created the voice. |
job_folder | The path to the GCS job folder. This is where all the data related to the job is stored. |
lexicon_path | Path to the lexicon. |
phonology_path | Path to the phonology. |
wavs_path | Path to the wavs (should be a tar file). |
wavs_info_path | Path to the file containing mapping of wav name and prompts. |
sample_rate | Sample rate at which the voice should be built. |
tts_engine | Type of TTS engine used to train the voice. The value is the engine_id from the selected TTS engine's engine.json. |
engine_params | Additional parameters for the TTS engine. |
EngineParam contains a parameter for the TTS backend engine.
{
"key": string,
"value": string
}
Fields | Description |
---|---|
key | Parameter key. |
value | Value for the parameter key. |
Path contains information about the file path.
{
"path": string,
"file_type": string
}
Fields | Description |
---|---|
path | Path to the file. |
file_type | Format of the file. |
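The three definitions above map naturally onto typed records. The following is a minimal sketch using Python dataclasses. The class names mirror the definitions in this document, while the helper `spec_from_dict` is a hypothetical name introduced here. It accepts sample_rate as either an integer or a numeric string and raises a TypeError if the input contains fields that are not listed in the tables.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Path:
    path: str
    file_type: str


@dataclass
class EngineParam:
    key: str
    value: str


@dataclass
class VoiceBuildingSpecification:
    id: int
    voice_name: str
    created_by: str
    job_folder: str
    lexicon_path: Path
    phonology_path: Path
    wavs_path: Path
    wavs_info_path: Path
    sample_rate: int
    tts_engine: str
    engine_params: List[EngineParam] = field(default_factory=list)


def spec_from_dict(data):
    """Build a VoiceBuildingSpecification from a decoded JSON object."""
    kwargs = dict(data)
    # Nested Path objects arrive as plain dicts and are converted here.
    for name in ("lexicon_path", "phonology_path", "wavs_path", "wavs_info_path"):
        kwargs[name] = Path(**data[name])
    kwargs["engine_params"] = [
        EngineParam(**param) for param in data.get("engine_params", [])
    ]
    kwargs["sample_rate"] = int(data["sample_rate"])
    return VoiceBuildingSpecification(**kwargs)
```

A data exporter could call `spec_from_dict(json.loads(request_body))` to fail early on malformed requests instead of discovering a missing field halfway through processing.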
For example, if you have set up your data exporter, then when you create a voice using our predefined Festival engine, Voice Builder will send a request body similar to the one below to your data exporter. Your data exporter then has to pre-process the data and put it in the job_folder location (gs://your-voice-builder-jobs/1 in this example). After all necessary files are placed in the folder, the actual voice building process begins automatically.
{
"id": 1,
"voice_name": "my_voice",
"created_by": "someone@somemail.com",
"job_folder": "gs://your-voice-builder-jobs/1",
"engine_params": [
{
"key": "param_for_festival1",
"value": "50"
},
{
"key": "param_for_festival2",
"value": "30"
}
],
"sample_rate": 22050,
"tts_engine": "festival",
"lexicon_path": {
"path": "gs://voice-builder-public-data/examples/sinhala/lexicon.scm",
"file_type": "SCM"
},
"phonology_path": {
"path": "gs://voice-builder-public-data/examples/sinhala/phonology.json",
"file_type": "JSON_EXTERNAL_PHONOLOGY"
},
"wavs_path": {
"path": "gs://voice-builder-public-data/examples/sinhala/wavs.tar.gz",
"file_type": "TAR"
},
"wavs_info_path": {
"path": "gs://voice-builder-public-data/examples/sinhala/txt.done.data",
"file_type": "LINE_INDEX"
}
}
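When handling a request like this one, two small helpers tend to be useful: one that splits a gs:// location into bucket and object name, and one that turns the engine_params list into a lookup table. Both function names below are hypothetical and the sketch relies only on the Python standard library.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split 'gs://bucket/some/object' into ('bucket', 'some/object')."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def engine_params_to_dict(engine_params):
    """Turn [{'key': ..., 'value': ...}, ...] into {key: value}."""
    return {param["key"]: param["value"] for param in engine_params}
```

For the request above, `split_gcs_uri("gs://your-voice-builder-jobs/1")` returns `("your-voice-builder-jobs", "1")`. Parameter values stay strings, as in the EngineParam definition, so any numeric conversion is left to the engine that consumes them.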