hetpandya / youtube_tts_data_generator

A python library to generate speech dataset from Youtube videos
Apache License 2.0

Youtube Speech Data Generator

A Python library for generating speech datasets from YouTube videos. Youtube Speech Data Generator also handles almost all of the preprocessing needed to build a speech dataset together with its transcriptions, and arranges the output in the directory structure that most text-to-speech architectures expect.

Installation

Make sure ffmpeg is installed and available on the system path.

$ pip install youtube-tts-data-generator
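
Whether ffmpeg is reachable can be checked from Python before running the generator. The snippet below is a minimal sketch that uses only the standard library; the helper name `ffmpeg_on_path` is hypothetical and is not part of youtube-tts-data-generator.

```python
import shutil


def ffmpeg_on_path(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the system path."""
    # shutil.which returns the full path of the executable, or None if it is missing.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_on_path():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg was not found on the system path; install it first.")
```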

Minimal start for creating the dataset

from youtube_tts_data_generator import YTSpeechDataGenerator

# First create a YTSpeechDataGenerator instance:

generator = YTSpeechDataGenerator(dataset_name='elon')

# Now create a '.txt' file that lists the YouTube videos containing speech.
# NOTE - Make sure you choose videos with subtitles.

generator.prepare_dataset('links.txt')
# The above takes care of creating your dataset, creating a metadata file and trimming silence from the audio.
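
The links file can also be written from Python. The sketch below assumes the file holds one video URL per line, which this README does not state explicitly; the helper `write_links_file` and the placeholder URLs are hypothetical and should be replaced with real videos that have subtitles.

```python
from pathlib import Path


def write_links_file(urls, path="links.txt"):
    """Write the given URLs to a text file, one per line (assumed format)."""
    # Drop empty entries and surrounding whitespace before writing.
    cleaned = [u.strip() for u in urls if u.strip()]
    target = Path(path)
    target.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return target


# Placeholder URLs, for illustration only.
example_urls = [
    "https://www.youtube.com/watch?v=VIDEO_ID_1",
    "https://www.youtube.com/watch?v=VIDEO_ID_2",
]
```

Calling `write_links_file(example_urls)` produces a `links.txt` in the current directory, which can then be passed to `prepare_dataset` as in the snippet above.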

Usage

Final dataset structure

Once the dataset has been created, the structure under the 'your_dataset' directory should look like:

your_dataset
├───txts
│   ├───your_dataset1.txt
│   └───your_dataset2.txt
├───wavs
│   ├───your_dataset1.wav
│   └───your_dataset2.wav
└───metadata.csv/alignment.json
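
A quick way to sanity-check a generated dataset is to confirm that every audio file has a matching transcription. The sketch below relies only on the `txts` and `wavs` layout shown above; the helper `find_unpaired` is hypothetical, is not provided by the library, and makes no assumption about the contents of `metadata.csv` or `alignment.json`.

```python
import tempfile
from pathlib import Path


def find_unpaired(dataset_dir):
    """Return (wavs without a txt, txts without a wav), compared by file stem."""
    root = Path(dataset_dir)
    wav_stems = {p.stem for p in (root / "wavs").glob("*.wav")}
    txt_stems = {p.stem for p in (root / "txts").glob("*.txt")}
    return sorted(wav_stems - txt_stems), sorted(txt_stems - wav_stems)


if __name__ == "__main__":
    # Build a throwaway directory that mimics the layout, with one transcription missing.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "your_dataset"
        (root / "wavs").mkdir(parents=True)
        (root / "txts").mkdir()
        for name in ("your_dataset1", "your_dataset2"):
            (root / "wavs" / f"{name}.wav").touch()
        (root / "txts" / "your_dataset1.txt").touch()
        print(find_unpaired(root))
```

Two empty lists mean that every `.wav` file has a `.txt` file with the same name and vice versa.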

NOTE - audio.py is heavily based on Real Time Voice Cloning

References

SRT to JSON

Read more about the library here