Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Problem Overview
I'd like to train a TTA model on the data, but I'm having trouble with the data processing.
Expected Outcome
A script for constructing the triplet training data (instruction, input audio, output audio).
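To make the request concrete, here is a rough sketch of the kind of script I have in mind. It is not taken from Amphion. The layout is an assumption for illustration: input and output audio are `.wav` files sharing the same file name in two separate directories, and instructions come from a dict keyed by file stem. The names `build_triplets` and `write_jsonl` and the JSONL manifest format are hypothetical.

```python
import json
from pathlib import Path


def build_triplets(input_dir, output_dir, instructions):
    """Pair audio files by shared file name and attach the instruction text.

    Assumed (hypothetical) convention: `instructions` maps a file stem to
    its instruction string. Items missing an output file or an instruction
    are skipped.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    triplets = []
    for in_path in sorted(input_dir.glob("*.wav")):
        out_path = output_dir / in_path.name
        instruction = instructions.get(in_path.stem)
        if instruction is None or not out_path.is_file():
            continue
        triplets.append({
            "instruction": instruction,
            "input_audio": str(in_path),
            "output_audio": str(out_path),
        })
    return triplets


def write_jsonl(triplets, manifest_path):
    """Write one JSON object per line (assumed manifest format)."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for triplet in triplets:
            f.write(json.dumps(triplet, ensure_ascii=False) + "\n")
```

Something like `write_jsonl(build_triplets("input", "output", instructions), "train.jsonl")` would then produce the manifest, but I don't know what format the training recipe actually expects.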