The training dataset is not clearly explained in the paper. How should I prepare the style audio and content audio in order to retrain the model? Could you please explain how the dataset is constructed?
You can start by experimenting with the data samples in the "audios" directory. All the audio data has been segmented into 5.12-second clips. Convert them into mel-spectrograms and use them for training and inference.
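For anyone preparing their own data, here is a minimal sketch of the two steps described in the reply above: cutting audio into 5.12-second clips and converting each clip to a mel-spectrogram. It is not the repository's actual preprocessing. The sample rate, `n_fft`, `hop_length`, and `n_mels` values are assumptions (the thread does not state them), the helper names `split_into_clips` and `to_log_mel` are hypothetical, and using `librosa` is my choice, so check the repo's config for the real settings.

```python
import numpy as np

CLIP_SECONDS = 5.12   # clip length stated in the thread
SAMPLE_RATE = 16000   # assumption: the thread does not give a sample rate


def split_into_clips(waveform, sample_rate=SAMPLE_RATE, clip_seconds=CLIP_SECONDS):
    """Cut a mono waveform into non-overlapping fixed-length clips.

    Trailing samples that do not fill a whole clip are dropped.
    """
    clip_len = int(round(sample_rate * clip_seconds))
    n_clips = len(waveform) // clip_len
    return waveform[: n_clips * clip_len].reshape(n_clips, clip_len)


def to_log_mel(clip, sample_rate=SAMPLE_RATE, n_fft=1024, hop_length=160, n_mels=64):
    """Log-mel spectrogram of one clip. All parameter values are placeholders."""
    import librosa  # imported lazily so split_into_clips works without it

    mel = librosa.feature.melspectrogram(
        y=clip, sr=sample_rate, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # Convert power to decibels, relative to the clip's peak power
    return librosa.power_to_db(mel, ref=np.max)
```

Typical use would be to load a file at the chosen sample rate (for example with `librosa.load(path, sr=SAMPLE_RATE, mono=True)`), pass the waveform to `split_into_clips`, and call `to_log_mel` on each row.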
Hi @lsfhuihuiff, I am trying to run training but have run into multiple problems, listed below.
Thanks in advance!