TTS Finetuning: changes to make

ramiKammoun commented 1 month ago

I'd like to ask which changes are needed to fine-tune ArTST for TTS.

From what I understand, I need to change these variables in the finetune.sh file:

DATASET=/name/of/dataset
DATA_ROOT=/TTS/_text/$DATASET
LABEL_DIR=/TTS/_labels/$DATASET
SAVE_DIR=/TTS/_models/$DATASET
TRAIN_SET=train
VALID_SET=valid

Which directory should I be in when setting /name/of/dataset, i.e. what is that path relative to?

What should TRAIN_SET and VALID_SET be changed to?

And finally, about the files in the DATA_ROOT folder: here is an example line from the test.tsv file:

/l/users/hawau.toyin/ArabicDeepFake/DATASETS/wav_files/ch_14_arabic_tts_dataset_98.wav  88066   /l/users/hawau.toyin/ArTST/scripts/DATA_ROOT/CLARTTS_speaker_embedding.npy

What exactly does the number 88066 correspond to?

What should LABEL_DIR be set to?

Theehawau commented 1 month ago

DATASET is helpful if you have prepared a manifest for multiple datasets in the same root directory; you can remove it or set it accordingly.

train and valid correspond to your split file names, e.g. train.txt and valid.txt.

88066 is the audio duration in seconds * 16000, i.e. the number of samples at 16 kHz. You can obtain it with soundfile.read(audio_name)[0].shape[0]

LABEL_DIR is the path to the folder that has the .txt files.
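For reference, here is a minimal sketch of how the second manifest column could be computed and a row assembled. It uses the standard-library wave module as a stand-in for soundfile (for PCM WAV files, getnframes() returns the same frame count as soundfile.read(...)[0].shape[0]). The helper names num_samples and manifest_line are hypothetical, and the tab separator is an assumption based on the .tsv extension, not something confirmed in this thread.

```python
import wave

TARGET_SR = 16000  # sample rate stated in this thread


def num_samples(wav_path):
    """Return the frame count of a PCM WAV file (the second manifest column)."""
    with wave.open(wav_path, "rb") as wf:
        return wf.getnframes()


def manifest_line(wav_path, spk_emb_path):
    """Build one manifest row: audio path, sample count, speaker-embedding path."""
    # Assumption: fields are tab-separated, as the .tsv extension suggests.
    return "\t".join([wav_path, str(num_samples(wav_path)), spk_emb_path])


# Sanity check on the value from the thread: 88066 samples at 16 kHz.
print(88066 / TARGET_SR)  # 5.504125 (seconds)
```

So the example line above describes a clip of roughly 5.5 seconds.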

ramiKammoun commented 1 month ago

One last question: does the sample rate have to be 16000 when fine-tuning, or could it be 22050?

Theehawau commented 1 month ago

It must be 16000.
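Given that requirement, it may help to check a dataset before fine-tuning. The following is a minimal sketch, assuming PCM WAV files readable by the standard-library wave module; the function name wrong_rate_files is hypothetical and not part of ArTST. Any file it reports (for example one recorded at 22050 Hz) would need to be resampled to 16000 Hz with an audio tool of your choice first.

```python
import wave

REQUIRED_SR = 16000  # per the answer above


def wrong_rate_files(wav_paths, required_sr=REQUIRED_SR):
    """Return (path, rate) pairs for WAV files whose sample rate differs."""
    bad = []
    for path in wav_paths:
        with wave.open(path, "rb") as wf:
            rate = wf.getframerate()
        if rate != required_sr:
            bad.append((path, rate))
    return bad
```

An empty list means every checked file is already at the required rate.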

mbzuai-nlp / ArTST, issue #7