thinhlpg / vixtts-demo

A Vietnamese Voice Text-to-Speech Model ✨
https://huggingface.co/spaces/thinhlpg/vixtts-demo
Mozilla Public License 2.0

how long did you finetune the model? #6

Closed thivux closed 17 hours ago

thivux commented 1 month ago

Hi @thinhlpg, I'm curious how many epochs you fine-tuned the model for to achieve this performance level.

thinhlpg commented 1 month ago

Hi @thivux! Thank you for reaching out! I fine-tuned the model for 5 epochs to achieve this performance level. If you have any more questions or need further details, feel free to ask!

thivux commented 1 month ago

What were the losses after 5 epochs in your experiment :-? I'm fine-tuning on my dataset, and after 5 epochs the losses still seem to be going down :v

thinhlpg commented 1 month ago

I can't recall the exact loss numbers after 5 epochs, but the loss was still going down. We just decided to stop at 5 epochs for our experiment.
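
If you'd rather not hard-code an epoch count, one option is to keep fine-tuning while a held-out validation loss is still improving and stop once it plateaus. Below is a rough sketch of that idea; the train_one_epoch and evaluate callables are hypothetical placeholders for whatever training loop you already use, not anything from this repo.

def fine_tune_with_patience(model, train_one_epoch, evaluate,
                            max_epochs=20, patience=2, min_delta=1e-3):
    # Stop when validation loss has not improved by at least `min_delta`
    # for `patience` consecutive epochs.
    best_val_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model)
        val_loss = evaluate(model)
        print(f"epoch {epoch}: val_loss={val_loss:.4f}")
        if best_val_loss - val_loss > min_delta:
            # Still improving meaningfully; keep going.
            best_val_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # plateaued: stop and keep the best checkpoint
    return model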

thivux commented 1 month ago

Thank you for your reply. I have 2 more questions:

  1. How did you create the tokens for Vietnamese?
  2. I see that your vocab size is 7544 and the original vocab size is 6681, which means 863 Vietnamese tokens were added. Why did you choose this vocab size for Vietnamese? Why not smaller or bigger :-?
thinhlpg commented 1 month ago

  1. I used the code snippet below to train the tokenizer. After training, I merged this new vocabulary with the existing one using a custom Python script (a rough sketch of that merge step follows this answer), resulting in 863 additional Vietnamese tokens. I chose a vocab size of 1000 because it's relatively large compared to what's used for other languages, which I hoped would effectively capture the nuances of Vietnamese (though this is a bit of a hypothesis).
from datasets import load_dataset
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# https://huggingface.co/datasets/ntt123/viet-tts-dataset
# (assumes the dataset loads with a "train" split and a "transcription" column)
viet_tts_dataset = load_dataset("ntt123/viet-tts-dataset")

# The original snippet doesn't show how `tokenizer` was created;
# a plain BPE tokenizer with whitespace pre-tokenization is assumed here.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

def get_training_corpus():
    # Yield the transcriptions in batches of 1000 samples
    dataset = viet_tts_dataset["train"]
    for start_idx in range(0, len(dataset), 1000):
        samples = dataset[start_idx : start_idx + 1000]
        yield samples["transcription"]

trainer = trainers.BpeTrainer(vocab_size=1000)
tokenizer.train_from_iterator(get_training_corpus(), trainer=trainer)
tokenizer.save("new-tokenizer-file-path.json")
  2. The number 863 counts only the newly added tokens, excluding the ones already present in the original vocabulary (7544 - 6681 = 863). I trained the model on a toy dataset to test it, and it "worked" surprisingly well. While this number might not be optimal, it was a practical choice given our limited processing power, so we decided to keep it.
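
For reference, a minimal sketch of what that vocabulary merge could look like: keep the original entries, append only the Vietnamese tokens that aren't already present, and you end up with 863 new tokens and a 7544-token vocab. The actual merge script isn't posted in this thread, so the file names, the model.vocab / model.merges layout of the tokenizer JSON, and the exact dedup logic below are assumptions.

import json

# Hypothetical file names; the real paths aren't given in this thread.
BASE_TOKENIZER = "xtts_original_tokenizer.json"  # original 6681-token vocab
VI_TOKENIZER = "new-tokenizer-file-path.json"    # 1000-token Vietnamese tokenizer from the snippet above
MERGED_TOKENIZER = "xtts_vi_tokenizer.json"

with open(BASE_TOKENIZER, encoding="utf-8") as f:
    base = json.load(f)
with open(VI_TOKENIZER, encoding="utf-8") as f:
    vi = json.load(f)

# In a Hugging Face `tokenizers` JSON file, model.vocab maps token -> id.
base_vocab = base["model"]["vocab"]
vi_vocab = vi["model"]["vocab"]

# Append only the Vietnamese tokens the original vocab doesn't already
# contain, assigning them fresh ids after the existing ones.
next_id = max(base_vocab.values()) + 1
added = 0
for token in vi_vocab:
    if token not in base_vocab:
        base_vocab[token] = next_id
        next_id += 1
        added += 1

# Note: a complete merge would also need to carry over the BPE merge rules
# (model.merges) for the new tokens; that part is omitted here.
print(f"added {added} new tokens, final vocab size: {len(base_vocab)}")

with open(MERGED_TOKENIZER, "w", encoding="utf-8") as f:
    json.dump(base, f, ensure_ascii=False)
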
thivux commented 4 weeks ago

Thanks a lot!!! ❤️