Hi @thinhlpg, I'm curious about how many epochs you fine-tuned the model for to achieve this performance level.
Hi @thivux! Thank you for reaching out! I fine-tuned the model for 5 epochs to achieve this performance level. If you have any more questions or need further details, feel free to ask!
How were the losses after 5 epochs in your experiment? I am fine-tuning on my own dataset, and after 5 epochs the losses still seem to be going down.
I can't recall the exact loss numbers after 5 epochs, but the loss was still going down. We just decided to stop at 5 epochs for our experiment.
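If the loss is still decreasing, one common alternative to stopping at a fixed epoch count is to train longer and let an early-stopping callback decide when to stop based on validation loss. Below is a minimal sketch using the Hugging Face `Trainer` with `EarlyStoppingCallback`; it assumes you already have `model`, `tokenized_train`, and `tokenized_eval` defined, and the output directory and patience value are just illustrative choices, not settings from the original experiment:

```python
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

# Assumed to exist already: model, tokenized_train, tokenized_eval
training_args = TrainingArguments(
    output_dir="finetune-output",       # hypothetical output path
    num_train_epochs=20,                # upper bound; early stopping may end sooner
    evaluation_strategy="epoch",        # evaluate after every epoch
    save_strategy="epoch",              # must match the evaluation strategy
    load_best_model_at_end=True,        # restore the best checkpoint when done
    metric_for_best_model="eval_loss",
    greater_is_better=False,            # lower eval loss is better
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    # Stop if eval loss fails to improve for 2 consecutive evaluations
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```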
Thank you for your reply. I have two more questions:
```python
from datasets import load_dataset
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# https://huggingface.co/datasets/ntt123/viet-tts-dataset
viet_tts_dataset = load_dataset("ntt123/viet-tts-dataset")

def get_training_corpus():
    # Yield transcriptions in batches of 1000 to keep memory use low
    dataset = viet_tts_dataset["train"]
    for start_idx in range(0, len(dataset), 1000):
        samples = dataset[start_idx : start_idx + 1000]
        yield samples["transcription"]

# Assumes a fresh BPE tokenizer; adjust if you are retraining an existing one
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=1000)
tokenizer.train_from_iterator(get_training_corpus(), trainer=trainer)
tokenizer.save("new-tokenizer-file-path.json")
```
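To sanity-check the result, you can load the saved tokenizer back and encode a sample transcription; the sentence below is just an example input, not text from the dataset:

```python
from tokenizers import Tokenizer

# Load the tokenizer trained above and inspect its output on a sample sentence
tokenizer = Tokenizer.from_file("new-tokenizer-file-path.json")
encoding = tokenizer.encode("xin chào các bạn")
print(encoding.tokens)
print(encoding.ids)
```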
Thanks a lot! :heart: