thinhlpg / vixtts-demo

A Vietnamese Voice Cloning Text-to-Speech Model ✨
https://huggingface.co/spaces/thinhlpg/vixtts-demo
Mozilla Public License 2.0
316 stars 149 forks

help to load vivoice dataset from huggingface #8

Open phvaha1 opened 1 month ago

phvaha1 commented 1 month ago

@thinhlpg Thanks for your meaningful project. Can you share a script to download the data from Hugging Face and turn it into training data for the XTTS model? I can access the dataset URL, but it contains parquet files and I don't know how to build training data from them.

I used the code below, but I don't know what to do next.

from datasets import load_dataset

ds = load_dataset("capleaf/viVoice")
thinhlpg commented 1 month ago

Hello @phvaha1, you can format the dataset like this: (see the attached screenshot, not reproduced here). You can check this repo and the documentation for more details: https://github.com/daswer123/xtts-finetune-webui and https://docs.coqui.ai/en/latest/models/xtts.html
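Not part of the original reply, but as a rough sketch of the export step: assuming each viVoice row looks like a standard Hugging Face audio example (an `audio` dict with `array`/`sampling_rate` and a `text` transcription — verify the actual column names against the dataset card), writing an LJSpeech-style `metadata.csv` plus a `wavs/` folder could look like this. The `audio_file|text|speaker_name` line layout is the one used by xtts-finetune-webui; double-check it against that repo.

```python
import csv
import wave
from pathlib import Path

import numpy as np


def export_ljspeech(rows, out_dir, speaker="vivoice"):
    """Write rows to out_dir/wavs/*.wav and out_dir/metadata.csv.

    Each row is assumed to look like:
    {"audio": {"array": np.ndarray, "sampling_rate": int}, "text": str}.
    """
    out_dir = Path(out_dir)
    (out_dir / "wavs").mkdir(parents=True, exist_ok=True)
    with open(out_dir / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="|")
        for i, row in enumerate(rows):
            wav_path = out_dir / "wavs" / f"{speaker}_{i:06d}.wav"
            # Convert float audio in [-1, 1] to 16-bit PCM for a plain WAV file.
            pcm = np.clip(np.asarray(row["audio"]["array"]), -1.0, 1.0)
            pcm16 = (pcm * 32767).astype(np.int16)
            with wave.open(str(wav_path), "wb") as w:
                w.setnchannels(1)
                w.setsampwidth(2)
                w.setframerate(row["audio"]["sampling_rate"])
                w.writeframes(pcm16.tobytes())
            writer.writerow([f"wavs/{wav_path.name}", row["text"].strip(), speaker])


# With the real dataset you would stream rows straight from load_dataset:
# from datasets import load_dataset
# ds = load_dataset("capleaf/viVoice", split="train")
# export_ljspeech(ds, "vivoice_xtts")
```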

phvaha1 commented 1 month ago

@thinhlpg Thank you for your response. I can format the data into that layout now, but I ran into a different problem.

When fine-tuning the XTTS model from the pretrained checkpoint (the default checkpoint from the documentation), I need to update the vocab.json file. But after updating vocab.json, I get the error below:

size mismatch for gpt.text_embedding.weight: copying a param with shape torch.Size([6681, 1024]) from checkpoint, the shape in current model is torch.Size([7767, 1024])

I think this is because the new vocab size is 7767 while the checkpoint was trained with 6681. Do you know how to fix this?
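The thread doesn't answer this, but one common workaround (an assumption here, not the author's confirmed method) is to resize the vocab-sized weights before loading: copy the 6681 pretrained rows unchanged and randomly initialize the extra rows, so existing tokens keep their embeddings while the new Vietnamese tokens start fresh. The row-copying logic, sketched with NumPy for clarity (in practice you would apply the same operation to the `torch` tensors in the checkpoint's `state_dict`, e.g. `gpt.text_embedding.weight` and the matching output head, before `load_state_dict`):

```python
import numpy as np


def resize_rows(old: np.ndarray, new_vocab: int, std: float = 0.02) -> np.ndarray:
    """Grow a (vocab, dim) weight matrix to new_vocab rows.

    Pretrained rows are copied unchanged; new rows get a small random
    init so fine-tuning can learn the added tokens.
    """
    old_vocab, dim = old.shape
    if new_vocab <= old_vocab:
        return old[:new_vocab].copy()
    new = np.random.normal(0.0, std, size=(new_vocab, dim)).astype(old.dtype)
    new[:old_vocab] = old
    return new


# The shapes from the error message: 6681 -> 7767, hidden dim 1024.
old_emb = np.random.randn(6681, 1024).astype(np.float32)
new_emb = resize_rows(old_emb, 7767)
assert new_emb.shape == (7767, 1024)
assert np.array_equal(new_emb[:6681], old_emb)  # pretrained rows preserved
```

Note that every weight whose first dimension is the vocab size must be resized consistently, or `load_state_dict` will still report a shape mismatch.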