nivibilla / efficient-vits-finetuning

Finetuning VITS Efficiently
MIT License

Issue with using Spanish data #1

Closed Mixomo closed 1 year ago

Mixomo commented 1 year ago

Hi. I am using your Colab notebook and I am getting a state_dict error; I think it may be because I am trying to train a model in Spanish. I have already made the proper modifications, like replacing english_cleaners2 with basic_cleaners, followed by modifying the symbols, the base ljspeech.json configuration, and everything else. Is there any possibility of training in a language other than English? And if the pretrained LJSpeech model causes problems, could it at least be trained without a base model?

I'll pass you my copy of the notebook I was working with: https://colab.research.google.com/drive/1_5yOrThRoVDtmr87yPyrttrkeadXRwtq?usp=sharing

Thanks

    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SynthesizerTrn:
    size mismatch for enc_p.emb.weight: copying a param with shape torch.Size([178, 192]) from checkpoint, the shape in current model is torch.Size([148, 192]).
nivibilla commented 1 year ago

Hey, sorry for the late reply. I didn't get any notifications for some reason. I think I found the issue: your text cleaners aren't actually preprocessing the data.

See this screenshot: [image]

It should look something like this: [image]

There are a couple of other forks training with Chinese, so I would have a look at them too to see how they adapted it to a different language.

By the way, it's probably easier for you to fork this repo or the main repo, change the cleaners, and test them out locally before uploading them.
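For example, a quick local check (not from the original comment) might look like this, assuming your fork keeps the upstream text/cleaners.py layout and is run from the repo root; the sample sentence is just an illustration:

from text.cleaners import basic_cleaners

# Any Spanish sentence works here; the point is to confirm the cleaner runs
# and actually transforms the text (lowercasing, collapsing whitespace, etc.).
sample = "¿Cómo estás? Esto es una PRUEBA."
print(basic_cleaners(sample))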

Hope this helps!

nivibilla commented 1 year ago

Instead of replacing the text cleaners, I would recommend preprocessing the dataset locally and having a train.txt.cleaned and valid.txt.cleaned already in the zip file.
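A rough sketch of that local preprocessing step (not from the original comment; it assumes the usual "wav_path|transcript" filelist layout and a basic_cleaners-style function, so adjust the split if your text sits in a different column):

from text.cleaners import basic_cleaners

# Write train.txt.cleaned / valid.txt.cleaned next to the originals so they can
# go straight into the zip file.
for split in ('train', 'valid'):
    with open(f'./{split}.txt', encoding='utf-8') as f_in, \
         open(f'./{split}.txt.cleaned', 'w', encoding='utf-8') as f_out:
        for line in f_in:
            wav_path, text = line.rstrip('\n').split('|', 1)
            f_out.write(f'{wav_path}|{basic_cleaners(text)}\n')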

Also, if it works for you, please feel free to adapt the code to deal with multiple languages. If it is just a matter of adding another field in the config, we can maybe keep multiple text cleaners and choose what preprocessing is needed.

nivibilla commented 1 year ago

This is the code I used to split the data into train and validation

from sklearn.model_selection import train_test_split

# Read the metadata file, one line per utterance
with open('./metadata.txt') as f:
    lines = f.readlines()

# Split the data into training and validation sets with an 80/20 split
train_data, test_data = train_test_split(lines, test_size=0.2, random_state=42, shuffle=True)

with open('./train.txt', 'w') as f:
    f.writelines(train_data)

with open('./valid.txt', 'w') as f:
    f.writelines(test_data)

nivibilla commented 1 year ago

Also, I'm not sure you even need to replace the cleaners. Could you try using 'transliteration_cleaners' instead of changing the symbols? I suspect the issue comes from updating the symbols, and since Spanish mostly uses the same characters as English, you might still get good results. I'm not sure how to edit the state dict to make it work for a different symbol set.
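(Not something tried in this thread, just one possible direction: if the only tensor whose shape depends on the symbol count is enc_p.emb.weight, one could resize that embedding table in the checkpoint before loading. This sketch assumes the checkpoint stores its weights under a 'model' key as in the upstream VITS repo, that the file names are just examples, and that copying the overlapping rows only helps if the first symbols of both sets line up.)

import torch

ckpt = torch.load('pretrained_ljs.pth', map_location='cpu')   # example path
state = ckpt['model']                      # assumed layout, as in upstream VITS

old_emb = state['enc_p.emb.weight']        # [178, 192] for the English symbol set
num_new_symbols = 148                      # size of the adapted Spanish symbol set
emb_dim = old_emb.shape[1]

# Keep whatever rows overlap, zero-initialise the rest (or re-init randomly).
new_emb = torch.zeros(num_new_symbols, emb_dim)
n = min(num_new_symbols, old_emb.shape[0])
new_emb[:n] = old_emb[:n]
state['enc_p.emb.weight'] = new_emb

torch.save(ckpt, 'pretrained_ljs_resized.pth')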

maybe have a look at this notebook https://colab.research.google.com/drive/1zQXTel8AyqNvnnBLMItbzs-kUv51Dwat?usp=sharing

nivibilla commented 1 year ago

There are some updates on my notebook. Please use the new one.

Mixomo commented 1 year ago

Thanks for your detailed answers. Yes, I could have used transliteration_cleaners, but the comments in one of the scripts in the text folder (cleaners.py) say that you can use basic_cleaners if you have already adapted all the symbols (which I have), so I decided to use the basic ones.

Cleaners are transformations that run over the input text at both training and eval time.

Cleaners can be selected by passing a comma-delimited list of cleaner names as the "cleaners"
hyperparameter. Some cleaners are English-specific. You'll typically want to use:
  1. "english_cleaners" for English text
  2. "transliteration_cleaners" for non-English text that can be transliterated to ASCII using
     the Unidecode library (https://pypi.python.org/pypi/Unidecode)
  3. "basic_cleaners" if you do not want to transliterate (in this case, you should also update
     the symbols in symbols.py to match your data).
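(For illustration only, not from the thread: assuming the standard cleaners.py implementations, the practical difference on Spanish text is whether accented characters survive into the symbol stream.)

from text.cleaners import basic_cleaners, transliteration_cleaners

s = "¿Qué pasó ayer en Málaga?"
print(basic_cleaners(s))            # lowercased, accents kept, so symbols.py must cover them
print(transliteration_cleaners(s))  # run through Unidecode first, so the output is plain ASCII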

Anyway, I will follow your indications and try your updated notebook. If possible, I will keep you updated on the results. Thanks again.

nivibilla commented 1 year ago

Ah, I see. Yeah, I haven't looked at the model in detail, so I don't really know how it works; I'm just trying to improve the performance when voice cloning.

Mixomo commented 1 year ago

Well, an update: the error is still present. It clearly comes from the dictionaries, keys, layers, and architecture of the pretrained model: the Spanish symbol set differs in size from the English one, and since the pretrained model was trained on the English set, it is logical that the shapes do not match. I would have to try training from scratch, without a pretrained model, but I have no idea how. At least I have managed to replace all the scripts and arguments to adapt them to Spanish, and I have managed to clean up and phonemize the text. So I have created a fork of your implementation and will see if I can figure out how to fix that in the future. https://github.com/Mixomo/efficient-vits-finetuning-Spanish-support-WIP-

nivibilla commented 1 year ago

Yeah, that makes sense. I found a notebook that finetuned on Spanish data. It might be helpful for you.

https://colab.research.google.com/drive/1zQXTel8AyqNvnnBLMItbzs-kUv51Dwat?usp=sharing

It seems to be using the multi-speaker model. I will try to add that model to this repo as well. I will close this issue for now, but please keep me updated on how it goes.