ylacombe / finetune-hf-vits

Finetune VITS and MMS using HuggingFace's tools
MIT License

MMS doesn't use a preprocessor, it uses a tokenizer, so how can I solve this? #43

Open C0deXG opened 20 hours ago

C0deXG commented 20 hours ago

Still in Colab, when I run this cell as suggested by @atulpokharel-gp:

```python
from transformers import AutoTokenizer, AutoModelForTextToWaveform

tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-eng")
model = AutoModelForTextToWaveform.from_pretrained("facebook/mms-tts-eng")
```
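For context, inference needs nothing beyond these two objects; a minimal sketch (assuming `torch` is available and the usual `transformers` VITS forward pass) looks like:

```python
import torch

# The tokenizer maps raw text to token ids; no feature extractor is involved on the text side.
inputs = tokenizer("Hello from MMS TTS", return_tensors="pt")

with torch.no_grad():
    output = model(**inputs)

waveform = output.waveform[0]               # (num_samples,) float tensor
sampling_rate = model.config.sampling_rate  # 16 kHz for the MMS checkpoints
```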

and then I run: `!accelerate launch run_vits_finetuning.py /content/finetune-hf-vits/training_config_examples/finetune_mms_kor.json`

I get this error: `OSError: facebook/mms-tts-som does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/facebook/mms-tts-som/tree/main' for available files.`

As we know, the MMS TTS models use a tokenizer to pre-process the text inputs into token ids; they don't need a preprocessor_config.json. How can I work around this? My config file is:

```json
{
  "project_name": "so-finetuning",
  "push_to_hub": true,
  "hub_model_id": "facebook/mms-tts-som",
  "report_to": ["wandb"],
  "overwrite_output_dir": true,
  "output_dir": "./tmp/vits_finetuned_Som",

  "dataset_name": "khederwaaOne/s",
  "audio_column_name": "audio",
  "text_column_name": "transcribe",
  "train_split_name": "train",
  "eval_split_name": "train",

  "full_generation_sample_text": "Labadaas qaybood oo aanu soo sheegnay haddii la falanqeeyo",

  "max_duration_in_seconds": 28.5,
  "min_duration_in_seconds": 2.4,

  "max_tokens_length": 300,

  "model_name_or_path": "facebook/mms-tts-som",

  "preprocessing_num_workers": 4,

  "do_train": true,
  "num_train_epochs": 30,
  "gradient_accumulation_steps": 1,
  "gradient_checkpointing": false,
  "per_device_train_batch_size": 4,
  "learning_rate": 2e-5,
  "adam_beta1": 0.8,
  "adam_beta2": 0.99,
  "warmup_ratio": 0.01,
  "group_by_length": false,

  "do_eval": true,
  "eval_steps": 50,
  "per_device_eval_batch_size": 4,
  "max_eval_samples": 25,
  "do_step_schedule_per_epoch": true,

  "weight_disc": 3,
  "weight_fmaps": 1,
  "weight_gen": 1,
  "weight_kl": 1.5,
  "weight_duration": 1,
  "weight_mel": 35,

  "fp16": true,
  "seed": 456
}
```

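The error presumably comes from the training script also trying to load a feature extractor (for the audio side) from `model_name_or_path`; a minimal sketch that reproduces the mismatch outside the script, assuming the standard `transformers` Auto classes, is:

```python
from transformers import AutoTokenizer, AutoFeatureExtractor

# The text side loads fine: the Hub repo ships a tokenizer.
tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-som")
ids = tokenizer("Labadaas qaybood oo aanu soo sheegnay", return_tensors="pt").input_ids
print(ids.shape)

# The audio side fails with the same OSError, because preprocessor_config.json is missing.
try:
    AutoFeatureExtractor.from_pretrained("facebook/mms-tts-som")
except OSError as err:
    print(err)
```
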
atulpokharel-gp commented 3 hours ago

@C0deXG
Create a file named `preprocessor_config.json` with:

```json
{
  "feature_extractor_type": "VitsFeatureExtractor",
  "feature_size": 80,
  "hop_length": 256,
  "max_wav_value": 32768.0,
  "n_fft": 1024,
  "padding_side": "right",
  "padding_value": 0.0,
  "return_attention_mask": false,
  "sampling_rate": 16000
}
```

inside the model directory, like this: (screenshot: Screenshot from 2024-10-28 09-22-44)
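If you cannot push that file to the `facebook/` Hub repo, a scripted equivalent is to work from a local copy of the checkpoint; a sketch, assuming the training config's `model_name_or_path` also accepts a local directory, looks like:

```python
# Sketch: download the checkpoint locally, add preprocessor_config.json next to it,
# then point "model_name_or_path" in the training config at this local path.
# (Assumption: a local path works the same as a Hub id for the fine-tuning script.)
import json
from huggingface_hub import snapshot_download

local_dir = snapshot_download("facebook/mms-tts-som", local_dir="./mms-tts-som")

preprocessor_config = {
    "feature_extractor_type": "VitsFeatureExtractor",
    "feature_size": 80,
    "hop_length": 256,
    "max_wav_value": 32768.0,
    "n_fft": 1024,
    "padding_side": "right",
    "padding_value": 0.0,
    "return_attention_mask": False,
    "sampling_rate": 16000,
}

with open(f"{local_dir}/preprocessor_config.json", "w") as f:
    json.dump(preprocessor_config, f, indent=2)
```

Then set `"model_name_or_path": "./mms-tts-som"` (the hypothetical local directory above) in the JSON config instead of the Hub id.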