ylacombe / finetune-hf-vits

Finetune VITS and MMS using HuggingFace's tools
MIT License

KeyError: 'speaker_id' #35

Open AdamBenghoula opened 3 months ago

AdamBenghoula commented 3 months ago

Is it necessary to have a `speaker_id` column, given that I don't need a specific speaker? My dataset only contains the audio files and their transcriptions.

muhammadsaadgondal commented 3 months ago

This is actually only needed if you have more than one speaker.

AdamBenghoula commented 3 months ago

It raises an error when I leave it empty.

muhammadsaadgondal commented 3 months ago

You also have to change the code a little: comment out every place where `speaker_id` is referenced. I commented out three lines and it works fine for me without further modification:

1-
```python
model_outputs = model(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    labels=batch["labels"],
    labels_attention_mask=batch["labels_attention_mask"],
    # speaker_id=batch["speaker_id"],
    return_dict=True,
    monotonic_alignment_function=maximum_path,
)
```

2-
```python
model_outputs_train = model(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    labels=batch["labels"],
    labels_attention_mask=batch["labels_attention_mask"],
    # speaker_id=batch["speaker_id"],
    return_dict=True,
    monotonic_alignment_function=maximum_path,
)
```

3-
```python
model_outputs_train = model(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    labels=batch["labels"],
    labels_attention_mask=batch["labels_attention_mask"],
    # speaker_id=batch["speaker_id"],
    return_dict=True,
    monotonic_alignment_function=maximum_path,
)
```
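Instead of commenting out `speaker_id=batch["speaker_id"]` in each call, a possible alternative is to pass `speaker_id` only when the batch actually contains it, so the same script works for single- and multi-speaker datasets. This is a minimal sketch, not code from the repository: `forward_kwargs` is a hypothetical helper, and it assumes the model's forward method treats `speaker_id` as an optional keyword argument (defaulting to `None`), which you should verify against the training script you are using.

```python
from typing import Any, Dict


def forward_kwargs(batch: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build the model keyword arguments from a batch,
    adding speaker_id only if the batch has that key (avoids KeyError)."""
    kwargs = {
        "input_ids": batch["input_ids"],
        "attention_mask": batch["attention_mask"],
        "labels": batch["labels"],
        "labels_attention_mask": batch["labels_attention_mask"],
    }
    # Assumption: the model accepts speaker_id as an optional argument.
    if "speaker_id" in batch:
        kwargs["speaker_id"] = batch["speaker_id"]
    return kwargs
```

Each of the three calls would then become, for example, `model_outputs = model(**forward_kwargs(batch), return_dict=True, monotonic_alignment_function=maximum_path)`.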
gangagyatso4364 commented 1 month ago

If we comment out `speaker_id=batch['speaker_id']`, fine-tuning works, but will the model then train as a single-speaker or a multi-speaker model? I want to train on multiple speakers.
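For multi-speaker training, each example presumably needs an integer `speaker_id` rather than having the argument commented out. A minimal sketch, under the assumption that your dataset stores a speaker name or label per row: map each label to a contiguous integer id. `assign_speaker_ids` is a hypothetical helper, not part of the repository. It is also an assumption that the model's configured number of speakers (for example `num_speakers` in the transformers `VitsConfig`) must be at least the number of distinct speakers; check how the fine-tuning script sets this.

```python
from typing import Dict, List, Tuple


def assign_speaker_ids(speakers: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Hypothetical helper: map each speaker label to a contiguous integer id,
    numbered in order of first appearance."""
    mapping: Dict[str, int] = {}
    ids: List[int] = []
    for name in speakers:
        if name not in mapping:
            mapping[name] = len(mapping)
        ids.append(mapping[name])
    return ids, mapping


ids, mapping = assign_speaker_ids(["alice", "bob", "alice", "carol"])
# ids == [0, 1, 0, 2]; len(mapping) == 3 distinct speakers
```

Keeping the `mapping` dictionary lets you reuse the same ids at inference time to select a speaker.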