ylacombe / finetune-hf-vits

Finetune VITS and MMS using HuggingFace's tools
MIT License

How to fine tune with Audio Sequence Dataset #14

Open allandclive opened 7 months ago

allandclive commented 7 months ago

https://huggingface.co/datasets/Sunbird/salt-studio-lug

How do I load and fine-tune using this dataset?

ylacombe commented 7 months ago

Hi Allan, the easiest way for me would be something like the following (I haven't tested the code, but it should work). It basically converts the audio column into something that datasets understands.

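from datasets import load_dataset, Audio

dataset = load_dataset("Sunbird/salt-studio-lug")

# With input_columns, the function receives one argument per listed column
dataset = dataset.map(lambda audio, sample_rate: {"audio": audio, "sampling_rate": sample_rate}, input_columns=["audio", "sample_rate"])
# Ask datasets to treat the "audio" column as an Audio feature
dataset = dataset.cast_column("audio", Audio())

# Replace the placeholder with the repository name you want
dataset.push_to_hub("THE_DATASET_NAME_YOU_WANT")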

Then you can use the newly created dataset as indicated in the README

allandclive commented 7 months ago

Let me give it a try

allandclive commented 7 months ago

Error

TypeError: Couldn't cast array of type list<item: float> to struct<bytes: binary, path: string>
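
The error message indicates that Audio() expects each row to be a struct with "bytes" and "path" fields, while the column still holds a plain list of floats. One possible workaround, sketched below, is to encode each float list as WAV bytes before casting. This is a minimal sketch using only the standard library, not a fix confirmed in this thread: the helper names `floats_to_wav_bytes` and `to_audio_struct` are hypothetical, and it assumes the samples are mono floats in the range [-1.0, 1.0] with the rate stored in a "sample_rate" column.

```python
import io
import struct
import wave


def floats_to_wav_bytes(samples, sample_rate):
    """Encode floats (assumed mono, in [-1.0, 1.0]) as 16-bit PCM WAV bytes."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, x)) * 32767) for x in samples]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono (assumption)
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()


def to_audio_struct(example):
    """Hypothetical map function: swap the float list for a bytes/path struct."""
    wav = floats_to_wav_bytes(example["audio"], example["sample_rate"])
    # "bytes" and "path" are the two field names reported in the TypeError.
    return {"audio": {"bytes": wav, "path": None}}
```

If this approach suits the dataset, `to_audio_struct` could be passed to `dataset.map(...)` in place of the lambda in the snippet above, before the call to `cast_column("audio", Audio())`. This has not been tested against Sunbird/salt-studio-lug, so treat the column names and the sample range as assumptions to verify.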

atulpokharel001 commented 7 months ago

Is there any way to fine-tune this model with the https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1 dataset? Does anyone have experience doing it?