zkmkarlsruhe / language-identification

Spoken Language Identification on Common Voice and AudioSet using Deep Learning

Model Training #2

Open prothej227 opened 2 years ago

prothej227 commented 2 years ago

Hi! Can you add detailed steps on how to train your model using a custom dataset?

danomatika commented 2 years ago

If you need more info than what is in the README, @bytosaur can answer, but he is currently on vacation, so it may be a week or so until he can respond.

prothej227 commented 2 years ago

Hi, thanks for your reply! I'm planning to train your model using a custom dataset which is different from the common voice dataset provided in the documentation. Can you elaborate or give specific beginner-friendly steps on how I can retrain your model using my collated dataset?

bytosaur commented 2 years ago

hey @prothej227,

What does your dataset look like? Maybe it is not that different from my setup. You can always try the setup with an incomplete Common Voice dataset, i.e. two languages that have very few samples.

Collecting noise data is optional. The first step is to process the downloaded Common Voice folders into a structure that the training script can read. I used a couple of more advanced tricks to clean the data (voice activity detection, debiasing through sampling). In the end, however, you want folders named by class (language), each containing mono samples of equal length, sampled at the same frequency, normalized, and so on; see this section.
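As a rough illustration of that target layout, here is a minimal sketch using only the Python standard library. It is not the repository's own preprocessing code and it skips voice activity detection and debiasing. It assumes a hypothetical input tree `raw/<language>/*.wav` in which every file is 16-bit PCM and already shares one sample rate (no resampling is done). The function names and the 5-second default are made up for the example.

```python
import array
import sys
import wave
from pathlib import Path

TARGET_SECONDS = 5  # hypothetical clip length; pick what suits your data


def to_mono(samples, channels):
    """Average interleaved channels into a single channel."""
    if channels == 1:
        return list(samples)
    return [sum(samples[i:i + channels]) / channels
            for i in range(0, len(samples) - channels + 1, channels)]


def fix_length(samples, target_len):
    """Crop or zero-pad so the clip has exactly target_len samples."""
    cropped = list(samples[:target_len])
    return cropped + [0.0] * (target_len - len(cropped))


def peak_normalize(samples, peak=0.9):
    """Scale so the loudest sample reaches `peak`; silence is left as-is."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0:
        return list(samples)
    return [s * peak / loudest for s in samples]


def process_wav(src, dst, seconds=TARGET_SECONDS):
    """Read one 16-bit PCM WAV and write a mono, fixed-length, normalized copy."""
    with wave.open(str(src), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels, rate = wf.getnchannels(), wf.getframerate()
        pcm = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":  # WAV data is little-endian
        pcm.byteswap()
    mono = [s / 32768.0 for s in to_mono(pcm, channels)]
    clip = peak_normalize(fix_length(mono, int(seconds * rate)))
    out = array.array("h", (int(round(s * 32767)) for s in clip))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(str(dst), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(out.tobytes())


def build_dataset(src_root, dst_root, seconds=TARGET_SECONDS):
    """Mirror src_root/<language>/*.wav into dst_root/<language>/."""
    for lang_dir in sorted(p for p in Path(src_root).iterdir() if p.is_dir()):
        out_dir = Path(dst_root) / lang_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for wav_path in sorted(lang_dir.glob("*.wav")):
            process_wav(wav_path, out_dir / wav_path.name, seconds)

# Example call (paths are placeholders):
# build_dataset("raw", "dataset", seconds=5)
```

With clips of mixed length, shorter ones are zero-padded at the end and longer ones are cropped, so every output file has the same number of samples. Whether padding or cropping is the better choice depends on your data.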

Please let me know the sections of the README that are not understandable so I can improve them.

prothej227 commented 2 years ago

I have a dataset that contains wav files that vary in length (max = 5 seconds, min = 3 seconds).