Shaobo-Z opened this issue 1 year ago
No, you just have to prepare the English dataset
This is what my dataset looks like ⬇
And this is what I got ⬇. The train_loss, train_lr, etc. do change over time. However, the train_wer is always 1.0000.
Checked with `librosa.get_samplerate` and got 16000. I tried multiple ways, but the result remains the same. Any ideas? Please.
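For reference, a minimal version of that sample-rate check (the `train.txt` name and the `path` column are assumptions based on the "|"-separated format described later in this thread):

```python
import librosa
import pandas as pd

# Assumed "|"-separated transcript file with a "path" column.
df = pd.read_csv("train.txt", sep="|")

# Collect the distinct sample rates across the whole dataset.
rates = {librosa.get_samplerate(p) for p in df["path"]}
print(rates)  # {16000} means every clip is already at 16 kHz
```

Any clip that is not 16 kHz can be reloaded with `librosa.load(path, sr=16000)` and written back out before training.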
I can see that your dataset is relatively small, so the number of update steps per epoch is only 5. Have you tried a longer run to check whether the behavior remains? Also take a look at the vocab.json file to see whether it contains the correct English characters.
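Something like the following is enough for that check (this assumes vocab.json is a flat character-to-id mapping, which may differ slightly from how the repo structures it):

```python
import json
import string

with open("vocab.json", encoding="utf-8") as f:
    vocab = json.load(f)

# Report any lowercase English letters that are missing from the vocabulary.
missing = [c for c in string.ascii_lowercase if c not in vocab]
print("vocab size:", len(vocab))
print("missing English characters:", missing or "none")
```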
Encountered the same problem even with a larger dataset (91 steps and 20 epochs).
I have not tried it on other language datasets yet. Can you share more information about your dataset, config, TensorBoard logs, etc.?
- Python 3.8, `pip install` everything in requirements.txt, except torch 1.7.1, which I had to install with `conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch` because I have CUDA 11.4.
- I tried both the VIVOS dataset and the Common Voice dataset, storing them in .txt files with pandas, separated by "|", with two columns: path (path on the server) and transcript (UTF-8 encoded).
- When I tried to print the pred and label, I got these:

The audio files are already pre-processed to a 16000 Hz sampling rate and .wav format.
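For completeness, this is roughly how such a transcript file can be written (the file name, paths, and sentences here are placeholders, not the real data):

```python
import pandas as pd

# Two columns, separated by "|": absolute audio path and UTF-8 transcript.
rows = [
    {"path": "/path/on/server/clip_0001.wav", "transcript": "hello world"},
    {"path": "/path/on/server/clip_0002.wav", "transcript": "this is a test"},
]
pd.DataFrame(rows).to_csv("train.txt", sep="|", index=False, encoding="utf-8")
```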
I can see that your model has not converged yet; the train loss is still high. Try increasing the learning rate for faster training.
Ping me at khanhld218@gmail.com for better debugging since I rarely check GitHub notifications.
Already did, thanks.
Is it possible to get an update on this question? What is the minimum size of the dataset? I want to train the model with a 20-minute dataset. Do you think that is possible?
I will take a look at my code, run some experiments on English datasets, and respond to you soon @Shaobo-Z
So after experimenting for a while, I found that increasing the learning rate (to roughly >1e-5) and setting the scheduler's max learning rate to >=1e-4 helped the model actually start to learn after a while; just be patient.
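These are not the repo's exact config keys (those live in its training config file), but the two knobs look roughly like this in plain PyTorch, assuming a one-cycle-style scheduler:

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the actual wav2vec 2.0 model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # base lr above 1e-5

# The scheduler ramps the learning rate up to max_lr and back down over training.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-4,          # scheduler max learning rate >= 1e-4
    total_steps=91 * 20,  # e.g. 91 update steps per epoch * 20 epochs, as reported above
)
```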
In the source code, you used Vietnamese for training and validation. If I want to fine-tune the model on an English dataset, is there anything that I should change?