jonatasgrosman / huggingsound

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
MIT License
430 stars 42 forks

Finetuned model produces empty transcriptions #102

Open zssvaidar opened 2 months ago

zssvaidar commented 2 months ago

What are your recommendations for finetuning on 100 audio files? Are 1000 steps a must? After 40 steps with 44 kHz, 16 kHz, and 8 kHz audio, the transcription results are empty.

Is there a way to finetune on a GPU with less than 20 GB of GPU memory? I am using Colab, which seems to provide less memory than huggingsound training requires. Thank you very much, I appreciate your help!
