mrdbourke / learn-huggingface

Repo designed to help learn the Hugging Face ecosystem (transformers, datasets, accelerate + more).
http://www.learnhuggingface.com/
Apache License 2.0

Guidance Needed: Fine-tuning Whisper Model with Custom Datasets for Hausa Language (STT & TTS) #1

Open spaco67 opened 2 months ago

spaco67 commented 2 months ago

Hi Bourke,

I'm working on a project that involves using OpenAI's Whisper model, and I'm particularly interested in fine-tuning it with custom datasets for Hausa, a low-resource language. I believe this would greatly benefit communities working with underrepresented languages.

Objectives:

  1. Fine-tune Whisper for the Hausa language: Steps or guidance on how to adapt the model using our own Hausa datasets.
  2. Enable both Speech-to-Text (STT) and Text-to-Speech (TTS) functionalities: Making the model versatile for various applications.
  3. Best practices for dataset preparation: Tips on formatting and annotating custom Hausa datasets to achieve optimal results.
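To make the dataset-preparation objective concrete, here is a minimal sketch of the kind of manifest I have in mind: one record per clip, pairing an audio path with a cleaned transcript. The field names (`audio`, `duration`, `sentence`), the file paths, and the sample phrases are my own assumptions for illustration, not a format required by any library. The 30-second limit reflects the fixed-length window Whisper processes at a time; longer clips would presumably need to be split rather than dropped.

```python
import json
import unicodedata

MAX_SECONDS = 30.0  # assumption: keep clips within Whisper's 30-second window


def normalize_transcript(text: str) -> str:
    """NFC-normalize and collapse whitespace; Hausa letters such as ɓ, ɗ, ƙ are kept."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def build_manifest(rows):
    """Split (audio_path, seconds, transcript) tuples into kept records and skipped paths."""
    kept, skipped = [], []
    for audio_path, seconds, transcript in rows:
        sentence = normalize_transcript(transcript)
        # Skip clips with an empty transcript or an out-of-range duration.
        if not sentence or seconds <= 0 or seconds > MAX_SECONDS:
            skipped.append(audio_path)
            continue
        kept.append({"audio": audio_path, "duration": seconds, "sentence": sentence})
    return kept, skipped


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, leaving non-ASCII characters readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical rows: paths, durations, and phrases are illustrative only.
rows = [
    ("clips/0001.wav", 4.2, "  Ina   kwana? "),
    ("clips/0002.wav", 41.0, "Sannu da zuwa"),  # too long, skipped
    ("clips/0003.wav", 3.1, "   "),             # empty transcript, skipped
]
kept, skipped = build_manifest(rows)
print(to_jsonl(kept))
```

A JSON Lines file like this is simple to inspect by hand, and the skipped list gives a quick way to review which clips need re-recording, splitting, or re-annotation.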

Questions:

  1. Is there existing documentation or tutorials on fine-tuning Whisper with custom datasets, especially for low-resource languages like Hausa?
  2. Are there any recommended tools or libraries within Hugging Face that can facilitate this process?
  3. Has anyone in the community successfully implemented this for Hausa or similar languages and could share insights or resources?

Additional Context:

Platform: Hugging Face Transformers
Model in Use: OpenAI Whisper
Language Focus: Hausa

Any guidance, suggestions, or references would be highly appreciated!

Thank you for your support.

fami-sura commented 1 week ago

Hello Bourke,

I'm very happy to see your interest in this matter. I'm still researching this issue myself, and I'm really eager for this work to be done. I'm studying software engineering, and I want to see it accomplished.

I don't know where you are or what you're working on, but I look forward to discussing this with you so that we can join hands and work on it together.

I look forward to your reply. You can also contact me at this email: asakrg@outlook.com

Thank you.

fami-sura commented 1 week ago

In addition, I did a lot of research and found that someone has already started fine-tuning Whisper for the Hausa language. I forked the project, so let's work together on it.