kaldi-asr / kaldi

Seeking Guidance on Custom Urdu ASR Training Data and Vocabulary Expansion #4900

Open Shaukataliii opened 6 months ago

Shaukataliii commented 6 months ago

Hello, I am a developer building an Urdu Automatic Speech Recognition (ASR) system with the Kaldi ASR toolkit. I am facing two specific challenges and would greatly appreciate your insights.

Challenges

  1. Acquiring Transcriptions for Custom Urdu Dataset:

    • Issue: Obtaining accurate transcriptions for a large custom Urdu dataset, tailored to industry-specific use, has proven difficult.
    • Request: Seeking guidance on cost-effective solutions or resources for obtaining accurate transcriptions.

  2. Optimizing Kaldi ASR for Recognizing Unseen Words:

    • Issue: We aim to optimize the Kaldi ASR model so that it reliably recognizes new words encountered during inference, especially industry-specific jargon.
    • Request: Looking for recommendations on approaches to handle previously unseen (out-of-vocabulary) words and improve the model's adaptability.
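
To make the second challenge concrete, here is a minimal sketch of what "unseen words" means in this context: jargon terms that are missing from the pronunciation lexicon. It assumes the usual Kaldi lexicon layout of one entry per line, `word phone1 phone2 ...`. The function names, the sample words, and the phone symbols are hypothetical and only for illustration.

```python
def load_lexicon_words(lexicon_lines):
    """Collect the word column from lexicon lines shaped like 'word phone1 phone2 ...'."""
    words = set()
    for line in lexicon_lines:
        parts = line.split()
        if parts:  # skip blank lines
            words.add(parts[0])
    return words


def find_oov_words(candidate_words, lexicon_words):
    """Return candidates absent from the lexicon, in first-seen order, without duplicates."""
    seen = set()
    oov = []
    for word in candidate_words:
        word = word.strip()
        if word and word not in lexicon_words and word not in seen:
            seen.add(word)
            oov.append(word)
    return oov


# Hypothetical sample data: the phone symbols are placeholders, not a real Urdu phone set.
lexicon_lines = [
    "<unk> SPN",
    "بینک b ae n k",
    "رقم r a q a m",
]
jargon = ["بینک", "انشورنس", "پریمیم", "انشورنس"]

oov_words = find_oov_words(jargon, load_lexicon_words(lexicon_lines))
print(oov_words)
```

In a real setup the lexicon lines would be read from the project's own lexicon file. The words this reports are the ones that would still need pronunciations, and language model coverage, before the decoder could output them.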

Thank you for your time and consideration.