scribear / ScribeAR.github.io

Live Transcription for Augmented Reality Glasses

Azure model customization #75

Closed: harsh183 closed this issue 2 years ago

harsh183 commented 3 years ago

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-train-model

Customization can be both general and course-specific.

Improve recognition accuracy on industry-specific vocabulary and grammar, like medical terminology or IT jargon

This takes plain-text input. We could scrape slides, syllabi, and textbooks to improve transcription of CS-specific terms.
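A minimal sketch of turning scraped course text into a training file, assuming the custom plain-text dataset is UTF-8 with one sentence (utterance) per line. The helper name `build_plain_text_dataset` is hypothetical, and the sentence splitting is a naive regex, not anything Azure provides:

```python
import re


def build_plain_text_dataset(raw_texts):
    """Hypothetical helper: turn scraped slide/syllabus text into
    one-sentence-per-line training text, dropping exact duplicates.

    Assumes the plain-text dataset format is one utterance per line.
    """
    seen = set()
    lines = []
    for raw in raw_texts:
        # Collapse whitespace and line breaks left over from PDF/slide extraction.
        text = re.sub(r"\s+", " ", raw).strip()
        # Naive sentence split: break after ., ? or ! followed by whitespace.
        for sentence in re.split(r"(?<=[.?!])\s+", text):
            sentence = sentence.strip()
            if sentence and sentence not in seen:
                seen.add(sentence)
                lines.append(sentence)
    return "".join(line + "\n" for line in lines)
```

The result could then be written with `open("course_text.txt", "w", encoding="utf-8")` (file name is illustrative) and uploaded as a plain-text dataset. Deduplication is a design choice here, so that repeated slide headers don't dominate the data.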

Define the phonetic and displayed form of a word or term that has nonstandard pronunciation, like product names or acronyms

This probably won't come up often, but some terms could be fixed this way, e.g. "sequel" -> "SQL".
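A rough sketch of generating a pronunciation file, assuming the format is one entry per line with the displayed form, a tab, and then the spoken form (e.g. `SQL<TAB>sequel`). The helper `build_pronunciation_file` and the example terms are illustrative only:

```python
def build_pronunciation_file(entries):
    """Hypothetical helper: format {displayed form: spoken form} pairs
    as tab-separated lines, one entry per line.

    Assumes the pronunciation format is "display<TAB>spoken".
    """
    lines = []
    for display, spoken in entries.items():
        # A stray tab would corrupt the two-column layout.
        if "\t" in display or "\t" in spoken:
            raise ValueError(f"entry contains a tab: {display!r}")
        # Spoken forms are written in lowercase words.
        lines.append(f"{display}\t{spoken.lower()}")
    return "".join(line + "\n" for line in lines)
```

For example, `build_pronunciation_file({"SQL": "sequel"})` produces the single line `SQL\tsequel`.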

Improve recognition accuracy on speaking styles, accents, or specific background noises

This takes audio plus matching transcripts as input. The baseline models handle standard American accents fine, but we could fine-tune a model for specific professors. There may be captioned lectures from previous semesters (via DRES or similar) that could be used as training datasets.
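A sketch of packaging captioned lecture clips, assuming the audio + human-labeled transcript dataset is a zip of `.wav` files plus one text file where each line is `<audio file name><TAB><transcript>`. The helper name and the `trans.txt` file name are our assumptions. Splitting lectures into short clips and checking audio format requirements are not covered:

```python
import io
import zipfile


def build_audio_dataset(clips):
    """Hypothetical helper: zip audio clips together with a transcript file.

    clips maps a .wav file name to (wav_bytes, transcript). Assumes each
    transcript line is "<file name>\t<transcript>" in a file named trans.txt.
    """
    buf = io.BytesIO()
    trans_lines = []
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, (wav_bytes, transcript) in clips.items():
            if not name.lower().endswith(".wav"):
                raise ValueError(f"expected a .wav file, got {name!r}")
            zf.writestr(name, wav_bytes)
            # One line per clip: file name, tab, human-labeled transcript.
            trans_lines.append(f"{name}\t{transcript.strip()}")
        # writestr encodes str data as UTF-8.
        zf.writestr("trans.txt", "".join(line + "\n" for line in trans_lines))
    return buf.getvalue()
```

Keeping the zip in memory makes the helper easy to test. In practice the returned bytes would be written to disk before uploading.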

WilliamFoster3 commented 2 years ago

vague