linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

Can I get phonemes for a word and a confidence score for them? #113

Closed tiennguyen12g closed 10 months ago

tiennguyen12g commented 1 year ago

Hello everyone, if you have experience with this, please let me know. Thank you.

Jeronymous commented 10 months ago

No, Whisper models do not have any notion of phonemes. They are end-to-end models that go directly from the audio signal to subword tokens (so essentially letters). Getting phonemes would require a separate model dedicated to that task.
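
To illustrate what "a separate step" could look like, here is a minimal sketch that attaches phonemes to word-level output after transcription. It assumes each word is a dict with `text`, `start`, `end` and `confidence` keys; the sample values, the `TOY_LEXICON` table and the `attach_phonemes` helper are all hypothetical and made up for this example. Note that the confidence stays a word-level score from the recognizer. It says nothing about how well each phoneme was pronounced; that would need a dedicated phoneme-level model.

```python
# Hypothetical toy pronunciation lexicon (ARPAbet-style symbols).
# A real setup would use a full pronunciation dictionary or a
# dedicated grapheme-to-phoneme model instead of this table.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

PUNCTUATION = ".,!?;:\"'"


def attach_phonemes(words, lexicon):
    """Return copies of the word dicts with an added 'phonemes' key.

    'phonemes' is None when the word is not in the lexicon.
    """
    result = []
    for word in words:
        # Normalize the surface form before the lookup.
        key = word["text"].strip().strip(PUNCTUATION).lower()
        result.append({**word, "phonemes": lexicon.get(key)})
    return result


# Assumed shape of word-level output (values invented for illustration).
words = [
    {"text": "Hello,", "start": 0.0, "end": 0.4, "confidence": 0.93},
    {"text": "world", "start": 0.5, "end": 0.9, "confidence": 0.88},
    {"text": "Whisper", "start": 1.0, "end": 1.5, "confidence": 0.75},
]

for item in attach_phonemes(words, TOY_LEXICON):
    print(item["text"], item["confidence"], item["phonemes"])
```

Words missing from the lexicon come back with `phonemes` set to `None`, which makes it easy to see where a real grapheme-to-phoneme model would be needed.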