ideasman42 / nerd-dictation

Simple, hackable offline speech to text - using the VOSK-API.
GNU General Public License v3.0

Number conversion: "seven twenty two" becomes "7 twenty-two" #95

Open KJ7LNW opened 1 year ago

KJ7LNW commented 1 year ago

When I say "seven twenty two" it becomes "7 twenty-two", but I would expect "722".

I also tried:

ideasman42 commented 1 year ago

I can't reproduce this:

The other combinations noted are more ambiguous. What is your --numbers-min-value set to?

KJ7LNW commented 1 year ago

Oops, funny, I wrote the min value stuff and forgot it was there. I have --numbers-min-value=3 which explains the issue.

If you see an easy way to make --numbers-min-value realize that 722 is much bigger than 3, then go for it because I'm not quite sure how to hook that in properly. Otherwise, you may close the issue as not a bug.

ideasman42 commented 1 year ago

Could you show the command used to start nerd-dictation? I can't reproduce the issue even with --numbers-min-value=3 set.

KJ7LNW commented 1 year ago

./nerd-dictation begin --numbers-as-digits --numbers-no-suffix --numbers-min-value=3 --suspend-on-start --verbose=1 --simulate-input-tool=DOTOOLC

ideasman42 commented 1 year ago

I still can't reproduce this, even with the exact command. It might be that the language model outputs "twenty-two" instead of "twenty two", which confuses the parsing, since it assumes words are separated by spaces.

The readme in the model directory reads:

Accurate universal English model (both for callcenter and wideband)

Based on Appen Kaldi model https://github.com/Appen/UHV-OTS-Speech

Librispeech test-clean WER:  5.69%
Tedlium WER:                 6.05%
Callcenter WER:             29.78%

KJ7LNW commented 1 year ago

That could be. I'm using an unmodified vosk-model-en-us-0.42-gigaspeech model and it does say twenty-two.
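
The hyphen hypothesis discussed above can be sketched in Python. This is an illustration only, not nerd-dictation's actual parsing code: the function name split_hyphenated_number_words and the two word sets are assumptions made up for this example. It splits hyphenated compounds such as "twenty-two" into space-separated words, so that a parser which assumes spaces would see "twenty two" instead.

```python
import re

# Hypothetical word sets for illustration; not taken from nerd-dictation.
_TENS = {
    "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety",
}
_UNITS = {
    "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine",
}

# Matches two lowercase words joined by a single hyphen.
_HYPHENATED = re.compile(r"\b([a-z]+)-([a-z]+)\b")


def split_hyphenated_number_words(text):
    """Replace the hyphen in compounds like "twenty-two" with a space.

    Other hyphenated words (e.g. "well-known") are left untouched.
    Assumes lowercase input, as in the model output quoted in this thread.
    """

    def _replace(match):
        tens, unit = match.group(1), match.group(2)
        if tens in _TENS and unit in _UNITS:
            return tens + " " + unit
        return match.group(0)

    return _HYPHENATED.sub(_replace, text)


print(split_hyphenated_number_words("seven twenty-two"))
# prints: seven twenty two
```

Whether such a step would make "seven twenty two" come out as "722", and where it could be hooked into nerd-dictation, is not established in this thread. The sketch only shows how the hyphenated form could be normalized before any number parsing runs.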