Open nift-d opened 2 years ago
Personally I don't think I'd use this; I just begin/end whenever I need to use dictation, although admittedly I have this bound to a single key (push-and-hold to talk) in a way that was a bit involved to configure.
I assume you're leaving dictation running so I can see how you might want to pause in that case.
A simple pause command could pause the recording process launched by nerd-dictation.
It seems like it might be simpler to figure out ways to make sure begin/end are fast though, so it's not annoying to have to pause, and changing configuration at run-time doesn't complicate the code.
It may be possible to keep the VOSK language model(s) loaded in a way that allows begin/end not to require loading it again each time.
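To illustrate the idea of keeping the model loaded, here is a minimal sketch of a per-process cache. It is not how nerd-dictation is structured; ModelCache and _load_vosk_model are hypothetical names, and the sketch assumes a long-running process that serves every begin/end cycle. The only Vosk call used is the Model(path) constructor.

```python
from typing import Any, Callable, Dict


def _load_vosk_model(path: str) -> Any:
    # Imported lazily so the cache can be exercised without Vosk installed.
    from vosk import Model

    return Model(path)


class ModelCache:
    """Hypothetical helper: load each model directory at most once per process."""

    def __init__(self, loader: Callable[[str], Any] = _load_vosk_model) -> None:
        self._loader = loader
        self._models: Dict[str, Any] = {}

    def get(self, path: str) -> Any:
        # Only the first request for a given path pays the loading cost.
        if path not in self._models:
            self._models[path] = self._loader(path)
        return self._models[path]
```

With something like this, each "begin" would only need to create a fresh KaldiRecognizer(cache.get(path), sample_rate), which is cheap compared to loading the model. Keying the cache by path would also leave room for holding more than one language model at once.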
Yes, I leave it running. Maybe the pause command does not have to stop the recording, only the output of text? A more responsive begin/end command with a VOSK model that stays loaded would do the trick for me as well.
This might not fit your use case, but it may help others who find this issue while looking for a solution to theirs:
Another possibility is to mute the microphone or desktop input with pactl set-source-mute 0 toggle, where 0 is the index of the first source, which in my case is my desktop sound (I use nerd-dictation mainly to transcribe certain parts of my lectures).
You can find the index of your microphone with pactl list sources, or with pactl subscribe: if you mute and unmute your microphone in pavucontrol while pactl subscribe is running, the right index shows up in its output, e.g. Event 'change' on source #0.
I hope this might be useful for someone searching for a way to pause the dictation.
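For anyone who wants to script this, here is a small sketch of the two steps described above: reading the source index from a pactl subscribe line and toggling mute on that source. The function names are made up for illustration, and the event format is assumed to match the example line quoted above.

```python
import re
import subprocess
from typing import List, Optional

# Matches lines such as: Event 'change' on source #0
# (does not match sinks or source-outputs).
_EVENT_RE = re.compile(r"Event '(\w+)' on source #(\d+)")


def source_index_from_event(line: str) -> Optional[int]:
    """Return the source index from one line of `pactl subscribe` output, or None."""
    match = _EVENT_RE.search(line)
    return int(match.group(2)) if match else None


def mute_command(index: int) -> List[str]:
    """Build the pactl command that toggles mute on the given source."""
    return ["pactl", "set-source-mute", str(index), "toggle"]


def toggle_mute(index: int) -> None:
    """Run the toggle command; requires pactl to be installed."""
    subprocess.run(mute_command(index), check=True)
```

Binding toggle_mute(0) to a key would give a crude pause, although muting can still produce the silence fragments discussed elsewhere in this thread.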
There's some fine discussion on vosk-api about these silence-fragments.
We have released a new model:
https://alphacephei.com/vosk/models/vosk-model-en-us-0.42-gigaspeech.zip
Its accuracy is about the same as 0.22, but the "the" issue is gone. Try it in your apps.
Using 0.42-gigaspeech has solved the problem for me.
I tried adding a pause button to .config/nerd-dictation based on the start/end keywords, but this doesn't work, even in --continuous mode, because the beginning of the same text is re-processed, and it's not clear to me how this could be fixed:
from pynput import keyboard

is_active = False

def on_press(key):
    global is_active
    if key == keyboard.Key.f9:
        is_active = not is_active
        print(f"Listening is {is_active}")

listener = keyboard.Listener(on_press=on_press)
listener.start()

def nerd_dictation_process(text):
    if is_active:
        return text
    else:
        return ""
I also tried putting the VOSK model files in a ramdisk to speed up loading, so that begin and end are fast enough for real-time use, but this didn't do the trick.
Great piece of software, thanks to the authors.
Looks like the suspend and resume subcommands are implemented and working well now.
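As a rough sketch of how these could sit behind a single key binding: the script below alternates between nerd-dictation suspend and nerd-dictation resume, using a marker file to remember the current state. The marker file, its location and the function names are assumptions for illustration only, and the sketch assumes dictation was already started with begin.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List

# Hypothetical marker file; its presence means "currently suspended".
DEFAULT_MARKER = Path(tempfile.gettempdir()) / "nerd-dictation-suspended"


def next_command(suspended: bool) -> List[str]:
    """Pick the subcommand that flips the current state."""
    return ["nerd-dictation", "resume" if suspended else "suspend"]


def _run(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)


def toggle(marker: Path = DEFAULT_MARKER,
           run: Callable[[List[str]], object] = _run) -> bool:
    """Suspend or resume dictation; return True if it is now suspended."""
    suspended = marker.exists()
    run(next_command(suspended))
    # Update the marker only after the command succeeded.
    if suspended:
        marker.unlink()
    else:
        marker.touch()
    return not suspended
```

Calling toggle() from a script bound to a key would then act as a pause button. The marker file can drift out of sync if dictation is ended some other way, so a real setup may want to clear it on end.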
So far there are "begin", "end" and "cancel", and it is wonderful. But I sometimes struggle to find words and start mumbling, and I do not want that to be transcribed. I just mute the mic for now, but that results in fragments, which are highly annoying (see #26). Since this is due to VOSK, a nice workaround would be a "pause" mode for the input that I can bind a key to.
Maybe it is even possible to change the VOSK model during "pause" mode, so one could switch languages?