ideasman42 / nerd-dictation

Simple, hackable offline speech to text - using the VOSK-API.
GNU General Public License v3.0

pause-option would be nice #39

Open nift-d opened 2 years ago

nift-d commented 2 years ago

So far there are "begin", "end" and "cancel" commands, and they are wonderful. But I sometimes struggle to find words and start mumbling, and I do not want that to be transcribed. For now I just mute the mic, but that results in fragments, which are highly annoying (see #26). Since this is due to VOSK, a nice workaround would be a "pause" mode for the input that I can bind to a key.

Maybe it is even possible to change the VOSK model during "pause" mode, so one could switch languages?

ideasman42 commented 2 years ago

Personally I don't think I'd use this; I just begin/end whenever I need dictation, although admittedly I have this bound to a single key (push-and-hold to talk) in a way that was a bit involved to configure.

I assume you're leaving dictation running, so I can see why you might want to pause in that case.

A simple pause command could pause the recording process launched by nerd-dictation.
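A minimal sketch of that idea, assuming the recorder is an ordinary child process (for example parec or sox) started with Python's subprocess module: POSIX job-control signals can freeze and thaw it without restarting anything. The `RecorderPauser` class is hypothetical and is not how nerd-dictation implements pausing. Whether audio captured while the recorder is stopped gets dropped or delivered late after resuming depends on the recorder and the audio server; the sketch does not handle that.

```python
import signal
import subprocess


class RecorderPauser:
    """Freeze and thaw a running recorder process with job-control signals (POSIX only)."""

    def __init__(self, proc: subprocess.Popen) -> None:
        self.proc = proc
        self.paused = False

    def _running(self) -> bool:
        # poll() returns None while the child exists, including while it is stopped.
        return self.proc.poll() is None

    def pause(self) -> None:
        if not self.paused and self._running():
            # SIGSTOP cannot be caught or ignored: the recorder halts until SIGCONT.
            self.proc.send_signal(signal.SIGSTOP)
            self.paused = True

    def resume(self) -> None:
        if self.paused and self._running():
            self.proc.send_signal(signal.SIGCONT)
            self.paused = False

    def toggle(self) -> bool:
        """Switch state (handy for a single key binding); returns True if now paused."""
        if self.paused:
            self.resume()
        else:
            self.pause()
        return self.paused
```

A key binding would then call `toggle()` on the instance that owns the recorder process; nothing about the recognizer or the loaded model changes while paused.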

It seems like it might be simpler, though, to find ways to make begin/end fast, so that ending and beginning again is no more annoying than pausing would be, and so the code isn't complicated by changing configuration at run-time.

It may be possible to keep the VOSK language model(s) loaded so that begin/end don't need to load them again each time.
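One way to get there, sketched under the assumption that recognition runs in a single long-lived process rather than one that is restarted on every begin: cache each model the first time it is requested and hand out the same object afterwards. The `loader` argument stands in for whatever builds a model from a directory (with the vosk Python package that would be `vosk.Model`); `ModelCache` itself is hypothetical. Keying the cache by model directory also covers the language-switching idea above, since switching becomes a lookup instead of a reload.

```python
from typing import Callable, Dict, Generic, TypeVar

M = TypeVar("M")


class ModelCache(Generic[M]):
    """Load each model at most once per process and keep it in memory."""

    def __init__(self, loader: Callable[[str], M]) -> None:
        self._loader = loader          # e.g. vosk.Model (assumption, see above)
        self._models: Dict[str, M] = {}

    def get(self, path: str) -> M:
        # Only the first request for a given directory pays the load cost;
        # later begin/end cycles reuse the already-loaded object.
        if path not in self._models:
            self._models[path] = self._loader(path)
        return self._models[path]
```

The trade-off is memory: every cached model stays resident, so keeping several large models loaded at once may not be practical.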

nift-d commented 2 years ago

Yes, I leave it running. Maybe the pause command does not have to stop the recording, only the output of text? A more responsive begin/end command with a VOSK model that stays loaded would do the trick for me as well.

areotwister commented 2 years ago

The following might not fit your use case, but it may help people who stumble over this issue while looking for a solution to theirs:

Another possibility is to mute the microphone or desktop input with `pactl set-source-mute 0 toggle`.

Here 0 is the index of the first source, which in my case is my desktop sound (I mainly use nerd-dictation to transcribe parts of my lectures). You can find your microphone's index with `pactl list sources`, or run `pactl subscribe` and then mute and unmute the microphone in pavucontrol: the index shows up in the event output, e.g. `Event 'change' on source #0`.
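The same approach can be scripted in Python, e.g. to bind it to a key. This sketch only shells out to pactl: `set-source-mute SOURCE 1|0|toggle` and `list short sources` are standard pactl subcommands and `@DEFAULT_SOURCE@` is pactl's name for the default input, while the helper names are made up. The `run` parameter exists so the commands can be checked without PulseAudio. As the original report points out, muting can still leave fragments in the transcript (#26).

```python
import subprocess
from typing import Callable, List

# pactl's special name for the current default input; an index such as "0" also works.
DEFAULT_SOURCE = "@DEFAULT_SOURCE@"

Runner = Callable[..., subprocess.CompletedProcess]


def mute_command(source: str, state: str = "toggle") -> List[str]:
    """Build a pactl set-source-mute command; state is "1", "0" or "toggle"."""
    if state not in ("1", "0", "toggle"):
        raise ValueError(f"unsupported mute state: {state!r}")
    return ["pactl", "set-source-mute", source, state]


def toggle_mute(source: str = DEFAULT_SOURCE, run: Runner = subprocess.run) -> None:
    """Flip the mute state of one source (e.g. from a hotkey)."""
    run(mute_command(source), check=True)


def list_sources(run: Runner = subprocess.run) -> List[str]:
    """One line per source; the first tab-separated field is the index, the second the name."""
    result = run(["pactl", "list", "short", "sources"],
                 capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```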

I hope this might be useful for someone searching for a way to pause the dictation.

0xDBFB7 commented 2 years ago

There's some good discussion on vosk-api about these silence fragments, where the VOSK developers wrote:

We have released new model

https://alphacephei.com/vosk/models/vosk-model-en-us-0.42-gigaspeech.zip

it is about the same accuracy like 0.22, but no "the" issue anymore. Try it for your apps.

Using 0.42-gigaspeech has solved the problem for me.
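For anyone who wants to try that model, a small download-and-unpack helper might look like the following. It is a sketch: `fetch_model` is a hypothetical name, it assumes the zip holds a single top-level directory (as VOSK model archives generally do), and the returned directory would then be handed to nerd-dictation, e.g. via the `--vosk-model-dir` option of `nerd-dictation begin`.

```python
import pathlib
import shutil
import urllib.request
import zipfile

MODEL_URL = "https://alphacephei.com/vosk/models/vosk-model-en-us-0.42-gigaspeech.zip"


def fetch_model(url: str, dest_dir: pathlib.Path) -> pathlib.Path:
    """Download a VOSK model archive once, unpack it, and return the model directory."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():
        # Stream into a temporary file so an interrupted download is not
        # mistaken for a complete archive on the next run.
        partial = archive.with_name(archive.name + ".part")
        with urllib.request.urlopen(url) as resp, open(partial, "wb") as out:
            shutil.copyfileobj(resp, out)
        partial.replace(archive)
    with zipfile.ZipFile(archive) as zf:
        # Assumption: exactly one top-level directory inside the archive.
        tops = {name.split("/", 1)[0] for name in zf.namelist()}
        if len(tops) != 1:
            raise ValueError(f"expected one top-level entry in {archive}, found {sorted(tops)}")
        model_dir = dest_dir / tops.pop()
        if not model_dir.exists():
            zf.extractall(dest_dir)
    return model_dir
```

For example, `fetch_model(MODEL_URL, pathlib.Path.home() / "vosk-models")` (the destination is arbitrary); the gigaspeech archive is large, so the first call takes a while.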

I tried adding a pause button to `.config/nerd-dictation` based on the start/end keywords, but it doesn't work, even in `--continuous` mode, because the beginning of the same text is re-processed, and it's not clear to me how this could be fixed:

```python
from pynput import keyboard

is_active = False  # output starts suppressed; F9 toggles it


def on_press(key):
    global is_active
    if key == keyboard.Key.f9:
        is_active = not is_active
        print(f"Listening is {is_active}")


# pynput runs the listener in a background thread.
listener = keyboard.Listener(on_press=on_press)
listener.start()


def nerd_dictation_process(text):
    # Pass recognized text through only while active; otherwise suppress it.
    if is_active:
        return text
    return ""
```
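One possible way around the re-processing, sketched under an assumption the comment above suggests but does not confirm: the hook receives the cumulative text of the current phrase. On resume, the filter remembers the words already recognized (those were spoken while paused) and strips them from later results of the same phrase. `PauseFilter` is hypothetical. It detects a new phrase only heuristically, when the text stops starting with the hidden words, so a partial result that revises a hidden word, or a new phrase that happens to begin with the same words, will be handled wrongly. It also makes no attempt to preserve text typed earlier in the phrase before pausing. The F9 listener from the snippet above would call `_filter.toggle()` instead of flipping `is_active`.

```python
import threading
from typing import List


class PauseFilter:
    """Hide words spoken while paused from the text passed on for typing."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # toggle() is called from a key-listener thread
        self.active = True
        self._last_words: List[str] = []  # words of the most recent text seen
        self._skip_words: List[str] = []  # leading words of the phrase to hide

    def toggle(self) -> bool:
        with self._lock:
            self.active = not self.active
            if self.active:
                # Resuming: everything recognized so far was said during the pause.
                self._skip_words = list(self._last_words)
            return self.active

    def process(self, text: str) -> str:
        with self._lock:
            words = text.split()
            self._last_words = words
            if not self.active:
                return ""
            n = len(self._skip_words)
            if n and words[:n] == self._skip_words:
                # Same phrase as at resume time: emit only the words after it.
                return " ".join(words[n:])
            # Text no longer extends the hidden prefix: assume a new phrase began.
            self._skip_words = []
            return text


_filter = PauseFilter()


def nerd_dictation_process(text):
    return _filter.process(text)
```

Comparing whole words rather than characters avoids cutting a word in half when a hidden word is a prefix of a later one (e.g. "so" and "something").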

I also tried putting the VOSK model files on a ramdisk to speed up loading, so that begin and end would be fast enough for real-time use, but this didn't do the trick.

Great piece of software, thanks to the authors.

jtara1 commented 10 months ago

Looks like suspend and resume subcommands are implemented and working well now.
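Since suspend and resume are separate subcommands, a single toggle key has to remember which one it sent last. A sketch, assuming `nerd-dictation` is on PATH and the subcommands need no extra arguments; `toggle_suspend` and the state-file location are hypothetical. The state file can drift out of sync if dictation is ended or restarted while suspended; deleting it resets the toggle so that the next press sends "suspend".

```python
import pathlib
import subprocess
from typing import Callable

# Hypothetical location for remembering the last command sent.
STATE_FILE = pathlib.Path.home() / ".cache" / "nerd-dictation-suspended"


def toggle_suspend(state_file: pathlib.Path = STATE_FILE,
                   run: Callable[..., object] = subprocess.run) -> str:
    """Send `nerd-dictation suspend` or `resume`, alternating between calls."""
    command = "resume" if state_file.exists() else "suspend"
    # check=True raises on failure, so the state below only changes on success.
    run(["nerd-dictation", command], check=True)
    if command == "suspend":
        state_file.parent.mkdir(parents=True, exist_ok=True)
        state_file.touch()
    else:
        state_file.unlink()
    return command
```

A desktop shortcut would run a tiny script that calls `toggle_suspend()`.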