ideasman42 / nerd-dictation

Simple, hackable offline speech to text - using the VOSK-API.
GNU General Public License v3.0

Ignore unicode error within Vosk #91

Open KJ7LNW opened 1 year ago

KJ7LNW commented 1 year ago

During dictation, Vosk raised the error below. The cause is unclear and I cannot reproduce it, but a simple solution is to catch this type of decoding error and print a warning instead of aborting.

Traceback (most recent call last):
  File "./nerd-dictation", line 1962, in <module>
    main()
  File "./nerd-dictation", line 1958, in main
    args.func(args)
  File "./nerd-dictation", line 1845, in <lambda>
    vosk_grammar_file=args.vosk_grammar_file,
  File "./nerd-dictation", line 1440, in main_begin
    vosk_grammar_file=vosk_grammar_file,
  File "./nerd-dictation", line 1215, in text_from_vosk_pipe
    json_text = rec_handle_fn_wrapper_from_final_result()
  File "./nerd-dictation", line 1054, in rec_handle_fn_wrapper_from_final_result
    json_text = rec.FinalResult()
  File "/usr/src/nerd-dictation/lib64/python3.6/site-packages/vosk/__init__.py", line 194, in FinalResult
    return _ffi.string(_c.vosk_recognizer_final_result(self._handle)).decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte
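
A minimal sketch of the workaround described above, assuming the call that fetches the final result can be wrapped in a helper. The names `safe_final_result` and `fake_final_result` are hypothetical and are not taken from nerd-dictation or the Vosk API; the fake function only reproduces the failing decode from the traceback, and the empty `{"text": ""}` fallback is an assumption about what a harmless result looks like.

```python
import sys


def safe_final_result(get_result):
    """Call get_result(); on a decode failure, warn and return an empty result."""
    try:
        return get_result()
    except UnicodeDecodeError as ex:
        # Warn on stderr and carry on instead of aborting dictation.
        sys.stderr.write("Warning: ignoring unicode decode error: {:s}\n".format(str(ex)))
        # Assumed fallback: a JSON result with no recognized text.
        return '{"text": ""}'


def fake_final_result():
    # Hypothetical stand-in for rec.FinalResult(): 0xa0 is not a valid UTF-8 start byte.
    return b"\xa0".decode("utf-8")


print(safe_final_result(fake_final_result))
```

With this shape, a single bad result costs one utterance rather than the whole dictation session.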

KJ7LNW commented 1 year ago

Force-pushed a change to print the exception with format(e) instead of str(e), which I think is more correct.

ideasman42 commented 1 year ago

As a workaround this may be OK, but couldn't this be handled on VOSK's side? Users of the VOSK API should not have to work around unicode-decoding errors.

Errors could be ignored e.g.

>>> b'A\xaeB'.decode('utf-8', errors='ignore')
'AB'
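
For comparison, a small sketch of how the two lenient error handlers built into Python's `bytes.decode` treat the same input. This only illustrates the standard library behaviour; whether VOSK would adopt either handler is not something this thread establishes.

```python
raw = b"A\xaeB"  # 0xae is not a valid UTF-8 start byte

# "ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD (the replacement character), so the damage stays visible.
replaced = raw.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
```

`errors="ignore"` keeps the output clean, while `errors="replace"` leaves a marker showing that something was lost, which may be easier to debug.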

KJ7LNW commented 1 year ago

You make a good point that this should be addressed in their API; I'm not sure why I did not think of that first. I opened an issue in their repository:

For now, would you like to accept this pull request as a workaround?