mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Error Non-UTF-8 code starting with '\x83' in file deepspeech on line 2 when doing inferences after training a french model #2164

Status: Closed (testdeepv closed this issue 5 years ago)

testdeepv commented 5 years ago

I trained a French model on a small French dataset. When I tried to run inference with the exported model like this:

python3.6 deepspeech --model ~/results/model_export/output_graph.pb --alphabet ~/Deepspeech/data/alphabet.txt --lm ~/DeepSpeech/data/lm/lm.binary --trie ~/DeepSpeech/data/lm/trie --audio test.wav -t

I got this error:

SyntaxError: Non-UTF-8 code starting with '\x83' in file deepspeech on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Any suggestions for resolving this?

lissyx commented 5 years ago

> How can I check this?

Check what?

> I added the French characters to alphabet.txt

How?

testdeepv commented 5 years ago

> > How can I check this?
>
> Check what?
>
> > I added the French characters to alphabet.txt
>
> How?

nano alphabet.txt, and then I added the alphabet characters.

lissyx commented 5 years ago

> nano alphabet.txt, and then I added the alphabet characters.

And how did you make sure you covered everything that was in the dataset and in the language model source file?

testdeepv commented 5 years ago

> > nano alphabet.txt, and then I added the alphabet characters.
>
> And how did you make sure you covered everything that was in the dataset and in the language model source file?

python3.6 check_characters.py -csv ~/deepspeech_dataset/clips/train.csv

lissyx commented 5 years ago

> > > nano alphabet.txt, and then I added the alphabet characters.
> >
> > And how did you make sure you covered everything that was in the dataset and in the language model source file?
>
> python3.6 check_characters.py -csv ~/deepspeech_dataset/clips/train.csv

You could have generated the alphabet using that tool.
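As an illustration of deriving an alphabet from the training data instead of typing it by hand, here is a minimal Python sketch. It is not the code of check_characters.py. It assumes the CSV has a `transcript` column and that the alphabet file holds one character per line; the helper names and the sample rows are hypothetical.

```python
import csv
import io


def collect_characters(csv_file, column="transcript"):
    """Return the sorted list of distinct characters found in one CSV column."""
    chars = set()
    for row in csv.DictReader(csv_file):
        chars.update(row[column])
    return sorted(chars)


def write_alphabet(chars, out_file):
    """Write one character per line (assumed alphabet file layout)."""
    for ch in chars:
        out_file.write(ch + "\n")


# In-memory stand-in for train.csv; the rows are made up for this example.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1000,ça va\n"
    "b.wav,2000,été\n"
)
alphabet = collect_characters(sample)

generated = io.StringIO()
write_alphabet(alphabet, generated)
```

The same function could be run over the text used to build the language model, so that both sources are covered by a single alphabet.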

testdeepv commented 5 years ago

My alphabet.txt contains all the characters given by this command, so I don't think the empty inferences are caused by that.

lissyx commented 5 years ago

> My alphabet.txt contains all the characters given by this command, so I don't think the empty inferences are caused by that.

No, but if you use two differently ordered alphabets, for example, it might break things. We've had reports in the past of people modifying alphabet files and getting empty results.
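Two alphabet files can contain the same characters and still disagree on order, and the position of each character is what ties it to a model output. A minimal Python sketch of an order check follows. The file format handled here (one symbol per line, `#` lines as comments, `\#` for a literal hash) is an assumption, and the helper names are hypothetical.

```python
def load_alphabet(lines):
    """Parse alphabet lines into an ordered list of symbols (assumed format)."""
    symbols = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("\\#"):
            line = "#"  # escaped literal hash
        symbols.append(line)
    return symbols


def first_mismatch(a, b):
    """Return the first index where the two lists differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical contents of the alphabet used for training and for inference.
training = load_alphabet(["# comment\n", "a\n", "b\n", "é\n"])
inference = load_alphabet(["a\n", "é\n", "b\n"])

same_characters = sorted(training) == sorted(inference)
mismatch_index = first_mismatch(training, inference)
```

Here both files hold the same characters, yet they differ from index 1 onward, which a character-coverage check alone would not reveal.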

testdeepv commented 5 years ago

Maybe I have to use the deepspeech from the native client and not the one I get from pip install. The problem is that when using the native client "deepspeech" I got this error:

SyntaxError: Non-UTF-8 code starting with '\x83' in file deepspeech on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Is there a difference between using the native client "deepspeech" and the installed one?

lissyx commented 5 years ago

> The problem is that when using the native client "deepspeech" I got this error: SyntaxError: Non-UTF-8 code starting with '\x83' in file deepspeech on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details Is there a difference between using the native client "deepspeech" and the installed one?

Well, since I have not been able to understand what you did to get that error, or what you refer to as "native client deepspeech", I can't tell.

lissyx commented 5 years ago

@testdeepv Can you update us? Is there still a legitimate issue here, or was it just an improper setup?

reuben commented 5 years ago

deepspeech likely refers to the binary installed when you install the Python package (client.py), rather than DeepSpeech.py (which doesn't accept those parameters and would fail earlier).
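The SyntaxError quoted in this thread is what the Python interpreter reports when the file it is asked to run is not UTF-8 source text, which would happen if the `deepspeech` file passed to `python3.6` were a compiled executable. The thread does not confirm that this was the cause. Under that assumption, a minimal Python sketch for checking what kind of file a path points to could look like this (the function name and the temporary files are hypothetical):

```python
import codecs
import os
import tempfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of an ELF executable


def describe_file(path, sample_size=4096):
    """Classify a file by its first bytes: ELF binary, UTF-8 text, or neither."""
    with open(path, "rb") as f:
        head = f.read(sample_size)
    if head.startswith(ELF_MAGIC):
        return "elf-binary"
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off by the sample size
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "non-utf8"
    return "utf8-text"


with tempfile.TemporaryDirectory() as tmp:
    # Fake binary: ELF magic followed by arbitrary non-text bytes.
    binary_path = os.path.join(tmp, "deepspeech")
    with open(binary_path, "wb") as f:
        f.write(b"\x7fELF\x02\x01\x01\x00\n\x83\x00")

    # Ordinary Python script.
    script_path = os.path.join(tmp, "client.py")
    with open(script_path, "w", encoding="utf-8") as f:
        f.write("#!/usr/bin/env python3\nprint('hello')\n")

    binary_kind = describe_file(binary_path)
    script_kind = describe_file(script_path)
```

A file reported as `elf-binary` is meant to be executed directly rather than passed to the Python interpreter.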
