Error Non-UTF-8 code starting with '\x83' in file deepspeech on line 2 when doing inferences after training a french model

testdeepv commented 5 years ago

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow installed from (our builds, or upstream TensorFlow): mozilla tensorflow
TensorFlow version (use command below): tensorflow-gpu 1.13
Python version: 3.6
Bazel version (if compiling from source): 0.19.2
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: 10.0
GPU model and memory: NVIDIA K80
Exact command to reproduce:

I trained a French model on a small French dataset. When I tried to run inference with the exported model like this:

python3.6 deepspeech --model ~/results/model_export/output_graph.pb --alphabet ~/Deepspeech/data/alphabet.txt --lm ~/DeepSpeech/data/lm/lm.binary --trie ~/DeepSpeech/data/lm/trie --audio test.wav -t

I got this error:

SyntaxError: Non-UTF-8 code starting with '\x83' in file deepspeech on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Any suggestions for resolving this?

lissyx commented 5 years ago

How can I check this?

Check what?

I added the French characters to alphabet.txt

How?

testdeepv commented 5 years ago

I ran nano alphabet.txt and then added the characters.

lissyx commented 5 years ago

And how did you make sure you covered everything that was in the dataset and in the language model source file?

testdeepv commented 5 years ago

python3.6 check_characters.py -csv ~/deepspeech_dataset/clips/train.csv

lissyx commented 5 years ago

You could have generated the alphabet using that tool.
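For illustration, a coverage check of this kind could be sketched as follows. This is not the check_characters.py script mentioned above, only a minimal stand-in. It assumes the training CSV has a `transcript` column; the sample rows, the sample alphabet, and the helper names `characters_in_transcripts` and `missing_from_alphabet` are hypothetical.

```python
import csv
import io


def characters_in_transcripts(csv_file):
    """Return the sorted set of characters used in the 'transcript' column."""
    reader = csv.DictReader(csv_file)
    chars = set()
    for row in reader:
        chars.update(row["transcript"])  # assumed column name
    return sorted(chars)


def missing_from_alphabet(transcript_chars, alphabet_chars):
    """Return transcript characters that the alphabet does not contain."""
    return [c for c in transcript_chars if c not in alphabet_chars]


# Hypothetical in-memory data standing in for train.csv.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1000,été à paris\n"
    "b.wav,2000,ça va\n"
)
found = characters_in_transcripts(sample)

# Hypothetical alphabet that forgets the character 'à'.
alphabet = list(" acçeéiprstv")

print(missing_from_alphabet(found, alphabet))  # prints ['à']
```

The same comparison would also need to be run against the language model source text, since characters that only appear there are not caught by looking at the CSV alone.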

testdeepv commented 5 years ago

My alphabet.txt contains all the characters given by this command, so I don't think the empty inferences are caused by that.

lissyx commented 5 years ago

No, but if you use two differently ordered alphabets, for example, it might mess things up. We've had reports in the past of people messing around with alphabet files and getting empty results.
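Ordering matters because the model emits label indices and the alphabet file maps each index back to a character, so two files with the same characters in a different order decode differently. A minimal sketch for comparing the alphabet used for training with the one used for inference is shown below. It assumes one character per line, that lines starting with `#` are comments, and that `\#` stands for a literal hash; `parse_alphabet` and `first_mismatch` are hypothetical helper names, and the two alphabets are made-up examples.

```python
def parse_alphabet(text):
    """Parse alphabet text into an ordered list of labels."""
    labels = []
    for line in text.split("\n"):
        if line.startswith("\\#"):
            labels.append("#")  # escaped hash (assumed convention)
        elif line.startswith("#") or line == "":
            continue  # comment or empty line (assumed convention)
        else:
            labels.append(line)
    return labels


def first_mismatch(a, b):
    """Return the first index where the two label lists differ, else None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical alphabets: same characters, different order.
training = parse_alphabet("# training alphabet\n \na\nb\né\n")
inference = parse_alphabet("# inference alphabet\n \na\né\nb\n")

print(first_mismatch(training, inference))  # prints 2
```

A result of `None` means both files produce the same index-to-character mapping under these assumptions.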

testdeepv commented 5 years ago

Maybe I have to use the deepspeech from the native client and not the one that I get by doing pip install. The problem is that when using the native client "deepspeech" I got this error:

SyntaxError: Non-UTF-8 code starting with '\x83' in file deepspeech on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Is there a difference between using the native client "deepspeech" and the installed one?

lissyx commented 5 years ago

Well, since I have not been able to understand what you did to get that error, or what you refer to as "native client deepspeech", I can't tell.
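For context, Python raises this SyntaxError when the file it is asked to interpret contains bytes that are not valid UTF-8 and declares no encoding. One possible explanation, not confirmed in this thread, is that the `deepspeech` file passed to python3.6 was a compiled executable rather than Python source. A minimal sketch for checking what kind of file a path points to is shown below; `describe_file` is a hypothetical helper, and the two temporary files only stand in for a binary and a script.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of an ELF executable


def describe_file(path):
    """Classify a file as an ELF binary, non-UTF-8 data, or UTF-8 text."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(ELF_MAGIC):
        return "elf-binary"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "non-utf8"
    return "utf8-text"


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins: a fake binary and a small Python script.
    binary = os.path.join(tmp, "deepspeech")
    script = os.path.join(tmp, "client.py")
    with open(binary, "wb") as f:
        f.write(ELF_MAGIC + b"\x02\x01\x01\x00\n\x83\x00")
    with open(script, "w", encoding="utf-8") as f:
        f.write("print('bonjour')\n")
    results = (describe_file(binary), describe_file(script))

print(results)  # prints ('elf-binary', 'utf8-text')
```

If the file turns out to be a binary, it would be run directly (for example `./deepspeech ...`) rather than through the Python interpreter.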

lissyx commented 5 years ago

@testdeepv Can you update us? Is there still a legit issue here, or was it just an improper setup?

reuben commented 5 years ago

deepspeech likely refers to the binary installed when you install the Python package (client.py), rather than DeepSpeech.py (which doesn't accept those parameters and would fail earlier).

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

mozilla / DeepSpeech

Error Non-UTF-8 code starting with '\x83' in file deepspeech on line 2 when doing inferences after training a french model #2164