mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Error "Non-UTF-8 code starting with '\x83' in file deepspeech on line 2" when running inference after training a French model #2164

Closed testdeepv closed 5 years ago

testdeepv commented 5 years ago

I trained a French model on a small French dataset. When I tried to run inference with the exported model like this:

python3.6 deepspeech --model ~/results/model_export/output_graph.pb --alphabet ~/Deepspeech/data/alphabet.txt --lm ~/DeepSpeech/data/lm/lm.binary --trie ~/DeepSpeech/data/lm/trie --audio test.wav -t

I got this error:

SyntaxError: Non-UTF-8 code starting with '\x83' in file deepspeech on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Any suggestions on how to resolve this?

lissyx commented 5 years ago

> SyntaxError: Non-UTF-8 code starting with '\x83' in file deepspeech on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
> Any suggestions to resolve this please ?

This error is produced by Python itself, and it is not something I can reproduce on my French system. Can you make sure your pip install is up to date?

testdeepv commented 5 years ago

When I run python3.6 -m pip --version, I get: pip 18.1 from /usr/local/lib/python3.6/site-packages/pip (python 3.6). Do I have to upgrade it?

lissyx commented 5 years ago

It's strange that your pip is in /usr/local. Can you make sure your setup is straight?

testdeepv commented 5 years ago

What do you mean by "setup is straight"?

lissyx commented 5 years ago

> What do you mean by "setup is straight" ?

Well, /usr/local looks like a non-default distro setup. The deepspeech file is generated at install time.

testdeepv commented 5 years ago

I'm running the deepspeech file from the DeepSpeech native client.

lissyx commented 5 years ago

> I'm running the deepspeech file from the Deepspeech native client

Well, you mention python3.6 above, so I suspect you did a pip install?

lissyx commented 5 years ago

alex@portable-alex:~/tmp/deepspeech/issue2164$ source venv/bin/activate
(venv) alex@portable-alex:~/tmp/deepspeech/issue2164$ pip install deepspeech==0.5.0a11
Collecting deepspeech==0.5.0a11
  Downloading https://files.pythonhosted.org/packages/b2/fd/bdcb51eae62e6df60a252e8395d49ef145fa101139b530b8e81448ca336e/deepspeech-0.5.0a11-cp37-cp37m-manylinux1_x86_64.whl (15.6MB)
     |████████████████████████████████| 15.6MB 4.9MB/s 
Collecting numpy>=1.14.5 (from deepspeech==0.5.0a11)
  Downloading https://files.pythonhosted.org/packages/fc/d1/45be1144b03b6b1e24f9a924f23f66b4ad030d834ad31fb9e5581bd328af/numpy-1.16.4-cp37-cp37m-manylinux1_x86_64.whl (17.3MB)
     |████████████████████████████████| 17.3MB 65.5MB/s 
Installing collected packages: numpy, deepspeech
Successfully installed deepspeech-0.5.0a11 numpy-1.16.4
(venv) alex@portable-alex:~/tmp/deepspeech/issue2164$ deepspeech 
usage: deepspeech [-h] --model MODEL --alphabet ALPHABET [--lm [LM]]
                  [--trie [TRIE]] --audio AUDIO [--version] [--extended]
deepspeech: error: the following arguments are required: --model, --alphabet, --audio
(venv) alex@portable-alex:~/tmp/deepspeech/issue2164$ which deepspeech
/home/alex/tmp/deepspeech/issue2164/venv/bin/deepspeech

lissyx commented 5 years ago

@testdeepv The file /home/alex/tmp/deepspeech/issue2164/venv/bin/deepspeech is generated at pip install time. According to your error, it's the one with bogus UTF-8, but we don't control its content.

testdeepv commented 5 years ago

I installed python3.6 because python3.5 is the default python3 in the VM I'm using. I git cloned DeepSpeech and mozilla/tensorflow, built both of them, generated the binaries and trained a French model. I didn't pip install deepspeech; I use the deepspeech in the native client directory after the build.
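One plausible reading of the original error, not confirmed in the thread: if the deepspeech file in the native client directory is the compiled C++ client (an ELF executable), then python3.6 deepspeech asks Python to parse machine code as source, which fails with exactly this kind of Non-UTF-8 SyntaxError. A minimal sketch for checking what a file actually is, using only the standard library; describe_file is a hypothetical helper name and the classification is a rough heuristic:

```python
import codecs
import sys

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF executable or shared library


def describe_file(path, probe_bytes=4096):
    """Rough classification of a file from its first bytes (a heuristic, not a parser)."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
    if head.startswith(ELF_MAGIC):
        return "ELF binary: run it directly (e.g. ./deepspeech), not through python3.6"
    if head.startswith(b"#!"):
        # Report the interpreter named on the shebang line
        shebang = head.split(b"\n", 1)[0].decode("utf-8", errors="replace")
        return "script with shebang " + shebang
    try:
        # final=False tolerates a multi-byte character cut off at the probe boundary
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "non-UTF-8 data: Python 3 rejects it as source unless an encoding is declared (PEP 263)"
    return "UTF-8 text without a shebang"


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, "->", describe_file(arg))
```

For example, running it with native_client/deepspeech and /usr/local/bin/deepspeech as arguments would show whether each one is a binary or a Python wrapper script.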

lissyx commented 5 years ago

> I didn't pip install deepspeech, I have it in the deepspeech native client after the build

Well then, please document exactly what you did.

> I git cloned deepspeech and mozilla tensorflow, build both of them, generated binaries and trained a french model.

Why did you do this? We have prebuilt binaries; you don't have to do that.

> I installed python3.6 because in the VM I'm using python3.5 is the default python3.

Python 3.5 should work as well.

lissyx commented 5 years ago

> trained a french model.

Also, could you please join efforts?
https://github.com/Common-Voice/commonvoice-fr/pull/44
https://github.com/Common-Voice/commonvoice-fr
https://discourse.mozilla.org/c/voice/fr

testdeepv commented 5 years ago

> I git cloned deepspeech and mozilla tensorflow, build both of them, generated binaries and trained a french model.
>
> Why did you do this ? We have prebuilt binaries, you don't have to do that.

I changed alphabet.txt to add French characters and then created the lm.binary and trie files.

lissyx commented 5 years ago

> I git cloned deepspeech and mozilla tensorflow, build both of them, generated binaries and trained a french model.
>
> Why did you do this ? We have prebuilt binaries, you don't have to do that. I changed alphabet.txt to add french caracters and then created lm.binary and trie files

Still, you don't need to rebuild just to change the alphabet.
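Changing the alphabet only means changing a text file. A minimal sketch of deriving the label set from the training transcripts, assuming the DeepSpeech CSV layout with a transcript column (as in wav_filename,wav_filesize,transcript) and an alphabet.txt with one character per line; the helper names are hypothetical:

```python
import csv
import io


def read_transcripts(csv_file):
    """Yield the transcript column of a DeepSpeech-style CSV (assumed header)."""
    for row in csv.DictReader(csv_file):
        yield row["transcript"]


def alphabet_from_transcripts(transcripts):
    """Sorted set of every character that appears in the transcripts."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    return sorted(chars)


def format_alphabet(chars):
    """One label per line; the space label is a line containing a single space."""
    return "".join(ch + "\n" for ch in chars)


if __name__ == "__main__":
    sample = io.StringIO(
        "wav_filename,wav_filesize,transcript\n"
        "a.wav,1000,où habitestu\n"
        "b.wav,2000,avis défavorable\n"
    )
    print(format_alphabet(alphabet_from_transcripts(read_transcripts(sample))), end="")
```

Whichever way the file is produced, the same alphabet.txt then has to be used for training, for generating the trie, and at inference.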

testdeepv commented 5 years ago

So I have to pip install deepspeech and not use the deepspeech I have in the native client directory?

lissyx commented 5 years ago

> then I have to do pip install deepspeech and do not use the deepspeech I have in native client file ?

Yes, there is no good reason in your case to rebuild everything. Also, sorry to insist, but it's really important that you join efforts to help produce a French model...

testdeepv commented 5 years ago

> trained a french model.
>
> Also, could you please join efforts ? Common-Voice/commonvoice-fr#44 https://github.com/Common-Voice/commonvoice-fr https://discourse.mozilla.org/c/voice/fr

I sure will :)

testdeepv commented 5 years ago

> then I have to do pip install deepspeech and do not use the deepspeech I have in native client file ?
>
> There is no good reason in your case to have to rebuild everything, yes. Also, sorry to insist, but it's really important that you join efforts to help produce a french model ...

I did this: sudo python3.6 -m pip install deepspeech==0.5.0a11, and when running this command:

python3.6 deepspeech --model ~/results/model_export/output_graph.pb --alphabet ~/Deepspeech/data/alphabet.txt --lm ~/DeepSpeech/data/lm/lm.binary --trie ~/DeepSpeech/data/lm/trie --audio test.wav -t

I still get an error: python3.6: can't open file 'deepspeech': [Errno 2] No such file or directory

lissyx commented 5 years ago

> sudo python3.6 -m pip install deepspeech==0.5.0a11

You should really follow the docs and use a virtualenv; installing as root is not good practice.

> python3.6: can't open file 'deepspeech': [Errno 2] No such file or directory

That's another issue now. What do 'which deepspeech' and 'ls -hal $(which deepspeech)' give?
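Note that python3.6 deepspeech treats deepspeech as a path relative to the current directory; Python does not search PATH for the script argument, which is why it reports "can't open file 'deepspeech'" once no such file exists in the working directory. Running the installed command as just deepspeech (or passing its full path) avoids that. A small sketch of the lookup asked for above, using shutil.which and os.stat; describe_command is a hypothetical helper name, and the size is printed in plain bytes rather than ls's human-readable form:

```python
import os
import shutil
import stat
import time


def describe_command(name):
    """Rough Python equivalent of: which NAME && ls -hal $(which NAME)."""
    path = shutil.which(name)  # searches PATH, like `which`
    if path is None:
        return None
    st = os.stat(path)  # follows symlinks to the real file
    mode = stat.filemode(st.st_mode)  # e.g. '-rwxr-xr-x'
    mtime = time.strftime("%b %d %H:%M", time.localtime(st.st_mtime))
    return f"{mode} {st.st_size} {mtime} {path}"


if __name__ == "__main__":
    print(describe_command("deepspeech") or "deepspeech: not found on PATH")
```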

testdeepv commented 5 years ago

> sudo python3.6 -m pip install deepspeech==0.5.0a11
>
> You should really follow the docs and use virtualenv, installing as root is not a good practice.

OK.

> python3.6: can't open file 'deepspeech': [Errno 2] No such file or directory
>
> That's another issue now. What does which deepspeech and ls -hal $(which deepspeech) gives?

'which deepspeech' outputs: /usr/local/bin/deepspeech
'ls -hal $(which deepspeech)' gives: -rwxr-xr-x 1 root root 228 Jun 11 10:09 /usr/local/bin/deepspeech

lissyx commented 5 years ago

> 'ls -hal $(which deepspeech)' gives : -rwxr-xr-x 1 root root 228 Jun 11 10:09 /usr/local/bin/deepspeech

Can you paste its content?

testdeepv commented 5 years ago

#!/usr/local/bin/python3.6

# -*- coding: utf-8 -*-
import re
import sys

from deepspeech.client import main

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(main())
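As pasted, this file is an ordinary console-script wrapper: plain ASCII with a UTF-8 coding line that imports main from deepspeech.client, so it does not look like the file that produced the original Non-UTF-8 error. Its only non-obvious line is the re.sub on sys.argv[0], which strips Windows launcher suffixes so usage messages show a clean program name. A small sketch reproducing that substitution with the same pattern; clean_prog_name is a hypothetical name:

```python
import re

# Same pattern as the wrapper above: strips a trailing "-script.py",
# "-script.pyw" or ".exe" from the program name; anything else is unchanged.
SUFFIX = re.compile(r'(-script\.pyw?|\.exe)?$')


def clean_prog_name(argv0):
    return SUFFIX.sub('', argv0)


if __name__ == "__main__":
    for name in ["/usr/local/bin/deepspeech", "deepspeech-script.py", "deepspeech.exe"]:
        print(name, "->", clean_prog_name(name))
```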

lissyx commented 5 years ago

@testdeepv Strange. Can you please properly uninstall and then reinstall using your distro's Python/pip and a virtualenv, as we document?
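A virtualenv keeps pip, the deepspeech package and its generated deepspeech command inside one directory you own, instead of the root-owned /usr/local seen above. A minimal sketch with the standard-library venv module; create_env and in_virtualenv are hypothetical helper names, and the usual shell route is python3 -m venv followed by installing with that environment's pip:

```python
import sys
import venv
from pathlib import Path


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    # The stdlib venv module (and virtualenv >= 20) makes sys.prefix differ from
    # sys.base_prefix; legacy virtualenv (< 20) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def create_env(env_dir, with_pip=True):
    """Create a virtual environment, roughly like `python3 -m venv env_dir`."""
    venv.create(env_dir, with_pip=with_pip)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    return Path(env_dir) / bin_dir  # pip and console scripts such as deepspeech land here


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
```

After creating the environment, installing with its own pip (for example the deepspeech==0.5.0a11 used in this thread) puts the deepspeech command under that environment's bin directory.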

testdeepv commented 5 years ago

Maybe I have to install the GPU version of deepspeech?

lissyx commented 5 years ago

> may be I have to install the gpu version of deepspeech ?

No, it's unrelated.

testdeepv commented 5 years ago

When I run it with the full path to deepspeech:

python3.6 /usr/local/bin/deepspeech --model ~/results/model_export/output_graph.pb --alphabet ~/Deepspeech/data/alphabet.txt --lm ~/DeepSpeech/data/lm/lm.binary --trie ~/DeepSpeech/data/lm/trie --audio test.wav -t

I get this:

Loading model from file ~/results/model_export/output_graph.pb
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.0-alpha.11-0-g1201739
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-06-11 10:31:48.501980: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-11 10:31:48.511050: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-06-11 10:31:48.511184: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:148] kernel driver does not appear to be running on this host (instance-2): /proc/driver/nvidia/version does not exist
2019-06-11 10:31:48.570438: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-06-11 10:31:48.570541: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-06-11 10:31:48.570554: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-06-11 10:31:48.570798: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Loaded model in 0.0733s.
Loading language model from files ~/DeepSpeech/data/lm/lm.binary ~/DeepSpeech/data/lm/trie
Loaded language model in 0.0135s.
Running inference.
Inference took 2.695s for 7.160s audio file.

I can't understand all these TensorFlow warnings :(

lissyx commented 5 years ago

@testdeepv The warnings are harmless. It seems to work.
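The last lines of that log show the model and language model loading and inference finishing faster than real time: the real-time factor is inference time divided by audio duration, 2.695 / 7.160 ≈ 0.38. A small sketch computing it, plus reading a WAV file's duration with the standard wave module, assuming uncompressed PCM WAV input such as test.wav; the function names are hypothetical:

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file = frame count / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(inference_seconds, audio_seconds):
    """Values below 1.0 mean inference ran faster than real time."""
    return inference_seconds / audio_seconds


if __name__ == "__main__":
    # Figures from the log: 2.695 s of inference for 7.160 s of audio
    print(round(real_time_factor(2.695, 7.160), 3))
```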

testdeepv commented 5 years ago

But the inference result is empty :(

testdeepv commented 5 years ago

Do I have to convert my output_graph.pb like this? $ convert_graphdef_memmapped_format --in_graph=output_graph.pb --out_graph=output_graph.pbmm

lissyx commented 5 years ago

> I have to convert my output_graph.pb like this or not ? $ convert_graphdef_memmapped_format --in_graph=output_graph.pb --out_graph=output_graph.pbmm

That just means your training was not sufficient. That's why I insist on contributing to the French model: you are not the first one to train and get empty inferences because of insufficient data, insufficient training or improper parameters.

lissyx commented 5 years ago

@testdeepv If you need a model that works right now, there's no better solution than training on top of the English 0.5.0 model (not yet released) with other datasets, as documented in the WIP PR https://github.com/Common-Voice/commonvoice-fr/pull/44 as well as https://discourse.mozilla.org/t/un-premier-modele-francais/41100/7

testdeepv commented 5 years ago

> I have to convert my output_graph.pb like this or not ? $ convert_graphdef_memmapped_format --in_graph=output_graph.pb --out_graph=output_graph.pbmm
>
> That just means your training was not enough. Hence why I insist on contributing to french model, because you are not the first one training and getting empty inferences, because of not enough data / training / improper parameters.

For the wav files in the test set I get inferences (screenshot).

But when I test the exported model, I don't get anything.

lissyx commented 5 years ago

> I have to convert my output_graph.pb like this or not ? $ convert_graphdef_memmapped_format --in_graph=output_graph.pb --out_graph=output_graph.pbmm
>
> That just means your training was not enough. Hence why I insist on contributing to french model, because you are not the first one training and getting empty inferences, because of not enough data / training / improper parameters.
>
> For the wav files in the test file I get inferences (screenshot)
>
> but when I want to test the exported model I don't get anything

Please avoid posting screenshots; they are very hard to work with. Again, that's expected behavior. Without the full training log it's hard to be definitive, but it's really not surprising...

lissyx commented 5 years ago

@testdeepv Should we close ?

testdeepv commented 5 years ago

Just a question about the amount of data needed to get good inferences: isn't 50 hours enough? Is getting more data the best way to avoid empty inferences?

lissyx commented 5 years ago

Yes, 50 hours is way way way not enough.
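To see where a dataset stands, its size in hours can be estimated from the training CSVs. A rough sketch, assuming the DeepSpeech CSV layout with a wav_filesize column and 16 kHz, 16-bit mono PCM WAV files with a 44-byte header (files with extra header chunks make it slightly overestimate); estimate_hours is a hypothetical helper:

```python
import csv
import sys

# Assumed audio format: 16 kHz, 16-bit (2-byte) samples, mono PCM,
# with the canonical 44-byte WAV header. Other formats need other constants.
BYTES_PER_SECOND = 16000 * 2 * 1
WAV_HEADER_BYTES = 44


def estimate_hours(csv_file):
    """Estimate total audio hours from the wav_filesize column of a DeepSpeech-style CSV."""
    total_seconds = 0.0
    for row in csv.DictReader(csv_file):
        payload = max(int(row["wav_filesize"]) - WAV_HEADER_BYTES, 0)
        total_seconds += payload / BYTES_PER_SECOND
    return total_seconds / 3600


if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. train.csv dev.csv test.csv
        with open(path, newline="", encoding="utf-8") as f:
            print(f"{path}: ~{estimate_hours(f):.1f} h")
```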

lissyx commented 5 years ago

> getting more data is the best solution to avoid getting empty inferences ?

That, and ensuring proper training. Since you have not shared your parameters, I can't tell whether they also play a part in your case.

testdeepv commented 5 years ago

The command I use to train with DeepSpeech.py:

python3.6 -u DeepSpeech.py --train_files ~/deepspeech_dataset/clips/train.csv --dev_files ~/deepspeech_dataset/clips/dev.csv --test_files ~/deepspeech_dataset/clips/test.csv --train_batch_size 80 --dev_batch_size 80 --test_batch_size 40 --n_hidden 1024 --epoch 50 --use_seq_length False --report_count 100 --remove_export True --checkpoint_dir ~/results/checkpoints/ --export_dir ~/results/model_export/ --alphabet_config_path ~/DeepSpeech/data/alphabet.txt --lm_binary_path ~/DeepSpeech/data/lm/lm.binary --lm_trie_path ~/DeepSpeech/data/lm/trie

The training stops after 10 epochs with this message:

I Early stop triggered as (for last 4 steps) validation loss: 68.492657 with standard deviation: 0.667485 and mean: 67.559747
I FINISHED optimization in

After that I get the inferences for test.csv (and they are not empty).
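The message describes a plateau-style early-stopping check: look at the last few validation losses and stop when the newest one is worse than the earlier ones, or when the losses barely move. A generic sketch of such a rule; the window of 4 echoes "for last 4 steps" in the message, but the thresholds and exact conditions are hypothetical illustrations, not necessarily DeepSpeech's own rule or defaults (check the early-stopping flags of your DeepSpeech version):

```python
from statistics import mean, pstdev


def should_stop(dev_losses, window=4, mean_th=0.5, std_th=0.5):
    """Plateau check over the last `window` validation losses (hypothetical rule)."""
    if len(dev_losses) < window:
        return False
    recent = dev_losses[-window:]
    previous, latest = recent[:-1], recent[-1]
    # Stop if the latest loss is worse than every earlier loss in the window...
    got_worse = latest > max(previous)
    # ...or if it sits close to their mean while they barely vary (a plateau).
    plateau = abs(latest - mean(previous)) < mean_th and pstdev(previous) < std_th
    return got_worse or plateau
```

With a check like this, --epoch 50 is only an upper bound: training ends at the first epoch where the condition holds.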

reuben commented 5 years ago

Are you saying when you run the client on the same audio files that are in your test CSV file, it gives different results than the training code?

testdeepv commented 5 years ago

> Are you saying when you run the client on the same audio files that are in your test CSV file, it gives different results than the training code?

My wav file was tested during training and gave a result, but when I test the same file with my exported model, I get an empty inference...

lissyx commented 5 years ago

> my wave file was tested during the learning process and it gives a result. but when I tried to test the same file with my exported model it gives me empty inferences...

That should not happen. 50 hours and 10 epochs is obviously not enough, but if you test a file from the test set, you should get the same result.

testdeepv commented 5 years ago

> my wave file was tested during the learning process and it gives a result. but when I tried to test the same file with my exported model it gives me empty inferences...
>
> That should not happen. 50 hours and 10 epochs is obviously not enough, but if you test a file from the test set, you should get the same result.

I set epochs = 50, but the training process stopped after 10 epochs, and I'm not getting the same result for the same test file.

lissyx commented 5 years ago

Could you share the full training log?

testdeepv commented 5 years ago

    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.

WARNING:tensorflow:From /usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:358: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/lstm_ops.py:696: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I Restored variables from most recent checkpoint at ~/results/checkpoints/train-158, step 158
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 1:02:16 | Steps: 159 | Loss: 132.824389   WARNING:tensorflow:From /usr/local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:966: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:

Then I have steps for each epoch, and at the end of each epoch I get this:

I Saved new best validating model with loss x to: ~/results/checkpoints/best_dev-325

After 10 epochs I get this message:

I Early stop triggered as (for last 4 steps) validation loss: 68.492657 with standard deviation: 0.667485 and mean: 67.559747
I FINISHED optimization in 14:35:47.888453
I Restored variables from best validation checkpoint at ~/results/checkpoints/best_dev-1494, step 1494
Testing model on ~/deepspeech_dataset/clips/test.csv
Test epoch | Steps: 158 | Elapsed Time: 0:24:45                                
Test on ~/deepspeech_dataset/clips/test.csv - WER: 0.709969, CER: 0.413470, loss: 68.295815
WER: 1.500000, CER: 0.333333, loss: 28.491713
 - src: "en substitution"
 - res: "on se situation"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 0.647059, loss: 29.111736
 - src: "vingtdeux maisons"
 - res: "va de mal"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 0.833333, loss: 29.264719
 - src: "depuis quand"
 - res: "deux plus fort"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 0.583333, loss: 30.396891
 - src: "où habitestu"
 - res: "ou vite que"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 0.375000, loss: 30.682699
 - src: "avis défavorable"
 - res: "avenue de favorable"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 0.470588, loss: 30.730188
 - src: "rendeznous jospin"
 - res: "route nous juste"
--------------------------------------------------------------------------------
WER: 1.500000, CER: 0.588235, loss: 31.368996
 - src: "habillezvous vite"
 - res: "aviez ou les"
--------------------------------------------------------------------------------
I Exporting the model...
I Removing old export
WARNING:tensorflow:From /usr/local/lib/python3.6/site-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /usr/local/lib/python3.6/site-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
I Models exported at ~/results/model_export/
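WER values above 1.0 in the test report are expected rather than a bug: WER is the word-level edit distance divided by the number of reference words, so the two-word reference "en substitution" transcribed as "on se situation" costs 2 substitutions + 1 insertion = 3 edits, and 3 / 2 = 1.5. A minimal sketch of the standard definitions (not DeepSpeech's own evaluation code), which reproduces the WER 1.5 and CER 0.333333 reported for that first sample:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```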

lissyx commented 5 years ago

@testdeepv Can you verify (with a sha1 fingerprint) that the alphabets are the same?

testdeepv commented 5 years ago

I don't get what you mean by "(sha1 fingerprint)" :/

lissyx commented 5 years ago

sha1sum alphabet.txt
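sha1sum prints a fingerprint of a file's bytes: if the alphabet.txt used for training and the one passed to the client at inference differ by even one byte, the fingerprints differ. The same check in Python, convenient for comparing several files (alphabet.txt, lm.binary, trie) at once; sha1_of_file and same_files are hypothetical helper names:

```python
import hashlib
import sys


def sha1_of_file(path, chunk_size=1 << 20):
    """Same hex digest that `sha1sum` prints, computed in chunks to bound memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_files(*paths):
    """True when every given file has the same SHA-1 fingerprint."""
    return len({sha1_of_file(p) for p in paths}) == 1


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(sha1_of_file(p), p)
```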

testdeepv commented 5 years ago

> sha1sum alphabet.txt

This command gives: 17d3fd2c19e31be7fdda16f1355053f1b8ca4612 alphabet.txt

lissyx commented 5 years ago

> sha1sum alphabet.txt
>
> this command gives : 17d3fd2c19e31be7fdda16f1355053f1b8ca4612 alphabet.txt

Can you quadruple-check that you are absolutely using the same, correct alphabet, lm.binary and trie files? 99.99% of the "empty inferences" reports, outside of improper training, were related to that.

testdeepv commented 5 years ago

I added the French characters to alphabet.txt. I generated lm.binary like this:

kenlm/build/bin/./lmplz --text ~/DeepSpeech/data/vocabulary.txt --arpa ~/DeepSpeech/data/words.arpa --o 5
kenlm/build/bin/./build_binary -T -s ~/DeepSpeech/data/words.arpa ~/DeepSpeech/data/lm/lm.binary

and generated the trie like this:

~/tensorflow/bazel-bin/native_client/generate_trie ~/DeepSpeech/data/alphabet.txt ~/DeepSpeech/data/lm.binary ~/DeepSpeech/data/trie

How can I check this?
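Two details in these commands may be worth checking. generate_trie reads ~/DeepSpeech/data/lm.binary and writes ~/DeepSpeech/data/trie, while build_binary and the inference command use ~/DeepSpeech/data/lm/lm.binary and ~/DeepSpeech/data/lm/trie, so the trie being loaded may not be the one built from this language model. The inference command also passes --alphabet ~/Deepspeech/data/alphabet.txt (lowercase "s"), which on a case-sensitive filesystem is a different directory from ~/DeepSpeech. Fingerprinting each location with sha1sum would settle both. Separately, every character in vocabulary.txt and in the transcripts should have a label in alphabet.txt. A minimal sketch of that coverage check, assuming one label per line and treating lines starting with # as comments (an assumption about the file format; compare with the alphabet.txt shipped with your DeepSpeech version); read_alphabet and missing_chars are hypothetical helpers:

```python
def read_alphabet(text):
    """Parse alphabet.txt content: one label per line, '#' lines skipped (assumed format)."""
    labels = set()
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        if line:  # a line holding a single space is the space label
            labels.add(line)
    return labels


def missing_chars(alphabet_text, lines):
    """Characters used in `lines` that have no label in the alphabet."""
    labels = read_alphabet(alphabet_text)
    used = set("".join(line.rstrip("\n") for line in lines))
    return sorted(used - labels)
```

For example, passing the contents of alphabet.txt and the lines of vocabulary.txt (both opened with encoding="utf-8") should give an empty list if the alphabet covers the language model text.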