mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
Mozilla Public License 2.0

Quantization of the model #133

Closed lissyx closed 5 years ago

lissyx commented 7 years ago

Should be split up

lissyx commented 7 years ago

Currently, we have a process in place to:

So far, with 10k samples and 100 epochs on TED, we get:

$ grep -E "^Test WER|^Test Quantized WER" ted_10000s_100e_quant.log 
Test WER: 0.396643
Test Quantized WER: 0.820484
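For reference, WER (word error rate) is the word-level edit distance between the hypothesis and the reference transcript, divided by the reference length; it can exceed 1.0 when the decoder emits more wrong words than the reference contains. A minimal Python sketch of the metric (a hypothetical helper for illustration, not DeepSpeech's own implementation):

```python
def wer(ref, hyp):
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i ref words and first j hyp words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(r)][len(h)] / len(r)
```

By this definition a quantized WER of 0.82 means roughly four out of five reference words need an edit, i.e. the quantized model is close to unusable here.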
lissyx commented 7 years ago

I did a second run on top of current master (ted_10000s_100e_quant_new.log):

$ grep -E "^Test.*WER|^Test Quantized.*WER" ted_10000s_100e_quant.log ted_10000s_100e_quant_new.log
ted_10000s_100e_quant.log:Test WER: 0.396643
ted_10000s_100e_quant.log:Test Quantized WER: 0.820484
ted_10000s_100e_quant_new.log:Test loss=245.908058711 avg_cer=0.562084945 WER: 0.982135
ted_10000s_100e_quant_new.log:Test Quantized loss=1720.364662638 avg_cer=2.913973395 WER: 1.806516

Both made with the same quantized graph.

lissyx commented 7 years ago

After many problems, the process is now sound and complete: we can apply "weights" quantization, reload the graph, perform an inference step, and get consistent results.

There are still issues:

kdavis-mozilla commented 7 years ago

Does GRUCell work?

lissyx commented 7 years ago

I just tested, GRUCell fails in a similar way:

tensorflow.python.framework.errors_impl.InvalidArgumentError: The node 'bidirectional_rnn/bw/bw/while/Select' has inputs from different frames. The input 'bidirectional_rnn/bw/bw/while/gru_cell/add' is in frame ''. The input 'bidirectional_rnn/bw/bw/while/Select/Enter' is in frame 'bidirectional_rnn/bw/bw/while/bidirectional_rnn/bw/bw/while/'.

reuben commented 7 years ago

Good news from upstream: weights quantization works on LSTMCell by using the new Graph Transform Tool: https://github.com/tensorflow/tensorflow/issues/7949#issuecomment-283398812

Bad news: quantizing the operations themselves is still broken.
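Conceptually, the `quantize_weights` transform replaces each float weight tensor with 8-bit values plus the range needed to reconstruct approximate floats at load time, which shrinks the model roughly 4x while keeping all computation in float. A minimal sketch of that idea, assuming a simple linear min/max mapping (the actual TensorFlow kernels are more involved):

```python
def quantize_weights(weights):
    """Map float weights to 8-bit integers plus (min, step) for recovery."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant tensor
    return [round((w - lo) / scale) for w in weights], lo, scale

def dequantize_weights(q, lo, scale):
    """Recover approximate float weights, as done when the graph is reloaded."""
    return [lo + v * scale for v in q]

# Round-trip example: error per weight is at most half a quantization step.
w = [-1.0, -0.2, 0.0, 0.5, 1.0]
q, lo, scale = quantize_weights(w)
w_approx = dequantize_weights(q, lo, scale)
```

The rounding error this introduces in every weight is one plausible source of the WER degradation seen above; `quantize_nodes` (quantizing the operations themselves) is a separate, still-broken step.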

lissyx commented 7 years ago

Using the graph transformation tool as suggested by upstream, we get quantize_weights to work on the model (but using quantize_nodes still exposes the frame problem). Using the frozen/quantized graph works with the native client, and I am currently experimenting with valgrind massif to get data on memory usage at runtime.

lissyx commented 5 years ago

Closing because of #1850.

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.