google / qkeras

QKeras: a quantization deep learning library for Tensorflow Keras
Apache License 2.0

Slow training #93

Open schmiph2 opened 2 years ago

schmiph2 commented 2 years ago

Hi everyone

Is it expected behavior that the quantization-aware training in QKeras is much slower than normal training in Keras? And if so, out of interest, where does the overhead come from? From the quantization-dequantization operation?

Thank you for your help!

danielemoro commented 2 years ago

The quantization operations add a little more computation to every forward pass, but it should not be significant. Most of the slowness in training is expected to come from the increased number of epochs you may need for quantized models, especially when going to low precisions, where training can become unstable.
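To make that overhead concrete, here is a minimal sketch (not tied to any model in this thread, and with purely illustrative 4-bit quantizer settings) contrasting a plain Keras Dense layer with its QKeras counterpart. The extra per-step cost comes from the fake-quantization ops wrapped around the weights and activations in the forward pass.

```python
import tensorflow as tf
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

# Plain Keras layer: one matmul + bias + ReLU per forward pass.
float_layer = tf.keras.layers.Dense(64, activation="relu")

# QKeras counterpart: the same matmul, but the kernel/bias pass through
# quantized_bits and the activation through quantized_relu, adding a few
# element-wise quantize/dequantize ops per step.
quant_layer = QDense(
    64,
    kernel_quantizer=quantized_bits(4, 0, 1),
    bias_quantizer=quantized_bits(4, 0, 1),
)
quant_act = QActivation(quantized_relu(4))
```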

What sorts of slow-downs are you experiencing? Do you have any examples / data?

schmiph2 commented 2 years ago

Hi Daniele

Thank you for your response. I have prepared a Colab notebook with a setup similar to the one I intend to work on (mapping a time sequence X to Y). With the Keras implementation, training takes 0.22 s per step, while with QKeras it takes 6 s per step. If I remove the GRU there is still a difference, but it is not as large as before (50 ms vs. 90 ms per step). I assume the main reason for the slower training of the DNN with a GRU is a non-cuDNN-optimized implementation of the GRU (e.g., because of the quantized activations). The TensorFlow documentation for the GRU layer also mentions that the fast cuDNN kernel is only used for the standard configuration; see the sketch below.
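For reference, a hedged sketch of that distinction (the unit count and activation choice are illustrative, not taken from the notebook): `tf.keras.layers.GRU` only dispatches to the fused cuDNN kernel when it keeps the standard configuration, and any deviation, such as a quantized activation, falls back to the generic step-by-step implementation.

```python
import tensorflow as tf

# Standard configuration: eligible for the fused cuDNN kernel on GPU.
fast_gru = tf.keras.layers.GRU(
    128,
    activation="tanh",
    recurrent_activation="sigmoid",
    recurrent_dropout=0.0,
    unroll=False,
    use_bias=True,
    reset_after=True,
)

# Any deviation from the defaults (here a non-tanh activation, analogous to a
# quantized activation in a QKeras QGRU) forces the generic implementation,
# which runs the recurrence step by step and is much slower on GPU.
slow_gru = tf.keras.layers.GRU(128, activation="relu")
```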

It seems that slow training is not a problem of QKeras but of non-standard GRUs in general.