Closed: analog-cbarber closed this issue 5 years ago.
The DoReFa-Net code is from http://dorefa.net/.
My recommendation would be to save yourself the trouble of implementing the derivative explicitly: remove the tanh squashing operation from the QFullyConnected and QConvolution operators and just do clipping and rounding. If desired, the squashing operation can be done outside the Q operators and provided as an option in the Gluon Q* blocks.
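A minimal framework-free sketch of that recommendation, assuming weights are clipped to [-1, 1] and rounded to 2**k evenly spaced levels. The helper names (`quantize_k`, `clip_round_forward`, `clip_round_backward`) are hypothetical and are not the QFullyConnected/QConvolution API. With rounding treated as the identity in the backward pass (straight-through estimator), the only derivative left is that of the clip, so no custom tanh gradient is needed.

```python
import math
from typing import List


def quantize_k(x: float, k: int) -> float:
    # Round x in [0, 1] to the nearest of 2**k evenly spaced levels (round half up).
    n = 2 ** k - 1
    return math.floor(x * n + 0.5) / n


def clip_round_forward(w: List[float], k: int) -> List[float]:
    # Clip weights to [-1, 1], map to [0, 1], quantize, then map back to [-1, 1].
    out = []
    for wi in w:
        c = min(max(wi, -1.0), 1.0)
        out.append(2.0 * quantize_k((c + 1.0) / 2.0, k) - 1.0)
    return out


def clip_round_backward(w: List[float], grad_out: List[float]) -> List[float]:
    # Straight-through estimator: rounding is treated as identity, so the only
    # remaining factor is the derivative of clip (1 inside [-1, 1], 0 outside).
    return [g if -1.0 <= wi <= 1.0 else 0.0 for wi, g in zip(w, grad_out)]
```

For example, with k = 2 the weights [-1.5, -0.2, 0.3, 0.9, 2.0] quantize to approximately [-1, -1/3, 1/3, 1, 1], and an upstream gradient of ones becomes [0, 1, 1, 1, 0] because the two outermost values were clipped. Any squashing would be a separate transform applied to `w` before calling these helpers.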
For multi-bit weight quantization you have implemented the tanh-based squashing function as described in the DoReFa-Net paper. However, instead of incorporating its derivative into the gradient used for the weight updates, you simply apply the squashing and quantization in place and ignore the derivative of the squashing operation entirely.
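Written out for the k-bit weight quantizer as given in the DoReFa-Net paper, and simplifying by treating the normalizer M as a constant (in an autodiff framework the gradient also flows through the max), the factor that is dropped when the squashing derivative is ignored is (1 - tanh²(w_i))/M:

```latex
% k-bit DoReFa-Net weight quantization of a real-valued weight w_i
w_o = 2\,\mathrm{quantize}_k\!\left(\frac{\tanh(w_i)}{2M} + \frac{1}{2}\right) - 1,
\qquad M = \max_j \lvert \tanh(w_j) \rvert

% Straight-through estimator for the rounding step
\frac{\partial\,\mathrm{quantize}_k(r)}{\partial r} \approx 1

% Chain rule, treating M as a constant
\frac{\partial L}{\partial w_i}
  \approx \frac{\partial L}{\partial w_o} \cdot 2 \cdot \frac{1 - \tanh^2(w_i)}{2M}
  = \frac{\partial L}{\partial w_o} \cdot \frac{1 - \tanh^2(w_i)}{M}
```

Ignoring the squashing derivative amounts to replacing this factor with 1, which overstates the gradient for large-magnitude weights where tanh saturates.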
For comparison, here is the DoReFa-Net quantization code. Note how it replaces the gradient of the rounding step with the identity (a straight-through estimator) but leaves the gradient of the squashing operations unchanged:
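As a framework-free illustration of that behavior (a sketch, not the DoReFa-Net source), the hypothetical helpers below implement the same tanh-squashed weight quantizer. The backward pass replaces the rounding gradient with the identity but keeps the tanh derivative, treating the max normalizer as a constant for simplicity. `squash_ignored_backward` shows the alternative described in the issue, where the squashing derivative is dropped.

```python
import math
from typing import List


def quantize_k(x: float, k: int) -> float:
    # Round x in [0, 1] to the nearest of 2**k evenly spaced levels (round half up).
    n = 2 ** k - 1
    return math.floor(x * n + 0.5) / n


def _squash(w: List[float]):
    # tanh squashing plus the max-abs normalizer; fall back to 1.0 when all
    # weights are zero to avoid division by zero (an assumption of this sketch).
    t = [math.tanh(wi) for wi in w]
    m = max(abs(ti) for ti in t) or 1.0
    return t, m


def dorefa_weight_forward(w: List[float], k: int) -> List[float]:
    # w_o = 2 * quantize_k(tanh(w) / (2M) + 1/2) - 1
    t, m = _squash(w)
    return [2.0 * quantize_k(ti / (2.0 * m) + 0.5, k) - 1.0 for ti in t]


def dorefa_weight_backward(w: List[float], grad_out: List[float]) -> List[float]:
    # Straight-through for rounding, but keep d/dw of the squashing:
    # dL/dw_i = dL/dw_o * (1 - tanh(w_i)**2) / M, with M treated as constant.
    t, m = _squash(w)
    return [g * (1.0 - ti * ti) / m for ti, g in zip(t, grad_out)]


def squash_ignored_backward(w: List[float], grad_out: List[float]) -> List[float]:
    # Squashing applied in the forward pass, but its derivative is dropped.
    return list(grad_out)
```

For weights [0.0, 0.5, 2.0] and an upstream gradient of ones, the kept tanh factor shrinks the gradient of the largest weight well below that of the zero weight, whereas dropping it passes the ones through unchanged.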