hpi-xnor / BMXNet

(New version is out: https://github.com/hpi-xnor/BMXNet-v2) BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet
Apache License 2.0

QConvolution mutates weights running forward #38

Closed analog-cbarber closed 5 years ago

analog-cbarber commented 6 years ago

As with #30, QConvolution also mutates its input weights when running forward, but only when in training mode. The weights should only be mutated by the actual weight update performed by the optimizer. Instead, the weight quantization for the forward pass should be performed in temporary storage.
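To illustrate what I mean, here is a minimal sketch in plain C++ (hypothetical names, not BMXNet's actual operator code): the forward pass sign-binarizes into a scratch buffer and never writes to the stored weights.

```cpp
// Sketch only: quantize into temporary storage so the stored weights are
// never mutated by the forward pass.
#include <cstddef>
#include <vector>

// Hypothetical helper: sign-binarize `weights` into a fresh buffer,
// leaving `weights` untouched.
std::vector<float> BinarizeCopy(const std::vector<float>& weights) {
  std::vector<float> out(weights.size());
  for (std::size_t i = 0; i < weights.size(); ++i) {
    out[i] = weights[i] >= 0.0f ? 1.0f : -1.0f;
  }
  return out;
}

// Hypothetical forward pass: convolve with the binarized copy; only the
// optimizer's update step ever writes to the real weights.
void QConvForward(const std::vector<float>& weights /*, data, output, ... */) {
  const std::vector<float> binarized = BinarizeCopy(weights);
  (void)binarized;  // placeholder: the real kernel would convolve with this copy
}
```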

Unlike QFullyConnected, QConvolution does not even perform weight quantization when not in training mode! This means that given the same input weights, it will produce different outputs depending on whether training mode is enabled. Perhaps this was intended as an optimization, but it really is a bad idea to change the semantics of an operator based on the training state unless you have a very good reason (as in BatchNorm). Since the number of weights will typically be a small fraction of the output volume of a mini-batch, it probably isn't worth dropping the quantization step when not training.

Really, I think you should try to eliminate all calls to ctx.is_train in the implementation.
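To make the semantics point concrete, a small self-contained sketch (plain C++, hypothetical names, not BMXNet code): if the quantization step never consults the training flag, the same weights and data produce the same output in both modes.

```cpp
// Sketch only: the operator's output should depend only on its inputs,
// not on whether training mode is enabled.
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dot product with sign-binarized weights; `is_train` is accepted
// but deliberately never consulted, so train and inference agree.
float QDenseForward(const std::vector<float>& weights,
                    const std::vector<float>& data, bool /*is_train*/) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < weights.size(); ++i) {
    acc += (weights[i] >= 0.0f ? 1.0f : -1.0f) * data[i];
  }
  return acc;
}

int main() {
  const std::vector<float> w = {0.3f, -0.7f, 1.2f};
  const std::vector<float> x = {1.0f, 2.0f, 3.0f};
  // Same weights, same data: the training flag must not change the result.
  assert(QDenseForward(w, x, true) == QDenseForward(w, x, false));
  return 0;
}
```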

yanghaojin commented 6 years ago

Hi, for the forward pass we always do the quantization (binarization), but in a different way when

analog-cbarber commented 6 years ago

Obviously the CPU and GPU implementations will be different, but I don't understand why the operator should behave differently when in training mode. This is not how most MXNet operators work.

yanghaojin commented 5 years ago

Please check our new version, BMXNet v2: https://github.com/hpi-xnor/BMXNet-v2