hpi-xnor / BMXNet

(New version is out: https://github.com/hpi-xnor/BMXNet-v2) BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet
Apache License 2.0

QConvolution mutates weights running forward #38

Closed analog-cbarber closed 5 years ago

analog-cbarber commented 6 years ago

As with #30, QConvolution also mutates its input weights when running forward, but only when in training mode. The weights should only be mutated by the actual weight update performed by the optimizer. Instead, the weight quantization for the forward pass should be performed in temporary storage.
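To illustrate what I mean, here is a minimal sketch in plain C++ (hypothetical names, not BMXNet's actual operator code): the forward pass sign-binarizes into a scratch buffer and never writes to the stored weights.

```cpp
// Sketch only: quantize into temporary storage so the stored weights are
// never mutated by the forward pass.
#include <cstddef>
#include <vector>

// Hypothetical helper: sign-binarize `weights` into a fresh buffer,
// leaving `weights` untouched.
std::vector<float> BinarizeCopy(const std::vector<float>& weights) {
  std::vector<float> out(weights.size());
  for (std::size_t i = 0; i < weights.size(); ++i) {
    out[i] = weights[i] >= 0.0f ? 1.0f : -1.0f;
  }
  return out;
}

// Hypothetical forward pass: convolve with the binarized copy; only the
// optimizer's update step ever writes to the real weights.
void QConvForward(const std::vector<float>& weights /*, data, output, ... */) {
  const std::vector<float> binarized = BinarizeCopy(weights);
  (void)binarized;  // placeholder: the real kernel would convolve with this copy
}
```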

Unlike QFullyConnected, QConvolution does not even perform weight quantization when not in training mode! This means that given the same input weights, it will produce different outputs depending on whether training mode is enabled. Perhaps this was intended as an optimization, but it really is a bad idea to change the semantics of an operator based on the training state unless you have a very good reason (as in BatchNorm). Since the number of weights will typically be a small fraction of the output volume of a mini-batch, it probably isn't worth dropping the quantization step when not training.

Really, I think you should try to eliminate all calls to ctx.is_train in the implementation.
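To make the semantics point concrete, a small self-contained sketch (plain C++, hypothetical names, not BMXNet code): if the quantization step never consults the training flag, the same weights and data produce the same output in both modes.

```cpp
// Sketch only: the operator's output should depend only on its inputs,
// not on whether training mode is enabled.
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dot product with sign-binarized weights; `is_train` is accepted
// but deliberately never consulted, so train and inference agree.
float QDenseForward(const std::vector<float>& weights,
                    const std::vector<float>& data, bool /*is_train*/) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < weights.size(); ++i) {
    acc += (weights[i] >= 0.0f ? 1.0f : -1.0f) * data[i];
  }
  return acc;
}

int main() {
  const std::vector<float> w = {0.3f, -0.7f, 1.2f};
  const std::vector<float> x = {1.0f, 2.0f, 3.0f};
  // Same weights, same data: the training flag must not change the result.
  assert(QDenseForward(w, x, true) == QDenseForward(w, x, false));
  return 0;
}
```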

yanghaojin commented 6 years ago

Hi, for the forward pass we always do the quantization (binarization), but in a different way when

analog-cbarber commented 6 years ago

Obviously the CPU and GPU implementations will be different, but I don't understand why the operator should behave differently when in training mode. This is not how most MXNet operators work.

yanghaojin commented 5 years ago

Please check our new version, BMXNet v2: https://github.com/hpi-xnor/BMXNet-v2