Could you quickly elaborate on why the QActivation (as referenced in the code snippet of the paper) is placed in front of the QConvolution/QFullyConnected layers?
For example, why is there another activation layer after the binarized one:
ba2 = mx.symbol.QActivation(...)
fc1 = mx.symbol.QFullyConnected(...)
bn3 = mx.sym.BatchNorm(...)
tanh3 = mx.sym.Activation(...)
Could one use mx.symbol.LeakyReLU, or would you suggest implementing activation functions like PReLU/Swish (as supported by the Gluon API) for binary networks in the underlying C/C++ source code?
You can also remove the tanh and relu activations and just apply the binary activation. We found that by adding a relu activation after each residual block in the ResNet architecture, we could slightly improve the accuracy.
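For illustration, here is a minimal sketch of what such a binary residual block could look like with BMXNet-style operators: QActivation in front of QConvolution, then BatchNorm, the shortcut addition, and a single relu after the block instead of tanh/relu inside it. This is not the exact code from the paper; parameter names such as act_bit, the kernel sizes, and the shape assumptions are illustrative.

import mxnet as mx

def binary_residual_block(data, num_filter, name):
    # binarize the inputs right before the binary convolution
    # (same QActivation -> Q* ordering as in the snippet above)
    ba = mx.symbol.QActivation(data=data, act_bit=1, name=name + '_ba')
    conv = mx.symbol.QConvolution(data=ba, num_filter=num_filter,
                                  kernel=(3, 3), pad=(1, 1), act_bit=1,
                                  name=name + '_qconv')
    bn = mx.symbol.BatchNorm(data=conv, name=name + '_bn')
    # shortcut addition (assumes matching shapes, i.e. stride 1 and
    # num_filter equal to the input channel count)
    out = data + bn
    # one full-precision relu after the residual block, which gave a
    # slight accuracy improvement in our experiments
    return mx.symbol.Activation(data=out, act_type='relu', name=name + '_relu')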
If you want to deploy a binary model on low-power devices which only support C/C++, you probably have to do this.
c_predict_api.h should work if you adapt the corresponding Makefile/CMake file to include the BMXNet-specific sources. As for feature_extract.cpp, I didn't check it, but if the standard MXNet convolution layers work with it, there is no reason why the QConvolution layer cannot.
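For reference, a hedged sketch of the Python-side export that produces the symbol-json/params pair which c_predict_api.h (via MXPredCreate) loads on the C/C++ side. The BMXNet operator arguments (act_bit, num_hidden), the layer sizes, and the file prefix are illustrative assumptions, not a prescribed setup.

import mxnet as mx

# illustrative binary network: QActivation in front of QFullyConnected
data = mx.symbol.Variable('data')
ba = mx.symbol.QActivation(data=data, act_bit=1)
fc = mx.symbol.QFullyConnected(data=ba, num_hidden=10)
net = mx.symbol.SoftmaxOutput(data=fc, name='softmax')

mod = mx.mod.Module(symbol=net, data_names=['data'], label_names=['softmax_label'])
mod.bind(data_shapes=[('data', (1, 784))], label_shapes=[('softmax_label', (1,))])
mod.init_params()

# writes binary-model-symbol.json and binary-model-0000.params,
# the two files the C predict API expects to load on the device
mod.save_checkpoint('binary-model', 0)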
@yanghaojin
ba2 = mx.symbol.QActivation(...)
fc1 = mx.symbol.QFullyConnected(...)
bn3 = mx.sym.BatchNorm(...)
tanh3 = mx.sym.Activation(...)