Closed bohanzhuang closed 5 years ago
same problem. I currently would like to test the inference speedup only. Here is the test code:
```python
import time
from collections import namedtuple

import mxnet as mx

ctx = mx.cpu()
phases = ['k3s1', 'k3s2', 'k1s2', 'sign']

def get_sym(phase="k3s1", bitA=32, bitW=32):
    data = mx.sym.Variable(name='data')
    output = data
    if phase == 'k3s1':
        for i in range(20):
            output = mx.sym.QConvolution_v1(output, num_filter=64, kernel=(3, 3),
                                            stride=(1, 1), pad=(1, 1), nobias=True,
                                            name="k3s1%d" % i, act_bit=bitA,
                                            weight_bit=bitW, binarized_weights_only=True,
                                            cudnn_off=True)
    elif phase == 'k3s2':
        output = mx.sym.QConvolution(data, num_filter=64, kernel=(3, 3), stride=(2, 2),
                                     pad=(1, 1), no_bias=True, name="k3s2",
                                     act_bit=bitA, weight_bit=bitW)
    elif phase == 'k1s2':
        output = mx.sym.QConvolution(data, num_filter=64, kernel=(1, 1), stride=(2, 2),
                                     pad=(0, 0), no_bias=True, name="k1s2",
                                     act_bit=bitA, weight_bit=bitW)
    else:
        output = mx.sym.QActivation(data, name="sign")
    return output

image_shape = (10, 64, 224, 224)
iterations = 100
Batch = namedtuple('Batch', ['data'])

def test(model):
    start = time.time()
    for i in range(iterations):
        data = [mx.nd.ones(image_shape)]
        model.forward(Batch(data))
    end = time.time()
    print(end - start)

# FP32 baseline
sym = get_sym()
mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', image_shape)],
         label_shapes=mod._label_shapes)
mod.init_params()
test(mod)

# binary version (1-bit activations and weights)
sym = get_sym(phase="k3s1", bitA=1, bitW=1)
mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', image_shape)],
         label_shapes=mod._label_shapes)
mod.init_params()
test(mod)
```
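One measurement caveat for the script above: MXNet executes operators asynchronously, so timing `forward()` calls without synchronizing can misreport the real cost. Below is a minimal, MXNet-free sketch of a more robust harness (warm-up iterations, explicit synchronization hook, median over repeats); the `sync` hook and the usage line are assumptions about how you would wire it to the modules above.

```python
import statistics
import time

def benchmark(fn, warmup=5, repeats=20, sync=None):
    """Time fn() with warm-up iterations and report the median over repeats.

    sync, if given, is called after each fn() to force completion of
    asynchronous work (for MXNet you would pass mx.nd.waitall).
    """
    for _ in range(warmup):               # warm-up: exclude one-time setup costs
        fn()
        if sync:
            sync()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Usage with the modules above might look like `benchmark(lambda: mod.forward(Batch([mx.nd.ones(image_shape)])), sync=mx.nd.waitall)`.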
The binary version is slower than the FP32 version. I don't know whether some switch needs to be enabled. @ry @pluskid @darxriggs Could you please shed some light on this?
After adding debug output in smd_hpi/src/q_convolution.cc, it seems that MXNet only enters the XNOR functions when the input size (N × C × W × H) is below a threshold. Otherwise, the functions in q_convolution-inl.h and q_convolution.cc are never triggered.
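The dispatch behavior described above can be sketched as follows. Note that the threshold value and the function name are illustrative assumptions based on this observation, not BMXNet's actual source.

```python
# Illustrative sketch: the binary (XNOR) kernel is only chosen when the
# input tensor is small enough; larger inputs fall back to the standard
# floating-point convolution, so no binary speedup is observed.
# XNOR_SIZE_THRESHOLD and choose_conv_path are hypothetical names.

XNOR_SIZE_THRESHOLD = 1 << 20  # hypothetical element-count limit

def choose_conv_path(n, c, h, w):
    """Return which convolution implementation would run for an NCHW input."""
    num_elements = n * c * h * w
    if num_elements < XNOR_SIZE_THRESHOLD:
        return "xnor"           # binary path in q_convolution-inl.h / q_convolution.cc
    return "fp32_fallback"      # default convolution, no binary speedup

# The (10, 64, 224, 224) input from the benchmark above is far larger than
# the hypothetical limit, so it would take the fallback path:
print(choose_conv_path(10, 64, 224, 224))  # fp32_fallback
print(choose_conv_path(1, 64, 28, 28))     # xnor
```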
Please see my answer in this thread: https://github.com/hpi-xnor/BMXNet/issues/6
Please check out our new version, BMXNet v2: https://github.com/hpi-xnor/BMXNet-v2
BMXNet v2 supports Gluon for training; after training you can convert your model to binary using model_converter, then use the XNOR forward pass for inference.
Recently I tried to run the "binary_mnist" example on CPU following the instructions. However, I did not observe any speedup when comparing the binary LeNet (680 fps) against the floating-point LeNet (747 fps). I suspect that QConvolution and QActivation still perform floating-point operations even though the weights and activations are 1-bit. How can I test with run-time XNOR operations? Moreover, after converting the output xxxx.params file to binary, I measured an even lower testing speed (345 fps), and I'm not sure what the exact problem is. My environment is Python 3.6.8 on an Intel(R) Xeon(R) CPU E5-2630 v3 with 8 cores.
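For reference, the arithmetic a true XNOR path would replace: on vectors whose entries are ±1, the dot product equals 2·popcount(XNOR(a, b)) − n once the signs are packed into machine words, so 64 multiply-accumulates collapse into one XNOR plus one popcount. A minimal pure-Python sketch of that equivalence (illustrative only, not BMXNet code):

```python
import random

def pack_bits(signs):
    """Pack a list of +1/-1 values into one big integer (1 bit per entry)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word

def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed ±1 vectors via XNOR + popcount."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # bit is 1 where the signs agree
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n             # agreements minus disagreements

random.seed(0)
n = 256
a = [random.choice((-1, 1)) for _ in range(n)]
b = [random.choice((-1, 1)) for _ in range(n)]

reference = sum(x * y for x, y in zip(a, b))       # FP-style dot product
binary = xnor_dot(pack_bits(a), pack_bits(b), n)   # bitwise version
print(binary == reference)  # True
```

On real hardware the speedup only materializes when the layer is large enough to amortize the bit-packing overhead, which is one plausible reason a small network like LeNet shows little gain.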