cudnn_off=False has some problems with mx.mon.Monitor

hpi-xnor / BMXNet

(New version is out: https://github.com/hpi-xnor/BMXNet-v2) BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

Apache License 2.0


cudnn_off=False has some problems with mx.mon.Monitor #43

Closed (jacky4323 closed this issue 5 years ago)

jacky4323 commented 6 years ago

Hi, I use cudnn_off=False, but the output is nan when I monitor training with mx.mon.Monitor (left-hand side of the figure below). If I use cudnn_off=True, the output values are more reasonable (right-hand side of the figure). Could you please help me? Thanks a lot!

  conv2_act1 = mx.sym.QActivation(data=batch1_3, act_bit=1, backward_only=True, name="conv2_act1")
  conv2_1 = mx.sym.QConvolution(
      data=conv2_act1, pad=(1, 1), kernel=(3, 3), num_filter=128,
      act_bit=1, weight_bit=1, cudnn_off=True, name="conv2_1")

yanghaojin commented 6 years ago

Could you please remove the padding parameter, or set pad=0, and try again? Let me know the result, thanks!

jacky4323 commented 6 years ago

Hi, here is the result: the layer with pad=0 produces correct output, while the output of the layer with pad=1 becomes nan.

  conv2_act1 = mx.sym.QActivation(data=batch1_3, act_bit=1, backward_only=True, name="conv2_act1")
  conv2_1 = mx.sym.QConvolution(
      data=conv2_act1, pad=(0, 0), kernel=(3, 3), num_filter=128,
      act_bit=1, weight_bit=1, cudnn_off=False, name="conv2_1")
  relu2_1 = mx.symbol.Activation(data=conv2_1, act_type="relu", name="relu2_1")
  batch2_1 = mx.sym.BatchNorm(data=relu2_1, name="batch2_1")
  conv2_act2 = mx.sym.QActivation(data=batch2_1, act_bit=1, backward_only=True, name="conv2_act2")
  conv2_2 = mx.sym.QConvolution(
      data=conv2_act2, pad=(1, 1), kernel=(3, 3), num_filter=128,
      act_bit=1, weight_bit=1, cudnn_off=False, name="conv2_2")
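
To pin down which layer first produces nan, the monitor output can also be scanned programmatically instead of read by eye. The following is only a minimal sketch: it assumes the monitor records are (batch, name, stat) tuples with the statistic rendered as a string, which is how mx.mon.Monitor.toc() is understood to report them. The helper names and the sample records are hypothetical and are not real output from this network.

```python
import math


def _has_nan(stat):
    """Return True if any numeric token in the stat string parses as NaN."""
    cleaned = str(stat).replace(",", " ").replace("[", " ").replace("]", " ")
    for token in cleaned.split():
        try:
            if math.isnan(float(token)):
                return True
        except ValueError:
            # Not a number (e.g. a label); ignore it.
            continue
    return False


def first_nan_layer(records):
    """Return the name of the first record whose statistic contains NaN, else None.

    `records` is assumed to be an iterable of (batch, name, stat) tuples.
    """
    for _batch, name, stat in records:
        if _has_nan(stat):
            return name
    return None


# Illustrative, made-up records that mirror the layer names in this thread.
records = [
    (1, "conv2_act1_output", "0.98"),
    (1, "conv2_1_output", "12.5"),
    (1, "conv2_2_output", "nan"),
]
print(first_nan_layer(records))
```

With real training runs, the list passed in would be whatever the monitor returns for the current batch; the check itself does not depend on MXNet.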

yanghaojin commented 6 years ago

Thanks for your report! I think this is a bug in the binary cuDNN conv layer. We will try to fix it as soon as possible! Until then, you could use the normal binary conv layer (slower) or avoid padding.
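
The workaround above could be applied per layer rather than globally, so that only padded layers pay the speed penalty. This is a rough sketch, not part of BMXNet: qconv_kwargs is a hypothetical helper that only builds the keyword arguments already used in this thread (pad, kernel, num_filter, act_bit, weight_bit, cudnn_off, name) and sets cudnn_off=True whenever the layer uses non-zero padding.

```python
def qconv_kwargs(name, num_filter, kernel=(3, 3), pad=(0, 0), act_bit=1, weight_bit=1):
    """Build keyword arguments for a binary conv layer (hypothetical helper).

    Falls back to the non-cuDNN path (cudnn_off=True) only when the layer
    uses non-zero padding, which is the case reported to produce nan here.
    """
    uses_padding = any(p != 0 for p in pad)
    return {
        "name": name,
        "num_filter": num_filter,
        "kernel": kernel,
        "pad": pad,
        "act_bit": act_bit,
        "weight_bit": weight_bit,
        "cudnn_off": uses_padding,
    }


padded = qconv_kwargs("conv2_2", 128, pad=(1, 1))
unpadded = qconv_kwargs("conv2_1", 128, pad=(0, 0))
print(padded["cudnn_off"], unpadded["cudnn_off"])

# Intended use with BMXNet (not executed here):
#   conv2_2 = mx.sym.QConvolution(data=conv2_act2, **padded)
```

Whether the unpadded cuDNN path is safe in every configuration is not established in this thread, so the monitor output should still be checked after switching.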

yanghaojin commented 5 years ago

Please check out our new version, BMXNet v2: https://github.com/hpi-xnor/BMXNet-v2