torch / nn

MSECriterion_updateGradInput error: cannot convert 'struct THCudaHalfTensor *' to 'struct THCudaTensor *' #1290

Open. Opened by michaelhuang74 7 years ago.

michaelhuang74 commented 7 years ago

I tried to convert Justin Johnson's 32-bit neural-style implementation (https://github.com/jcjohnson/neural-style) to 16-bit (half precision). My code is at https://github.com/michaelhuang74/FP16-Neural-Style

Because of 'inf' and 'nan' problems, I moved the Gram matrix operations to 32-bit, and the Gram matrix computation now seems to work. However, MSECriterion:updateGradInput in 16-bit mode produces 'inf' and 'nan' in the second iteration, so I tried to move MSECriterion to 32-bit as well. The code then failed with the following error: lua/5.1/nn/THNN.lua:110: bad argument #4 to 'v' (cannot convert 'struct THCudaHalfTensor *' to 'struct THCudaTensor *')
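Why moving the loss to 32-bit can help is easy to sketch outside Torch. The Python example is a simplified model, not Torch code: it emulates FP16 by rounding every intermediate through the standard library's half-precision `struct` format ('e'), and uses Python floats (64-bit) as a stand-in for an FP32 path. The helpers `to_half`, `mse_half`, and `mse_mixed` are hypothetical names for illustration. It shows that the final loss can fit in FP16 while an intermediate squared difference does not: 300² = 90000 exceeds the FP16 maximum of 65504.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_half(x: float) -> float:
    """Round x to the nearest FP16 value using struct's 'e' format.

    struct raises OverflowError when x rounds past FP16_MAX; IEEE 754
    round-to-nearest gives +/-inf there, so return that instead.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def mse_half(inputs, targets):
    """Simplified FP16 MSE: every intermediate result is rounded to FP16."""
    total = 0.0
    for x, t in zip(inputs, targets):
        d = to_half(to_half(x) - to_half(t))
        total = to_half(total + to_half(d * d))  # d * d can overflow to inf
    return to_half(total / len(inputs))


def mse_mixed(inputs, targets):
    """Compute the loss at higher precision, then cast only the result to FP16."""
    total = sum((x - t) ** 2 for x, t in zip(inputs, targets))
    return to_half(total / len(inputs))


inputs, targets = [300.0, 0.0], [0.0, 0.0]
print(mse_half(inputs, targets))     # inf: 300**2 = 90000 > FP16_MAX
print(mse_mixed(inputs, targets))    # 44992.0: 45000 rounded to the nearest FP16 value
print(to_half(math.inf - math.inf))  # nan: once inf appears, inf - inf gives nan
```

Once an inf appears, later arithmetic such as inf - inf turns it into nan, which is consistent with seeing both 'inf' and 'nan'. In Torch, the analogous mixed-precision step would presumably be to cast the input and target with `:type('torch.CudaTensor')`, run the criterion, and cast the returned gradient back with `:type('torch.CudaHalfTensor')`. This is an untested assumption about this code: converting a network's type also converts tensors held by nested criteria, so the criterion object itself probably needs `crit:type('torch.CudaTensor')` after the network is converted to half. Argument #4 of the failing THNN call is likely one of those converted internal buffers rather than the input.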

I then wrote my own version of MSECriterion, SELF_MSECriterion, and it produced the same error. My command is as follows.

th neural_style_half.lua -style_image style/vangogh.jpg -content_image inputimage/man_face.jpg -style_weight 20 -content_weight 1 -image_size 100 -style_scale 1 -style_layers relu4_2 -output_image outputimage/man_face.vangogh.sw20.cw1.sl42.c100s100.3it.adam.cudnn.half.vgg.jpg -num_iterations 3 -save_iter 1 -print_iter 1 -backend cudnn -cudnn_autotune -optimizer adam

The complete error message is as follows.

/home/mqhuang/torch/install/bin/luajit: /home/mqhuang/torch/install/share/lua/5.1/nn/Container.lua:67:
In 25 module of nn.Sequential:
/home/mqhuang/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #4 to 'v' (cannot convert 'struct THCudaHalfTensor *' to 'struct THCudaTensor *')
stack traceback:
	[C]: in function 'v'
	/home/mqhuang/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'MSECriterion_updateGradInput'
	neural_style_half.lua:682: in function 'backward'
	neural_style_half.lua:599: in function
	[C]: in function 'xpcall'
	/home/mqhuang/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	/home/mqhuang/torch/install/share/lua/5.1/nn/Sequential.lua:55: in function 'updateGradInput'
	neural_style_half.lua:294: in function 'opfunc'
	/home/mqhuang/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
	neural_style_half.lua:326: in function 'main'
	neural_style_half.lua:694: in main chunk
	[C]: in function 'dofile'
	...uang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00406670

I reinstalled Torch, CUDA 9.0, and cuDNN 7.0 very recently. The OS is Ubuntu 14.04.5.