hughperkins / clnn

OpenCL backend for Torch nn neural networks library
BSD 2-Clause "Simplified" License

"Abs.lua:8: attempt to index field 'THNN' (a nil value)" #21

Closed hughperkins closed 8 years ago

hughperkins commented 8 years ago

This is because THNN isn't implemented in clnn yet, i.e. https://github.com/torch/nn/pull/547

Started on adding it, in progress, branch adding_THNN https://github.com/hughperkins/clnn/commits/adding_THNN
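To illustrate what the error message itself means, here is a minimal pure-Lua sketch. It assumes, as the message suggests, that an nn module reaches its backend through a `THNN` field looked up on the input tensor. The tables and the names `fake_updateOutput`, `withBackend` and `withoutBackend` are hypothetical stand-ins, not real torch objects:

```lua
-- Hypothetical stand-ins, not real torch objects: one "tensor" whose type
-- has a THNN backend table registered, and one whose type does not.
local function fake_updateOutput(input)
  return "abs computed"
end

local withBackend = { THNN = { Abs_updateOutput = fake_updateOutput } }
local withoutBackend = {}  -- no THNN field at all

-- Mimics the shape of the failing call: the module looks up THNN on its input.
local function forward(input)
  return input.THNN.Abs_updateOutput(input)
end

-- Succeeds, because the THNN table is present.
print(pcall(forward, withBackend))

-- Fails, because indexing the missing THNN field yields nil.
local ok, err = pcall(forward, withoutBackend)
print(ok, err)
```

The exact wording of the second message depends on the Lua version, but it names the `THNN` field and a nil value, the same kind of failure as in the issue title.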

hughperkins commented 8 years ago

Should be fixed in master now, mostly in https://github.com/hughperkins/clnn/commit/97ff2457bbf681515bf7c674fd7ab19e38454f34 , and a tiny, but critical, change-ette or two in https://github.com/hughperkins/clnn/commit/b0d39777bacf1ac169ae1494c1e8bd87368c7e9f

odellus commented 8 years ago

I'm getting this error from Sigmoid.lua:4 using the latest from branch distro.

lborthwein commented 8 years ago

I'm seeing this in MSECriterion.lua:14:

attempt to index field 'THNN' (a nil value)

hughperkins commented 8 years ago

Unfortunately distro is out of date. I have raised an issue for this at https://github.com/torch/distro/issues/71 . In the meantime, please do the following:

luarocks install torch
luarocks install nn
luarocks install cltorch
luarocks install clnn

...and try again
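After reinstalling, it can help to confirm that all four packages actually load from the Lua you are running. A minimal sketch, using only standard `pcall` and `require`; the helper name `check_modules` is made up for this example:

```lua
-- Report which of the given modules can be loaded in this Lua environment.
-- check_modules is a hypothetical helper, not part of torch or clnn.
local function check_modules(names)
  local status = {}
  for _, name in ipairs(names) do
    -- pcall(require, name) returns false instead of raising if loading fails.
    local ok = pcall(require, name)
    status[name] = ok
  end
  return status
end

local wanted = { 'torch', 'nn', 'cltorch', 'clnn' }
local status = check_modules(wanted)
for _, name in ipairs(wanted) do
  print(name, status[name] and 'loaded' or 'NOT loaded')
end
```

If any of these prints NOT loaded, the THNN error is probably an installation problem rather than something in your script.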

lborthwein commented 8 years ago

Did those steps, still seeing the same error:

/home/ubuntu/torch-distro/install/bin/luajit: ...untu/torch-distro/install/share/lua/5.1/nn/Container.lua:67:
In 4 module of nn.Sequential:
...u/torch-distro/install/share/lua/5.1/nn/MSECriterion.lua:14: attempt to index field 'THNN' (a nil value)
stack traceback:
    ...u/torch-distro/install/share/lua/5.1/nn/MSECriterion.lua:14: in function 'forward'
    neural_style.lua:447: in function
    [C]: in function 'xpcall'
    ...untu/torch-distro/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    ...ntu/torch-distro/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    neural_style.lua:204: in function 'main'
    neural_style.lua:500: in main chunk
    [C]: in function 'dofile'
    ...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406670

hughperkins commented 8 years ago

Can you provide me a short script to reproduce the issue please?

hughperkins commented 8 years ago

I run neural-style like this:

luajit neural_style.lua -backend clnn -model_file models/vgg_normalised.caffemodel -image_size 50

...and seems to run ok for me. My models directory:

md5sum models/*
6d3b0b00017ec30acc10d29101033be8  models/download_models.sh
b568958c0dcf1d97cbcff4c22b02a2be  models/nin_imagenet.caffemodel
8fbacb8dd696607876386e34ff68a84a  models/nin_imagenet_conv.caffemodel
2521163ca34ad45c6910912f9c873567  models/train_val.prototxt.lua
ccbbdda59210208be39f8974f5b5765e  models/VGG_ILSVRC_19_layers_deploy.prototxt
50510b9f43a178d12c017a54b6583f9a  models/VGG_ILSVRC_19_layers_deploy.prototxt.lua
6adcfbc93e8f6762e6421515940526f4  models/vgg_normalised.caffemodel

hughperkins commented 8 years ago

I've radically rolled back torch and cltorch to ~21 February. Please can you reinstall cltorch, using the instructions at https://github.com/hughperkins/cltorch#installation , and see if the issue is/isn't still there please.

bachaAI commented 8 years ago

@hughperkins after installing cltorch and clnn, I got the following error:

/Users/apple/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
...le/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:108: torch.ByteTensor.THNN backend not imported
stack traceback:
    [C]: in function 'assert'
    ...le/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:108: in function <...le/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:107>
    [C]: in function 'xpcall'
    /Users/apple/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /Users/apple/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ...le/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'f'
    [string "local f = function() return trainer:train(tra..."]:1: in main chunk
    [C]: in function 'xpcall'
    /Users/apple/torch/install/share/lua/5.1/itorch/main.lua:209: in function </Users/apple/torch/install/share/lua/5.1/itorch/main.lua:173>
    /Users/apple/torch/install/share/lua/5.1/lzmq/poller.lua:75: in function 'poll'
    /Users/apple/torch/install/share/lua/5.1/lzmq/impl/loop.lua:307: in function 'poll'
    /Users/apple/torch/install/share/lua/5.1/lzmq/impl/loop.lua:325: in function 'sleep_ex'
    /Users/apple/torch/install/share/lua/5.1/lzmq/impl/loop.lua:370: in function 'start'
    /Users/apple/torch/install/share/lua/5.1/itorch/main.lua:381: in main chunk
    [C]: in function 'require'
    (command line):1: in main chunk
    [C]: at 0x010d54ad50

Running this:

net = nn.Sequential()
net:add(nn.SpatialConvolution(1, 6, 20, 20)) -- 1 input image channel, 6 output channels, 20x20 convolution kernel
net:add(nn.ReLU()) -- non-linearity
net:add(nn.SpatialMaxPooling(9,9,9,9)) -- A max-pooling operation that looks at 9x9 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 3, 3))
net:add(nn.ReLU()) -- non-linearity
--net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*7*3)) -- reshapes the 3D tensor into a 1D tensor of 16*7*3 elements
net:add(nn.Linear(16*7*3, 120)) -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(120, 84))
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(84, 5)) -- 5 is the number of outputs of the network
net:add(nn.LogSoftMax()) -- converts the output to a log-probability. Useful for classification problems

criterion = nn.ClassNLLCriterion()

trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 5

trainer:train(trainData)

trainData is a byte tensor {data,label}

data is 7628x3x64x100

hughperkins commented 8 years ago

Normally training data will be floats, i.e. something like:

local inputs = torch.FloatTensor(batchSize, numPlanes, width, height):uniform()
local labels = torch.ByteTensor(batchSize):fill(1) 
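
If the data already exists as a byte tensor, as in the report above, a minimal sketch of converting it might look like the following. The `trainData` table here is a made-up stand-in with a much smaller size than the real 7628x3x64x100 data, and scaling to the 0..1 range is an assumption about the data rather than something the error requires:

```lua
require 'torch'

-- Hypothetical stand-in for the reporter's dataset: byte image data plus labels.
local trainData = {
  data = torch.ByteTensor(8, 3, 64, 100):fill(128),
  label = torch.ByteTensor(8):fill(1),
}

-- :float() returns a new FloatTensor copy; the original byte tensor is unchanged.
local floatData = trainData.data:float()

-- Optional, and an assumption about the data: scale 0..255 pixel values into 0..1.
floatData:div(255)

print(torch.type(trainData.data))  -- torch.ByteTensor
print(torch.type(floatData))       -- torch.FloatTensor
```

The network would then be fed `floatData` instead of the byte tensor, so that the lookup happens on a float tensor rather than on `torch.ByteTensor`.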

The following runs ok for me:

require 'nn'

local batchSize = 32
local numPlanes = 1
local width = 224
local height = 224

local net = nn.Sequential()
net:add(nn.SpatialConvolution(1, 6, 20, 20)) -- 1 input image channel, 6 output channels, 20x20 convolution kernel
net:add(nn.ReLU()) -- non-linearity
net:add(nn.SpatialMaxPooling(9,9,9,9)) -- A max-pooling operation that looks at 9x9 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 3, 3))
net:add(nn.ReLU()) -- non-linearity
--net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*20*20)) -- reshapes from a 3D tensor of 16x20x20 into 1D tensor of 16*20*20
net:add(nn.Linear(16*20*20, 120)) -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(120, 84))
net:add(nn.ReLU()) -- non-linearity
net:add(nn.Linear(84, 5)) -- 5 is the number of outputs of the network
net:add(nn.LogSoftMax()) -- converts the output to a log-probability. Useful for classification problems
net:float()

local criterion = nn.ClassNLLCriterion()
criterion:float()

local trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 5

local inputs = torch.FloatTensor(batchSize, numPlanes, width, height):uniform()
local labels = torch.ByteTensor(batchSize):fill(1) 

local dataset = {}
function dataset.size()
  return batchSize
end
local dataset_mt = {}
function dataset_mt.__index(self, i)
  return {inputs[i], labels[i]}
end
setmetatable(dataset, dataset_mt)

trainer:train(dataset)

But note that I haven't seen anyone use the StochasticGradient module for as long as I can remember. Everyone uses optim now. Have a look at:

https://github.com/szagoruyko/cifar.torch/blob/master/train.lua

optim is confusing in its own way, but trying to understand StochasticGradient is both hard and pointless, since no-one uses it :-P
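
For anyone puzzled by the `dataset` table in the script above: as far as I understand it, StochasticGradient only needs `dataset:size()` and `dataset[i]`, where each example is `{input, target}`. Here is a pure-Lua sketch of the same metatable trick, with plain tables standing in for the tensors (the sample values are invented):

```lua
-- Plain Lua tables stand in for the tensors here (hypothetical sample values).
local inputs = { {0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6} }
local labels = { 1, 2, 1 }

local dataset = {}
function dataset.size()
  return #inputs
end

-- __index is only consulted for keys missing from the table itself, so
-- dataset.size is found directly while dataset[i] falls through to here.
setmetatable(dataset, {
  __index = function(self, i)
    return { inputs[i], labels[i] }
  end,
})

-- Walk the dataset the way a trainer would: size() for the count,
-- then [i] for each {input, label} example.
for i = 1, dataset:size() do
  local example = dataset[i]
  print(i, example[1][1], example[2])
end
```

Nothing is stored under the numeric keys, so each example pair is built on demand when it is indexed.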

hughperkins commented 8 years ago

Hi @abachinskyi, the example in nn has been updated now:

https://github.com/torch/nn/blob/master/doc/training.md#nn.traningneuralnet.dok

It's new, so might have a few buggettes, but it is up to date :-)

xiao1228 commented 8 years ago

After all the steps above I am still getting the following error

torch/install/share/lua/5.1/nn/SpatialMaxPooling.lua:42: attempt to index field 'THNN' (a nil value)

hughperkins commented 8 years ago

Hi @xiao1228

Thanks! The script above runs ok for me, on Ubuntu 16.04. Can you provide the following information please?

cd ~/torch-cl
git log -n 5 --oneline
which luajit
uname -a
cat /etc/lsb-release

xiao1228 commented 8 years ago

@hughperkins Thank you for your quick reply.

The output is:

9da3e6e remember to add test script for cutorch-cltorch clobbering
96bda1a prevent cutorch clobbering cltorch, if d afterwards
fafd384 code highlighting in cltorch readme, and links to cutorch-rtc
ed77d3a update readme for hctorch
712410c add hctorch reference

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    /home/xiao/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    /home/xiao/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    4b-training-adadelta-cuda-stero.lua:156: in function 'opfunc'
    /home/xiao/torch/install/share/lua/5.1/optim/adadelta.lua:29: in function 'optimMethod'
    4b-training-adadelta-cuda-stero.lua:228: in function 'train'
    4b-training-adadelta-cuda-stero.lua:352: in main chunk
    [C]: in function 'dofile'
    main-stanford-bg.lua:89: in main chunk
    [C]: in function 'dofile'
    ...xiao/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406670

This is the output from the script. Thank you

hughperkins commented 8 years ago

https://gist.github.com/xiao1228/2f2472e0723a226a5b0b7f35cda22936

Seems to be missing a bunch of lines from the start? e.g. require 'optim', and probably some other stuff.

hughperkins commented 8 years ago

There's a zillion lines missing from this :-P It also doesn't use OpenCL, as far as I can see?

Can you start by trying the examples at https://github.com/torch/nn/blob/master/doc/training.md#nn.traningneuralnet.dok please? I wrote about 30-50% of this page by the way, so please let me know if anything in it is strange or hard to understand, and I'll take a look.

xiao1228 commented 8 years ago

@hughperkins I have a main lua file calling separate files. Therefore, I have all the require stuff there. Thank you

hughperkins commented 8 years ago

Please try to find a much shorter script that demonstrates the problem. It's normally possible to create a 5-10 line script to demonstrate a problem. Ideally I don't want to have to read through pages of code :-)

xiao1228 commented 8 years ago

@hughperkins ok sorry about that, thanks

fybaft2012 commented 7 years ago

@abachinskyi Hi bro, I faced the same problem as you. Did you find a solution? If so, could you tell me how to fix it? Thanks very much!

hughperkins commented 7 years ago

Hi @fybaft2012 Can you confirm that you have installed using https://github.com/hughperkins/distro-cl ? Can you give the full output of the installation, the simplest Lua script you can find to reproduce the problem, and the output from running this script please? You could paste these into https://gist.github.com for example

fybaft2012 commented 7 years ago

Thanks very much for your reply. Yeah, I reinstalled and it is working now. Thanks a lot!


hughperkins commented 7 years ago

Cool :-)