hughperkins / distro-cl

OpenCL Torch

Symbol not found: _THClTensor_stdall #27

Open tylerlindell opened 7 years ago

tylerlindell commented 7 years ago

I'm getting the following error when using trainset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling:

dyld: lazy symbol binding failed: Symbol not found: _THClTensor_stdall
  Referenced from: ~/torch-cl/install/lib/lua/5.1/libcltorch.so
  Expected in: flat namespace

dyld: Symbol not found: _THClTensor_stdall
  Referenced from: ~/torch-cl/install/lib/lua/5.1/libcltorch.so
  Expected in: flat namespace

Trace/BPT trap: 5

The code I'm using is here:

--/////////////////////////////////////////////////////////////////////////////
require 'torch'
require 'nn'

--/////////////////////////////////////////////////////////////////////////////
require 'cltorch'
require 'clnn'
-- require 'cunn';

--/////////////////////////////////////////////////////////////////////////////
require 'paths'
if (not paths.filep("cifar10torchsmall.zip")) then
    os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
    os.execute('unzip cifar10torchsmall.zip')
end
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')
classes = {'airplane', 'automobile', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck'}

--/////////////////////////////////////////////////////////////////////////////
print(trainset)
print(#trainset.data)

--/////////////////////////////////////////////////////////////////////////////
-- itorch.image(trainset.data[100]) -- display the 100-th image in dataset
print(classes[trainset.label[100]]) 

--/////////////////////////////////////////////////////////////////////////////
-- ignore setmetatable for now, it is a feature beyond the scope of this tutorial. It sets the index operator.
setmetatable(trainset, 
    {__index = function(t, i) 
                    return {t.data[i], t.label[i]} 
                end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.
trainset.data = trainset.data:cl()
trainset.label = trainset.label:cl()
-- trainset.data = trainset.data:cuda()
-- trainset.label = trainset.label:cuda()

function trainset:size() 
    return self.data:size(1) 
end

--/////////////////////////////////////////////////////////////////////////////
print(trainset:size()) -- just to test

--/////////////////////////////////////////////////////////////////////////////
print(trainset[33]) -- load sample number 33.
-- itorch.image(trainset[33][1])

--/////////////////////////////////////////////////////////////////////////////
redChannel = trainset.data[{ {}, {1}, {}, {}  }] -- this picks {all images, 1st channel, all vertical pixels, all horizontal pixels}

--/////////////////////////////////////////////////////////////////////////////
print(#redChannel)

--/////////////////////////////////////////////////////////////////////////////
mean = {} -- store the mean, to normalize the test set in the future
stdv  = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
    mean[i] = trainset.data[{ {}, {i}, {}, {}  }]:mean() -- mean estimation
    print('Channel ' .. i .. ', Mean: ' .. mean[i])
    trainset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction

    stdv[i] = trainset.data[{ {}, {i}, {}, {}  }]:std() -- std estimation
    print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
    trainset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end

--/////////////////////////////////////////////////////////////////////////////
net = nn.Sequential()
net = net:cl()
-- net = net:cuda()
net:add(nn.SpatialConvolution(3, 6, 5, 5)) -- 3 input image channels, 6 output channels, 5x5 convolution kernel
net:add(nn.ReLU())                       -- non-linearity 
net:add(nn.SpatialMaxPooling(2,2,2,2))     -- A max-pooling operation that looks at 2x2 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.ReLU())                       -- non-linearity 
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5))                    -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
net:add(nn.Linear(16*5*5, 120))             -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.ReLU())                       -- non-linearity 
net:add(nn.Linear(120, 84))
net:add(nn.ReLU())                       -- non-linearity 
net:add(nn.Linear(84, 10))                   -- 10 is the number of outputs of the network (in this case, 10 digits)
net:add(nn.LogSoftMax())                     -- converts the output to a log-probability. Useful for classification problems

--/////////////////////////////////////////////////////////////////////////////
criterion = nn.ClassNLLCriterion()
criterion = criterion:cl()
-- criterion = criterion:cuda()

--/////////////////////////////////////////////////////////////////////////////
trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 5 -- just do 5 epochs of training.

--/////////////////////////////////////////////////////////////////////////////
trainer:train(trainset)

--/////////////////////////////////////////////////////////////////////////////
print(classes[testset.label[100]])
-- itorch.image(testset.data[100])

--/////////////////////////////////////////////////////////////////////////////
testset.data = testset.data:double()   -- convert from Byte tensor to Double tensor

for i=1,3 do -- over each image channel
    testset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction    
    testset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end

--/////////////////////////////////////////////////////////////////////////////
-- for fun, print the mean and standard-deviation of example-100
horse = testset.data[100]
print(horse:mean(), horse:std())

--/////////////////////////////////////////////////////////////////////////////
print(classes[testset.label[100]])
-- itorch.image(testset.data[100])
predicted = net:forward(testset.data[100])

--/////////////////////////////////////////////////////////////////////////////
-- the output of the network is Log-Probabilities. To convert them to probabilities, you have to take e^x 
print(predicted:exp())

--/////////////////////////////////////////////////////////////////////////////
for i=1,predicted:size(1) do
    print(classes[i], predicted[i])
end

--/////////////////////////////////////////////////////////////////////////////
correct = 0
for i=1,10000 do
    local groundtruth = testset.label[i]
    local prediction = net:forward(testset.data[i])
    local confidences, indices = torch.sort(prediction, true)  -- true means sort in descending order
    if groundtruth == indices[1] then
        correct = correct + 1
    end
end

--/////////////////////////////////////////////////////////////////////////////
print(correct, 100*correct/10000 .. ' % ')

--/////////////////////////////////////////////////////////////////////////////
class_performance = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
for i=1,10000 do
    local groundtruth = testset.label[i]
    local prediction = net:forward(testset.data[i])
    local confidences, indices = torch.sort(prediction, true)  -- true means sort in descending order
    if groundtruth == indices[1] then
        class_performance[groundtruth] = class_performance[groundtruth] + 1
    end
end

--/////////////////////////////////////////////////////////////////////////////
for i=1,#classes do
    print(classes[i], 100*class_performance[i]/1000 .. ' %')
end
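
For context, a minimal repro sketch (my assumption, not code from the thread): cltorch's generated TensorMath wrapper maps a full-tensor :std() on a ClTensor to THClTensor_stdall (consistent with the grep output further down), so the first such call is what forces dyld to lazily bind the missing symbol.

-- minimal repro sketch (assumption): a full-tensor :std() on a ClTensor
-- goes through cltorch's TensorMath wrapper, which calls THClTensor_stdall
require 'torch'
require 'cltorch'

local t = torch.Tensor(4, 4):uniform():cl()
print(t:std()) -- lazy-binds _THClTensor_stdall in libcltorch.so; crashes if the symbol is absent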
hughperkins commented 7 years ago

I don't remember this symbol; it might not be implemented. Can you grep through the cltorch source code and see if it exists? (grep -r stdall *)

tylerlindell commented 7 years ago

Thank you @hughperkins, here is what was returned after running that command

Binary file extra/cutorch/build/CMakeFiles/cutorch.dir/TensorMath.c.o matches
Binary file extra/cutorch/build/lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath2.cu.o matches
Binary file extra/cutorch/build/lib/THC/libTHC.dylib matches
Binary file extra/cutorch/build/libcutorch.so matches
extra/cutorch/build/TensorMath.c:arg2 = THCudaTensor_stdall(default_arg1,arg1);
extra/cutorch/build/TensorMath.c:arg2 = THCudaTensor_stdall(default_arg1,arg1);
extra/cutorch/lib/THC/THCTensorMath.h:THC_API float THCudaTensor_stdall(THCState *state, THCudaTensor *self);
extra/cutorch/lib/THC/THCTensorMath2.cu:float THCudaTensor_stdall(THCState *state, THCudaTensor *self)
install/include/TH/generic/THTensorMath.c:accreal THTensor_(stdall)(THTensor *tensor)
install/include/TH/generic/THTensorMath.h:TH_API accreal THTensor_(stdall)(THTensor *self);
install/include/THC/THCTensorMath.h:THC_API float THCudaTensor_stdall(THCState *state, THCudaTensor *self);
install/include/THCl/THClTensorMath.h:THCL_API float THClTensor_stdall(THClState *state, THClTensor *self);
Binary file install/lib/libTH.dylib matches
Binary file install/lib/libTHC.dylib matches
Binary file install/lib/lua/5.1/libcltorch.so matches
Binary file install/lib/lua/5.1/libcutorch.so matches
Binary file install/lib/lua/5.1/libtorch.so matches
Binary file opencl/cltorch/build/CMakeFiles/cltorch.dir/TensorMath.c.o matches
Binary file opencl/cltorch/build/libcltorch.so matches
opencl/cltorch/build/TensorMath.c:arg2 = THClTensor_stdall(default_arg1,arg1);
opencl/cltorch/build/TensorMath.c:arg2 = THClTensor_stdall(default_arg1,arg1);
opencl/cltorch/src/lib/THClTensorMath.h:THCL_API float THClTensor_stdall(THClState *state, THClTensor *self);
opencl/cltorch/src/lib/THClTensorMath2.cpp:float THClTensor_stdall(THClState *state, THClTensor *self)
Binary file pkg/torch/build/CMakeFiles/torch.dir/TensorMath.c.o matches
Binary file pkg/torch/build/lib/TH/CMakeFiles/TH.dir/THTensor.c.o matches
Binary file pkg/torch/build/lib/TH/libTH.dylib matches
Binary file pkg/torch/build/libtorch.so matches
pkg/torch/build/TensorMath.c:arg2 = THFloatTensor_stdall(arg1);
pkg/torch/build/TensorMath.c:arg2 = THFloatTensor_stdall(arg1);
pkg/torch/build/TensorMath.c:arg2 = THDoubleTensor_stdall(arg1);
pkg/torch/build/TensorMath.c:arg2 = THDoubleTensor_stdall(arg1);
pkg/torch/lib/TH/generic/THTensorMath.c:accreal THTensor_(stdall)(THTensor *tensor)
pkg/torch/lib/TH/generic/THTensorMath.h:TH_API accreal THTensor_(stdall)(THTensor *self);
hughperkins commented 7 years ago

Ok, looks like there might be an implementation in opencl/cltorch/src/lib/THClTensorMath2.cpp. Can you see if that contains any definitions of THClTensor_stdall? Also, if you run the unit tests, do they work ok for you?
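
For reference, a sketch of running those suites from a Lua session, assuming luajit and the torch-cl packages are on the path (the equivalent command-line invocations appear in a later reply in the thread):

-- run the torch, nn and cltorch test suites (same entry points as
-- luajit -l torch -e 'torch.test()' etc. shown below)
require 'torch';   torch.test()
require 'nn';      nn.test()
require 'cltorch'; cltorch.test()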

tylerlindell commented 7 years ago

It is commented out in that file, but here is an image of all the places it shows up in ~/torch-cl/:

(screenshot: screen shot 2017-04-10 at 8 00 13 pm)
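
Given that the implementation is commented out, one possible workaround sketch (my suggestion, not something proposed in the thread) is to do the per-channel normalization on the CPU DoubleTensor and only call :cl() afterwards, so the :std() reduction runs through the CPU THTensor stdall rather than the missing THClTensor_stdall:

-- workaround sketch (assumption): normalize before moving the data to OpenCL,
-- so no ClTensor reduction ever needs _THClTensor_stdall
trainset.data = trainset.data:double()   -- ByteTensor -> DoubleTensor, still on the CPU

mean = {}
stdv = {}
for i = 1, 3 do -- over each image channel
    mean[i] = trainset.data[{ {}, {i}, {}, {} }]:mean()
    trainset.data[{ {}, {i}, {}, {} }]:add(-mean[i])   -- mean subtraction
    stdv[i] = trainset.data[{ {}, {i}, {}, {} }]:std() -- CPU std, avoids the OpenCL reduction
    trainset.data[{ {}, {i}, {}, {} }]:div(stdv[i])    -- std scaling
end

trainset.data = trainset.data:cl()    -- move to the OpenCL device only after normalizing
trainset.label = trainset.label:cl()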
tylerlindell commented 7 years ago

Here are the unit test results:

$ luajit -l torch -e 'torch.test()'
Running 145 tests
  1/145 tanh ............................................................ [PASS]
  2/145 testCholesky .................................................... [PASS]
  3/145 multinomialvector ............................................... [PASS]
  4/145 log ............................................................. [PASS]
  5/145 sigmoid ......................................................... [PASS]
  6/145 permute ......................................................... [PASS]
  7/145 cross ........................................................... [PASS]
  8/145 gels_reuse ...................................................... [PASS]
  9/145 inverse ......................................................... [PASS]
 10/145 rangeequalbounds ................................................ [PASS]
 11/145 min ............................................................. [PASS]
 12/145 prod ............................................................ [PASS]
 13/145 gesv_reuse ...................................................... [PASS]
 14/145 atan ............................................................ [PASS]
 15/145 rangefloat ...................................................... [PASS]
 16/145 isSize .......................................................... [PASS]
 17/145 eig_noncontig ................................................... [PASS]
 18/145 gatherMax ....................................................... [PASS]
 19/145 histc ........................................................... [PASS]
 20/145 pstrf ........................................................... [PASS]
 21/145 kthvalue ........................................................ [PASS]
 22/145 multinomialwithreplacement ...................................... [PASS]
 23/145 maskedCopy ...................................................... [PASS]
 24/145 maskedFill ...................................................... [PASS]
 25/145 pow ............................................................. [PASS]
 26/145 fxcorr3_fxcorr2_eq .............................................. [PASS]
 27/145 isTypeOfInheritance ............................................. [PASS]
 28/145 linspace ........................................................ [PASS]
 29/145 testheaptracking ................................................ [PASS]
 30/145 sum ............................................................. [PASS]
 31/145 gels_uniquely_determined ........................................ [PASS]
 32/145 allAndAny1 ...................................................... [PASS]
 33/145 cmul ............................................................ [PASS]
 34/145 trtrs_reuse ..................................................... [PASS]
 35/145 topK ............................................................ [PASS]
 36/145 newIndex ........................................................ [PASS]
 37/145 exp ............................................................. [PASS]
 38/145 multinomialwithoutreplacement ................................... [PASS]
 39/145 mm .............................................................. [PASS]
 40/145 sortDescending .................................................. [PASS]
 41/145 triu ............................................................ [PASS]
 42/145 repeatTensor .................................................... [PASS]
 43/145 isTensor ........................................................ [PASS]
 44/145 mul ............................................................. [PASS]
 45/145 sqrt ............................................................ [PASS]
 46/145 floor ........................................................... [PASS]
 47/145 elementSize ..................................................... [PASS]
 48/145 rangedouble ..................................................... [PASS]
 49/145 csub ............................................................ [PASS]
 50/145 gesv ............................................................ [PASS]
 51/145 cos ............................................................. [PASS]
 52/145 index ........................................................... [PASS]
 53/145 gels_underdetermined ............................................ [PASS]
 54/145 add ............................................................. [PASS]
 55/145 conv3 ........................................................... [PASS]
 56/145 gels_overdetermined ............................................. [PASS]
 57/145 tril ............................................................ [PASS]
 58/145 maskedSelect .................................................... [PASS]
 59/145 renorm .......................................................... [PASS]
 60/145 eig_reuse ....................................................... [PASS]
 61/145 addbmm .......................................................... [PASS]
 62/145 sin_2 ........................................................... [PASS]
 63/145 symeig_noncontig ................................................ [PASS]
 64/145 clamp ........................................................... [PASS]
 65/145 logical ......................................................... [PASS]
 66/145 cmax ............................................................ [PASS]
 67/145 median .......................................................... [PASS]
 68/145 cosh ............................................................ [PASS]
 69/145 max ............................................................. [PASS]
 70/145 csub_scalar ..................................................... [PASS]
 71/145 xcorr3_xcorr2_eq ................................................ [PASS]
 72/145 scatterFill ..................................................... [PASS]
 73/145 eig ............................................................. [PASS]
 74/145 classNoModule ................................................... [PASS]
 75/145 mod ............................................................. [PASS]
 76/145 bmm ............................................................. [PASS]
 77/145 svd_reuse ....................................................... [PASS]
 78/145 randperm ........................................................ [PASS]
 79/145 classInModule ................................................... [PASS]
 80/145 nonzero ......................................................... [PASS]
 81/145 testBoxMullerState .............................................. [PASS]
 82/145 dot ............................................................. [PASS]
 83/145 allAndAny2 ...................................................... [PASS]
 84/145 trtrs ........................................................... [PASS]
 85/145 storageview ..................................................... [PASS]
 86/145 rand ............................................................ [PASS]
 87/145 zeros ........................................................... [PASS]
 88/145 potrs ........................................................... [PASS]
 89/145 randn ........................................................... [PASS]
 90/145 sinh ............................................................ [PASS]
 91/145 abs ............................................................. [PASS]
 92/145 sortAscending ................................................... [PASS]
 93/145 cinv ............................................................ [PASS]
 94/145 indexCopy ....................................................... [PASS]
 95/145 cpow ............................................................ [PASS]
 96/145 neg ............................................................. [PASS]
 97/145 scatter ......................................................... [PASS]
 98/145 asin ............................................................ [PASS]
 99/145 catArray ........................................................ [PASS]
100/145 RNGStateAliasing ................................................ [PASS]
101/145 ones ............................................................ [PASS]
102/145 div ............................................................. [PASS]
103/145 sin ............................................................. [PASS]
104/145 type ............................................................ [PASS]
105/145 baddbmm ......................................................... [PASS]
106/145 conv2 ........................................................... [PASS]
107/145 mode ............................................................ [PASS]
108/145 svd_noncontig ................................................... [PASS]
109/145 isSameSizeAs .................................................... [PASS]
110/145 ceil ............................................................ [PASS]
111/145 conv3_conv2_eq .................................................. [PASS]
112/145 isTypeOfComposite ............................................... [PASS]
113/145 totable ......................................................... [PASS]
114/145 svd ............................................................. [PASS]
115/145 isStorage ....................................................... [PASS]
116/145 logspace ........................................................ [PASS]
117/145 isTypeOfPartial ................................................. [PASS]
118/145 isSetTo ......................................................... [PASS]
119/145 tan ............................................................. [PASS]
120/145 serialize ....................................................... [PASS]
121/145 RNGState ........................................................ [PASS]
122/145 cumprod ......................................................... [PASS]
123/145 potri ........................................................... [PASS]
124/145 eye ............................................................. [PASS]
125/145 chunk ........................................................... [PASS]
126/145 split ........................................................... [PASS]
127/145 gather .......................................................... [PASS]
128/145 acos ............................................................ [PASS]
129/145 cmin ............................................................ [PASS]
130/145 testNumel ....................................................... [PASS]
131/145 expand .......................................................... [PASS]
132/145 indexAdd ........................................................ [PASS]
133/145 view ............................................................ [PASS]
134/145 reshape ......................................................... [PASS]
135/145 mv .............................................................. [PASS]
136/145 cumsum .......................................................... [PASS]
137/145 diag ............................................................ [PASS]
138/145 cat ............................................................. [PASS]
139/145 round ........................................................... [PASS]
140/145 range ........................................................... [PASS]
141/145 cdiv ............................................................ [PASS]
142/145 fconv3_fconv2_eq ................................................ [PASS]
143/145 test_symeig ..................................................... [PASS]
144/145 cmod ............................................................ [PASS]
145/145 rangenegative ................................................... [PASS]
Completed 1120 asserts in 145 tests with 0 failures and 0 errors

$ luajit -l nn -e 'nn.test()'
Seed:   1491880542
Running 145 tests
  1/145 VolumetricMaxUnpooling .......................................... [PASS]
  2/145 ConcatTable ..................................................... [PASS]
  3/145 SpatialAveragePooling ........................................... [PASS]
  4/145 Module_getParameters_8 .......................................... [PASS]
  5/145 tostringnnSpatialZeroPadding .................................... [PASS]
  6/145 BCECriterion .................................................... [PASS]
  7/145 ELUIP ........................................................... [PASS]
  8/145 SparseLinear .................................................... [PASS]
  9/145 SpatialCrossMapLRN .............................................. [PASS]
 10/145 VolumetricConvolutionBatchCompare ............................... [PASS]
 11/145 PairwiseDistance ................................................ [PASS]
 12/145 WeightedMSECriterion ............................................ [PASS]
 13/145 SelectTable ..................................................... [PASS]
 14/145 SpatialLPPooling ................................................ [PASS]
 15/145 SpatialDropoutBatch ............................................. [PASS]
 16/145 MixtureTable .................................................... [PASS]
 17/145 SpatialFullConvolutionMap ....................................... [PASS]
 18/145 Module_getParameters_5 .......................................... [PASS]
 19/145 Min ............................................................. [PASS]
 20/145 Exp ............................................................. [PASS]
 21/145 Add ............................................................. [PASS]
 22/145 Module_listModules .............................................. [PASS]
 23/145 SpatialConvolutionLocal ......................................... [PASS]
 24/145 BatchNormalization .............................................. [PASS]
 25/145 MultiCriterion .................................................. [PASS]
 26/145 Module_apply .................................................... [PASS]
 27/145 Max ............................................................. [PASS]
 28/145 MulConstant ..................................................... [PASS]
 29/145 NarrowTable ..................................................... [PASS]
 30/145 View ............................................................ [PASS]
 31/145 VolumetricConvolution ........................................... [PASS]
 32/145 tostringnnReshape ............................................... [PASS]
 33/145 SpatialSubSampling .............................................. [PASS]
 34/145 HardTanh ........................................................ [PASS]
 35/145 DistKLDivCriterion .............................................. [PASS]
 36/145 SplitTable ...................................................... [PASS]
 37/145 DotProduct ...................................................... [PASS]
 38/145 HingeEmbeddingCriterion ......................................... [PASS]
 39/145 SpatialBatchNormalization ....................................... [PASS]
 40/145 DepthConcat ..................................................... [PASS]
 41/145 Sigmoid ......................................................... [PASS]
 42/145 SpatialAdaptiveMaxPooling ....................................... [PASS]
 43/145 Parallel ........................................................ [PASS]
 44/145 SoftShrink ...................................................... [PASS]
 45/145 Module_getParameters_1 .......................................... [PASS]
 46/145 Log ............................................................. [PASS]
 47/145 SpatialDropout .................................................. [PASS]
 48/145 LeakyReLU ....................................................... [PASS]
 49/145 VolumetricMaxPooling ............................................ [PASS]
 50/145 Linear .......................................................... [PASS]
 51/145 Module_getParameters_12 ......................................... [PASS]
 52/145 Euclidean ....................................................... [PASS]
 53/145 SpatialMaxPooling ............................................... [PASS]
 54/145 MultiMarginCriterion ............................................ [PASS]
 55/145 LogSoftmax ...................................................... [PASS]
 56/145 ELU ............................................................. [PASS]
 57/145 Softmax ......................................................... [PASS]
 58/145 LogSigmoid ...................................................... [PASS]
 59/145 Copy ............................................................ [PASS]
 60/145 VolumetricAveragePooling ........................................ [PASS]
 61/145 SpatialContrastiveNormalization ................................. [PASS]
 62/145 Bilinear ........................................................ [PASS]
 63/145 Softmin ......................................................... [PASS]
 64/145 Padding ......................................................... [PASS]
 65/145 Module_getParameters_2 .......................................... [PASS]
 66/145 VolumetricFullConvolution_simple_test ........................... [PASS]
 67/145 MarginRankingCriterion .......................................... [PASS]
 68/145 VolumetricFullConvolution ....................................... [PASS]
 69/145 CrossEntropyCriterion ........................................... [PASS]
 70/145 SpatialSubtractiveNormalization_1dkernel ........................ [PASS]
 71/145 SpatialSoftMax .................................................. [PASS]
 72/145 HardShrink ...................................................... [PASS]
 73/145 SpatialSubSamplingBatchCompare .................................. [PASS]
 74/145 Abs ............................................................. [PASS]
 75/145 Softsign ........................................................ [PASS]
 76/145 WeightedEuclidean ............................................... [PASS]
 77/145 addSingletonDimension ........................................... [PASS]
 78/145 Module_getParameters_10 ......................................... [PASS]
 79/145 L1Cost .......................................................... [PASS]
 80/145 PReLU ........................................................... [PASS]
 81/145 JoinTable ....................................................... [PASS]
 82/145 SpatialFullConvolutionCompare ................................... [PASS]
 83/145 CMul ............................................................ [PASS]
 84/145 CosineDistance .................................................. [PASS]
 85/145 Index ........................................................... [PASS]
 86/145 Mean ............................................................ [PASS]
 87/145 SpatialConvolutionMM ............................................ [PASS]
 88/145 Dropout ......................................................... [PASS]
 89/145 BatchMMTransposeA ............................................... [PASS]
 90/145 SoftPlus ........................................................ [PASS]
 91/145 TemporalConvolution ............................................. [PASS]
 92/145 Module_getParameters_11 ......................................... [PASS]
 93/145 ParallelCriterion ............................................... [PASS]
 94/145 SmoothL1Criterion ............................................... [PASS]
 95/145 L1Penalty ....................................................... [PASS]
 96/145 LookupTable ..................................................... [PASS]
 97/145 SpatialMaxUnpooling ............................................. [PASS]
 98/145 Sqrt ............................................................ [PASS]
 99/145 LeakyReLUIP ..................................................... [PASS]
100/145 Module_getParameters_6 .......................................... [PASS]
101/145 FlattenTable .................................................... [PASS]
102/145 Square .......................................................... [PASS]
103/145 Module_getParameters_4 .......................................... [PASS]
104/145 SpatialDivisiveNormalization_1dkernel ........................... [PASS]
105/145 AddConstant ..................................................... [PASS]
106/145 BatchMMTransposeB ............................................... [PASS]
107/145 BatchMMNoTranspose .............................................. [PASS]
108/145 SpatialConvolutionBatchCompare .................................. [PASS]
109/145 Cosine .......................................................... [PASS]
110/145 Clamp ........................................................... [PASS]
111/145 VolumetricMaxPooling_boundary ................................... [PASS]
112/145 Power ........................................................... [PASS]
113/145 tostringnnLinear ................................................ [PASS]
114/145 TemporalMaxPooling .............................................. [PASS]
115/145 SpatialUpSamplingNearest ........................................ [PASS]
116/145 Sum ............................................................. [PASS]
117/145 Typecast ........................................................ [PASS]
118/145 Tanh ............................................................ [PASS]
119/145 Module_getParameters_3 .......................................... [PASS]
120/145 Threshold ....................................................... [PASS]
121/145 ParallelTable ................................................... [PASS]
122/145 SpatialFractionalMaxPooling_Ratio ............................... [PASS]
123/145 Module_getParameters_7 .......................................... [PASS]
124/145 ClassNLLCriterion ............................................... [PASS]
125/145 Select .......................................................... [PASS]
126/145 BatchMMTransposeBoth ............................................ [PASS]
127/145 SpatialFullConvolutionBatchCompare .............................. [PASS]
128/145 Normalize ....................................................... [PASS]
129/145 SpatialConvolution .............................................. [PASS]
130/145 GradientReversal ................................................ [PASS]
131/145 SpatialConvolutionMap ........................................... [PASS]
132/145 SpatialDivisiveNormalization_2dkernel ........................... [PASS]
133/145 Replicate ....................................................... [PASS]
134/145 CosineEmbeddingCriterion ........................................ [PASS]
135/145 MM .............................................................. [PASS]
136/145 SpatialFullConvolution .......................................... [PASS]
137/145 ReLU ............................................................ [PASS]
138/145 RReLU ........................................................... [PASS]
139/145 Reshape ......................................................... [PASS]
140/145 SpatialSubtractiveNormalization_2dkernel ........................ [PASS]
141/145 MSECriterion .................................................... [PASS]
142/145 MarginCriterion ................................................. [PASS]
143/145 Mul ............................................................. [PASS]
144/145 TemporalSubSampling ............................................. [PASS]
145/145 SpatialFractionalMaxPooling ..................................... [PASS]
Completed 2476 asserts in 145 tests with 0 failures and 0 errors and 1 warning
--------------------------------------------------------------------------------
Should use TestSuite rather than plain lua table

$ luajit -l cltorch -e 'cltorch.test()'
running tests...
aftter requiring cltorch.unit_storage
Running 2 tests
1/2 test_get ............................................................ [WAIT]
Using Apple , OpenCL platform: Apple
Using OpenCL device: Iris
1/2 test_get ............................................................ [PASS]
2/2 test_basic .......................................................... [WAIT]
2/2 test_basic .......................................................... [PASS]
Completed 15 asserts in 2 tests with 0 failures and 0 errors
#tester.errors  0
res true
aftter requiring cltorch.unit_tensor
Running 117 tests
  1/117 outplace_div .................................................... [WAIT]

  1/117 outplace_div .................................................... [PASS]
  2/117 test_addcmul .................................................... [WAIT]
  2/117 test_addcmul .................................................... [PASS]
  3/117 outplace_tanh ................................................... [WAIT]
  3/117 outplace_tanh ................................................... [PASS]
  4/117 outplace_pow .................................................... [WAIT]

  4/117 outplace_pow .................................................... [PASS]
  5/117 inplace_tanh .................................................... [WAIT]
  5/117 inplace_tanh .................................................... [PASS]
  6/117 test_scatterFill ................................................ [WAIT]
  6/117 test_scatterFill ................................................ [PASS]
  7/117 inplace_acos .................................................... [WAIT]
  7/117 inplace_acos .................................................... [PASS]
  8/117 outplace_cpow ................................................... [WAIT]
  8/117 outplace_cpow ................................................... [PASS]
  9/117 inplace_atan .................................................... [WAIT]
  9/117 inplace_atan .................................................... [PASS]
 10/117 inplace_le ...................................................... [WAIT]
 10/117 inplace_le ...................................................... [PASS]
 11/117 test_equals ..................................................... [WAIT]
 11/117 test_equals ..................................................... [PASS]
 12/117 self_lt ......................................................... [WAIT]

 12/117 self_lt ......................................................... [PASS]
 13/117 inplace_round ................................................... [WAIT]
 13/117 inplace_round ................................................... [PASS]
 14/117 test_matrixwide ................................................. [WAIT]
 14/117 test_matrixwide ................................................. [PASS]
 15/117 inplace_sqrt .................................................... [WAIT]
 15/117 inplace_sqrt .................................................... [PASS]
 16/117 test_max2 ....................................................... [WAIT]
 16/117 test_max2 ....................................................... [PASS]
 17/117 test_prod ....................................................... [WAIT]
 17/117 test_prod ....................................................... [PASS]
 18/117 test_scatter .................................................... [WAIT]
 18/117 test_scatter .................................................... [PASS]
 19/117 inplace_cinv .................................................... [WAIT]
 19/117 inplace_cinv .................................................... [PASS]
 20/117 outplace_sin .................................................... [WAIT]
 20/117 outplace_sin .................................................... [PASS]
 21/117 outplace_ge ..................................................... [WAIT]

 21/117 outplace_ge ..................................................... [PASS]
 22/117 outplace_add .................................................... [WAIT]

 22/117 outplace_add .................................................... [PASS]
 23/117 test_basic ...................................................... [WAIT]
 23/117 test_basic ...................................................... [PASS]
 24/117 test_sub ........................................................ [WAIT]
 24/117 test_sub ........................................................ [PASS]
 25/117 outplace_cdiv ................................................... [WAIT]
 25/117 outplace_cdiv ................................................... [PASS]
 26/117 inplace_log ..................................................... [WAIT]
 26/117 inplace_log ..................................................... [PASS]
 27/117 test_reduceAll .................................................. [WAIT]
THClReduceAll.cl build log: 
<program source>:9:10: warning: unused variable 'in1'
  float *in1 = &_in1;
         ^
<program source>:10:10: warning: unused variable 'out'
  float *out = &_out;
         ^

 27/117 test_reduceAll .................................................. [PASS]
 28/117 inplace_atan2 ................................................... [WAIT]
 28/117 inplace_atan2 ................................................... [PASS]
 29/117 test_intpower ................................................... [WAIT]
 29/117 test_intpower ................................................... [PASS]
 30/117 outplace_mul .................................................... [WAIT]

 30/117 outplace_mul .................................................... [PASS]
 31/117 operator_div_scalar ............................................. [WAIT]

 31/117 operator_div_scalar ............................................. [PASS]
 32/117 test_addcdivshape ............................................... [WAIT]
 32/117 test_addcdivshape ............................................... [PASS]
 33/117 test_min1 ....................................................... [WAIT]
 33/117 test_min1 ....................................................... [PASS]
 34/117 test_norm ....................................................... [WAIT]
 34/117 test_norm ....................................................... [PASS]
 35/117 self_eq ......................................................... [WAIT]

 35/117 self_eq ......................................................... [PASS]
 36/117 operator_plus ................................................... [WAIT]
 36/117 operator_plus ................................................... [PASS]
 37/117 inplace_cos ..................................................... [WAIT]
 37/117 inplace_cos ..................................................... [PASS]
 38/117 outplace_log .................................................... [WAIT]
 38/117 outplace_log .................................................... [PASS]
 39/117 outplace_asin ................................................... [WAIT]
 39/117 outplace_asin ................................................... [PASS]
 40/117 outplace_eq ..................................................... [WAIT]

 40/117 outplace_eq ..................................................... [PASS]
 41/117 outplace_gt ..................................................... [WAIT]

 41/117 outplace_gt ..................................................... [PASS]
 42/117 inplace_exp ..................................................... [WAIT]
 42/117 inplace_exp ..................................................... [PASS]
 43/117 test_gather_t ................................................... [WAIT]
 43/117 test_gather_t ................................................... [PASS]
 44/117 test_apply_on_gpu ............................................... [WAIT]
 44/117 test_apply_on_gpu ............................................... [PASS]
 45/117 operator_sub_scalar ............................................. [WAIT]

 45/117 operator_sub_scalar ............................................. [PASS]
 46/117 inplace_lt ...................................................... [WAIT]
 46/117 inplace_lt ...................................................... [PASS]
 47/117 test_get ........................................................ [WAIT]
 47/117 test_get ........................................................ [PASS]
 48/117 operator_plus_scalar ............................................ [WAIT]

 48/117 operator_plus_scalar ............................................ [PASS]
 49/117 inplace_cdiv .................................................... [WAIT]
 49/117 inplace_cdiv .................................................... [PASS]
 50/117 inplace_sin ..................................................... [WAIT]
 50/117 inplace_sin ..................................................... [PASS]
 51/117 test_sum_t ...................................................... [WAIT]
 51/117 test_sum_t ...................................................... [PASS]
 52/117 test_sumall ..................................................... [WAIT]
 52/117 test_sumall ..................................................... [PASS]
 53/117 test_gather_narrowed ............................................ [WAIT]
new wrapper, size 4
new wrapper, size 4
 53/117 test_gather_narrowed ............................................ [PASS]
 54/117 self_ge ......................................................... [WAIT]

 54/117 self_ge ......................................................... [PASS]
 55/117 operator_mul_scalar ............................................. [WAIT]

 55/117 operator_mul_scalar ............................................. [PASS]
 56/117 outplace_sigmoid ................................................ [WAIT]
 56/117 outplace_sigmoid ................................................ [PASS]
 57/117 test_indexfill .................................................. [WAIT]
 57/117 test_indexfill .................................................. [PASS]
 58/117 outplace_sign ................................................... [WAIT]
 58/117 outplace_sign ................................................... [PASS]
 59/117 test_cumprod .................................................... [WAIT]
 59/117 test_cumprod .................................................... [PASS]
 60/117 test_neg ........................................................ [WAIT]
 60/117 test_neg ........................................................ [PASS]
 61/117 test_mean ....................................................... [WAIT]
 61/117 test_mean ....................................................... [PASS]
 62/117 test_gather ..................................................... [WAIT]
 62/117 test_gather ..................................................... [PASS]
 63/117 test_sum ........................................................ [WAIT]
 63/117 test_sum ........................................................ [PASS]
 64/117 inplace_gt ...................................................... [WAIT]
 64/117 inplace_gt ...................................................... [PASS]
 65/117 test_cmin ....................................................... [WAIT]
 65/117 test_cmin ....................................................... [PASS]
 66/117 test_perelement ................................................. [WAIT]
 66/117 test_perelement ................................................. [PASS]
 67/117 test_min2 ....................................................... [WAIT]
 67/117 test_min2 ....................................................... [PASS]
 68/117 test_max1 ....................................................... [WAIT]
 68/117 test_max1 ....................................................... [PASS]
 69/117 self_ne ......................................................... [WAIT]

 69/117 self_ne ......................................................... [PASS]
 70/117 outplace_cos .................................................... [WAIT]
 70/117 outplace_cos .................................................... [PASS]
 71/117 inplace_ge ...................................................... [WAIT]
 71/117 inplace_ge ...................................................... [PASS]
 72/117 test_indexselect ................................................ [WAIT]
 72/117 test_indexselect ................................................ [PASS]
 73/117 inplace_add ..................................................... [WAIT]
 73/117 inplace_add ..................................................... [PASS]
 74/117 test_reshape .................................................... [WAIT]
 74/117 test_reshape .................................................... [PASS]
 75/117 test_addcdiv .................................................... [WAIT]
 75/117 test_addcdiv .................................................... [PASS]
 76/117 test_cmul ....................................................... [WAIT]
 76/117 test_cmul ....................................................... [PASS]
 77/117 test_fills ...................................................... [WAIT]
 77/117 test_fills ...................................................... [PASS]
 78/117 outplace_acos ................................................... [WAIT]
 78/117 outplace_acos ................................................... [PASS]
 79/117 inplace_floor ................................................... [WAIT]
 79/117 inplace_floor ................................................... [PASS]
 80/117 test_maskedSelect ............................................... [WAIT]
 80/117 test_maskedSelect ............................................... [PASS]
 81/117 test_blas ....................................................... [WAIT]
 81/117 test_blas ....................................................... [PASS]
 82/117 self_gt ......................................................... [WAIT]

 82/117 self_gt ......................................................... [PASS]
 83/117 outplace_ceil ................................................... [WAIT]
 83/117 outplace_ceil ................................................... [PASS]
 84/117 inplace_asin .................................................... [WAIT]
 84/117 inplace_asin .................................................... [PASS]
 85/117 inplace_sign .................................................... [WAIT]
 85/117 inplace_sign .................................................... [PASS]
 86/117 operator_sub .................................................... [WAIT]
 86/117 operator_sub .................................................... [PASS]
 87/117 outplace_abs .................................................... [WAIT]
 87/117 outplace_abs .................................................... [PASS]
 88/117 test_indexcopy .................................................. [WAIT]
 88/117 test_indexcopy .................................................. [PASS]
 89/117 outplace_round .................................................. [WAIT]
 89/117 outplace_round .................................................. [PASS]
 90/117 test_meanall .................................................... [WAIT]
 90/117 test_meanall .................................................... [PASS]
 91/117 test_cumsum ..................................................... [WAIT]
 91/117 test_cumsum ..................................................... [PASS]
 92/117 inplace_abs ..................................................... [WAIT]
 92/117 inplace_abs ..................................................... [PASS]
 93/117 outplace_le ..................................................... [WAIT]

 93/117 outplace_le ..................................................... [PASS]
 94/117 test_clone ...................................................... [WAIT]
 94/117 test_clone ...................................................... [PASS]
 95/117 test_map_on_gpu ................................................. [WAIT]
 95/117 test_map_on_gpu ................................................. [PASS]
 96/117 test_powerofneg ................................................. [WAIT]
 96/117 test_powerofneg ................................................. [PASS]
 97/117 inplace_cpow .................................................... [WAIT]
 97/117 inplace_cpow .................................................... [PASS]
 98/117 outplace_exp .................................................... [WAIT]
 98/117 outplace_exp .................................................... [PASS]
 99/117 outplace_floor .................................................. [WAIT]
 99/117 outplace_floor .................................................. [PASS]
100/117 inplace_eq ...................................................... [WAIT]
100/117 inplace_eq ...................................................... [PASS]
101/117 outplace_sqrt ................................................... [WAIT]
101/117 outplace_sqrt ................................................... [PASS]
102/117 outplace_cinv ................................................... [WAIT]
102/117 outplace_cinv ................................................... [PASS]
103/117 test_sumallt .................................................... [WAIT]
103/117 test_sumallt .................................................... [PASS]
104/117 test_sum_t_offset ............................................... [WAIT]
104/117 test_sum_t_offset ............................................... [PASS]
105/117 test_map2_on_gpu ................................................ [WAIT]
105/117 test_map2_on_gpu ................................................ [PASS]
106/117 inplace_ceil .................................................... [WAIT]
106/117 inplace_ceil .................................................... [PASS]
107/117 outplace_ne ..................................................... [WAIT]

107/117 outplace_ne ..................................................... [PASS]
108/117 test_add ........................................................ [WAIT]
108/117 test_add ........................................................ [PASS]
109/117 test_prodall .................................................... [WAIT]
THClReduceAll.cl build log: 
<program source>:9:10: warning: unused variable 'in1'
  float *in1 = &_in1;
         ^
<program source>:10:10: warning: unused variable 'out'
  float *out = &_out;
         ^

109/117 test_prodall .................................................... [PASS]
110/117 inplace_cmul .................................................... [WAIT]
110/117 inplace_cmul .................................................... [PASS]
111/117 outplace_lt ..................................................... [WAIT]

111/117 outplace_lt ..................................................... [PASS]
112/117 outplace_atan ................................................... [WAIT]
112/117 outplace_atan ................................................... [PASS]
113/117 inplace_ne ...................................................... [WAIT]
113/117 inplace_ne ...................................................... [PASS]
114/117 inplace_sigmoid ................................................. [WAIT]
114/117 inplace_sigmoid ................................................. [PASS]
115/117 self_le ......................................................... [WAIT]

115/117 self_le ......................................................... [PASS]
116/117 outplace_cmul ................................................... [WAIT]
116/117 outplace_cmul ................................................... [PASS]
117/117 test_save ....................................................... [WAIT]
117/117 test_save ....................................................... [PASS]
Completed 233 asserts in 117 tests with 0 failures and 0 errors and 1 warning
--------------------------------------------------------------------------------
Should use TestSuite rather than plain lua table

--------------------------------------------------------------------------------
all tests finished

$ luajit -l clnn -e 'clnn.test()'
libthclnn_searchpath    /Users/tylerlindell/torch-cl/install/lib/lua/5.1/libTHCLNN.so
Running 74 tests
 1/74 Square_transposed ................................................. [WAIT]Using Apple , OpenCL platform: Apple
Using OpenCL device: Iris
 1/74 Square_transposed ................................................. [PASS]
 2/74 TemporalConvolution2_forward ...................................... [PASS]
 3/74 SpatialMaxPooling_forward ......................................... [PASS]
 4/74 SoftMax_forward_batch ............................................. [PASS]
 5/74 Sigmoid_forward ................................................... [PASS]
 6/74 ELU_backward ...................................................... [PASS]
 7/74 Threshold_forward ................................................. [PASS]
 8/74 Threshold_backward_inplace ........................................ [PASS]
 9/74 Tanh_transposed ................................................... [PASS]
10/74 SpatialUpSamplingNearest_forward_batch ............................ [WAIT]SpatialUpSamplingNearest.cl build log: 
<program source>:3:20: warning: no previous prototype for function 'translate_idx'
/*__device__*/ int translate_idx(int ii, int d1, int d2, int d3, int scale_factor)
                   ^
<program source>:20:20: warning: no previous prototype for function 'translate_idx_inv'
/*__device__*/ int translate_idx_inv(int ii, int d1, int d2, int d3, int scale_factor, int off_x, int off_y)
                   ^

10/74 SpatialUpSamplingNearest_forward_batch ............................ [PASS]
11/74 Sigmoid_transposed ................................................ [PASS]
12/74 ClassNLLCriterionSingleTarget ..................................... [PASS]
13/74 mse_variablebatchsize ............................................. [PASS]
14/74 LogSigmoid_transposed ............................................. [PASS]
15/74 ClassNLLCriterionMultipleTarget ................................... [WAIT]THClReduceAll.cl build log: 
<program source>:9:10: warning: unused variable 'in1'
  float *in1 = &_in1;
         ^
<program source>:10:10: warning: unused variable 'out'
  float *out = &_out;
         ^

15/74 ClassNLLCriterionMultipleTarget ................................... [PASS]
16/74 SoftMax_forward ................................................... [PASS]
17/74 LogSoftMax_forward ................................................ [PASS]
18/74 Tanh_forward ...................................................... [PASS]
19/74 CMul_forward_batch ................................................ [PASS]
20/74 Threshold_backward ................................................ [PASS]
21/74 mse ............................................................... [WAIT]Apply_3t_0s_0pt_-2_-2_-2_*out = 0.00043487714720591 * (*in1 - *in2) build log: 
<program source>:37:12: warning: double precision constant requires cl_khr_fp64, casting to single precision
    *out = 0.00043487714720591 * (*in1 - *in2);
           ^

21/74 mse ............................................................... [PASS]
22/74 SpatialAveragePooling_backward_batch .............................. [PASS]
23/74 ELU_forward ....................................................... [PASS]
24/74 Square_backward ................................................... [PASS]
25/74 SpatialMaxPooling_forward_batch_ceil .............................. [PASS]
26/74 LogSigmoid_backward ............................................... [PASS]
27/74 SpatialMaxPooling_backward_batch_ceil ............................. [PASS]
28/74 Sqrt_transposed ................................................... [PASS]
29/74 LookupTable_forward ............................................... [PASS]
30/74 ClassNLLCriterionSingleTargetScalar ............................... [PASS]
31/74 SpatialConvolutionMM_forward_single_vgglayer13 .................... [PASS]
32/74 SpatialAveragePooling_backward .................................... [PASS]
33/74 ELU_transposed .................................................... [PASS]
34/74 SpatialConvolutionMM_forward_batch ................................ [PASS]
35/74 Abs_backward ...................................................... [PASS]
36/74 mse_nosizeaverage ................................................. [WAIT]Apply_3t_0s_0pt_-2_-2_-2_*out = 0.00040675208460443 * (*in1 - *in2) build log: 
<program source>:37:12: warning: double precision constant requires cl_khr_fp64, casting to single precision
    *out = 0.00040675208460443 * (*in1 - *in2);
           ^

36/74 mse_nosizeaverage ................................................. [PASS]
37/74 Abs_forward ....................................................... [PASS]
38/74 SpatialConvolutionMM_forward_single_padded ........................ [PASS]
39/74 Threshold_transposed .............................................. [PASS]
40/74 LogSoftMax_backward ............................................... [PASS]
41/74 SpatialMaxPooling_backward_ceil ................................... [PASS]
42/74 Sum_backward ...................................................... [PASS]
43/74 Sqrt_backward ..................................................... [PASS]
44/74 Sum_forward ....................................................... [PASS]
45/74 Sqrt_zero ......................................................... [PASS]
46/74 SpatialConvolutionMM_forward_1d_byhand ............................ [PASS]
47/74 LogSigmoid_forward ................................................ [PASS]
48/74 Tanh_backward ..................................................... [PASS]
49/74 Square_forward .................................................... [PASS]
50/74 SpatialAveragePooling_forward_batch_ceil .......................... [PASS]
51/74 SpatialAveragePooling_backward_batch_ceil ......................... [PASS]
52/74 Abs_transposed .................................................... [PASS]
53/74 SoftMax_backward .................................................. [PASS]
54/74 LogSoftMax_backward_batch ......................................... [PASS]
55/74 SpatialUpSamplingNearest_backward ................................. [WAIT]SpatialUpSamplingNearest.cl build log: 
<program source>:3:20: warning: no previous prototype for function 'translate_idx'
/*__device__*/ int translate_idx(int ii, int d1, int d2, int d3, int scale_factor)
                   ^
<program source>:20:20: warning: no previous prototype for function 'translate_idx_inv'
/*__device__*/ int translate_idx_inv(int ii, int d1, int d2, int d3, int scale_factor, int off_x, int off_y)
                   ^

55/74 SpatialUpSamplingNearest_backward ................................. [PASS]
56/74 SpatialConvolutionMM_backward_single .............................. [PASS]
57/74 SpatialAveragePooling_forward ..................................... [PASS]
58/74 TemporalConvolution2_backward_gradParams .......................... [WAIT]Apply_3t_0s_0pt_-2_-2_-2_*out = 0.0071428571428571 * (*in1 - *in2) build log: 
<program source>:37:12: warning: double precision constant requires cl_khr_fp64, casting to single precision
    *out = 0.0071428571428571 * (*in1 - *in2);
           ^

58/74 TemporalConvolution2_backward_gradParams .......................... [PASS]
59/74 Sigmoid_backward .................................................. [PASS]
60/74 SpatialAveragePooling_backward_ceil ............................... [PASS]
61/74 SpatialMaxPooling_forward_ceil .................................... [PASS]
62/74 Threshold_forward_inplace ......................................... [PASS]
63/74 SpatialMaxPooling_backward_batch .................................. [PASS]
64/74 SoftMax_backward_batch ............................................ [PASS]
65/74 SpatialAveragePooling_forward_batch ............................... [PASS]
66/74 TemporalConvolution2_backward_gradInput ........................... [PASS]
67/74 SpatialMaxPooling_backward ........................................ [PASS]
68/74 SpatialConvolutionMM_backward_batch ............................... [PASS]
69/74 LookupTable_backward .............................................. [WAIT]nDim    97  nInput  10  batch   false   error   0
nDim    97  nInput  10  batch   true    error   0
nDim    97  nInput  101 batch   false   error   0
nDim    97  nInput  101 batch   true    error   0
nDim    255 nInput  10  batch   false   error   0
nDim    255 nInput  10  batch   true    error   0
nDim    255 nInput  101 batch   false   error   0
nDim    255 nInput  101 batch   true    error   0
69/74 LookupTable_backward .............................................. [PASS]
70/74 SpatialAveragePooling_forward_ceil ................................ [PASS]
71/74 SpatialMaxPooling_forward_batch ................................... [PASS]
72/74 Sqrt_forward ...................................................... [PASS]
73/74 SpatialConvolutionMM_forward_single ............................... [PASS]
74/74 LogSoftMax_forward_batch .......................................... [PASS]
Completed 122 asserts in 74 tests with 0 failures and 0 errors
hughperkins commented 7 years ago

Yeah, if it's commented out, then it's not implemented, and someone would need to implement it. The file you took a screenshot of is a header file, with declarations, not the implementation.

On 11 April 2017 05:02:47 CEST, TylerLindell notifications@github.com wrote:

it is commented out in that file but here is an image of all the places it shows up in ~/torch-cl/ <img width="1280" alt="screen shot 2017-04-10 at 8 00 13 pm" src="https://cloud.githubusercontent.com/assets/5748461/24891130/a5e278b6-1e28-11e7-9dc0-69de3bc6b14a.png">


alex3s commented 7 years ago

Hey Hugh, cltorch is a great piece of software, but does this missing symbol mean that any standard-deviation calculation on a torch.ClTensor will fail? I currently hit the same problem when running inputs:std():

luajit: symbol lookup error: /home/user/torch-cl/install/lib/lua/5.1/libcltorch.so: undefined symbol: THClTensor_stdall

Is there any workaround? Standard deviation is fundamental for me :)

hughperkins commented 7 years ago

Could you copy to the CPU side, and do the standard deviation there?
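
For what it's worth, a rough sketch of that workaround (the variable names are made up, and it assumes cltorch's ClTensor supports the usual :double() conversion back to the CPU):

-- 'inputs' is assumed to be a torch.ClTensor living on the GPU
local inputs_cpu = inputs:double()   -- copy the data back to a CPU DoubleTensor
local stdv = inputs_cpu:std()        -- std() is implemented on the CPU side
print('Standard Deviation: ' .. stdv)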

alex3s commented 7 years ago

I did, then this appeared:

libthclnn_searchpath    /home/alex/torch-cl/install/lib/lua/5.1/libTHCLNN.so    
Using Intel , OpenCL platform: Intel Gen OCL Driver
Using OpenCL device: Intel(R) HD Graphics IvyBridge M GT2

inputs : ClTensor - size: 100x30
targets : ClTensor - size: 100

Apply_3t_0s_0pt_-2_-2_-2_*out = 0.002 * (*in1 - *in2) build log: 
stringInput.cl:37:12: warning: double precision constant requires cl_khr_fp64, casting to single precision

/home/alex/torch-cl/install/bin/luajit: /home/alex/torch-cl/install/share/lua/5.1/nn/Linear.lua:75: invalid arguments: ClTensor number number ClTensor ClTensor 
expected arguments: *ClTensor~2D* [ClTensor~2D] [float] ClTensor~2D ClTensor~2D | *ClTensor~2D* float [ClTensor~2D] float ClTensor~2D ClTensor~2D
stack traceback:
    [C]: in function 'addmm'
    /home/alex/torch-cl/install/share/lua/5.1/nn/Linear.lua:75: in function 'updateGradInput'
    /home/alex/torch-cl/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
    /home/alex/torch-cl/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
    test.lua:190: in function 'opfunc'
    /home/alex/torch-cl/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'

When I switch to CPU only, everything runs alright, so maybe installing clBLAS and recompiling torch against libclblas/clcblas would work as a workaround?

hughperkins commented 7 years ago

Well, you need to divide your network into one part that runs on the GPU and one part that runs on the CPU. I forget exactly how to do this; I think there should be some module to handle it?
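
To make that concrete, here is a minimal sketch of such a split (layer sizes, input and gradOutput are made-up placeholders; the point is just to convert tensors at the GPU/CPU boundary in both forward and backward):

require 'nn'
require 'cltorch'
require 'clnn'

-- GPU half of the network
gpu_part = nn.Sequential():add(nn.Linear(30, 20)):add(nn.Tanh()):cl()
-- CPU half, kept in float so its types match the data coming off the GPU
cpu_part = nn.Sequential():add(nn.Linear(20, 10)):add(nn.LogSoftMax()):float()

-- forward: hand the intermediate result across the boundary as a FloatTensor
local gpu_out = gpu_part:forward(input:cl())
local output  = cpu_part:forward(gpu_out:float())

-- backward: convert the gradient back to a ClTensor at the same boundary
local grad_cpu = cpu_part:backward(gpu_out:float(), gradOutput)
gpu_part:backward(input:cl(), grad_cpu:cl())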


hughperkins commented 7 years ago

(You might need to make your own module that takes a ClTensor as input and gives a FloatTensor as output, and vice versa for backprop.)
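
A rough sketch of what such a bridge module could look like (nothing from the repo; ClToFloat is a made-up name, and it simply casts on forward and casts back on backward):

require 'nn'
require 'cltorch'

-- hypothetical module: ClTensor in, FloatTensor out (and the reverse for gradients)
local ClToFloat, parent = torch.class('nn.ClToFloat', 'nn.Module')

function ClToFloat:updateOutput(input)
   -- forward: pull the ClTensor back to the CPU as a FloatTensor
   self.output = input:float()
   return self.output
end

function ClToFloat:updateGradInput(input, gradOutput)
   -- backward: push the gradient back onto the GPU as a ClTensor
   self.gradInput = gradOutput:cl()
   return self.gradInput
end

You would then insert nn.ClToFloat() at the point where the GPU part of the network hands over to the CPU part, and a mirror-image module going the other way if needed.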


alex3s commented 7 years ago

Thank you. Switching to traininputs:double() and nnoutputs:double() in a few places did indeed help. I'm now testing whether the performance is better than CPU only.

hughperkins commented 7 years ago

cool :-)