hughperkins / cltorch

An OpenCL backend for torch.

cltorch.test() completes with 2 errors #44

Closed siamak-h closed 8 years ago

siamak-h commented 8 years ago

Running luajit -l cltorch -e 'cltorch.test()' produces the message: Completed 219 asserts in 114 tests with 2 errors on my system. I have not tested previous commits. I used commit 550b4f3 on Ubuntu 14.04, with two VGA controllers and the fglrx driver:

05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cayman LE GL [FirePro V5900]
22:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cayman LE GL [FirePro V5900]

Below are the test output and the output from clinfo. Apologies for filling the page with all this text; I thought it might help.

Thanks, Siamak

running tests...
aftter requiring cltorch.unit_storage
Running 2 tests
|_  ==> test_basic
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Cayman
_|  ==> test_get  
__  ==> Done    
Completed 15 asserts in 2 tests with 0 errors
--------------------------------------------------------------------------------
#tester.errors  0
res true
aftter requiring cltorch.unit_tensor
Running 114 tests
... [test progress lines omitted, up to where I assume the first failure occurred]
_____________________|____________________________________________________________________________________________  ==> inplace_sigmoid
_____________________*|___________________________________________________________________________________________  ==> inplace_sign   
_____________________*_|__________________________________________________________________________________________  ==> inplace_sin 
... [the second failure]
_____________________*_________________________________|__________________________________________________________  ==> outplace_sigmoid
_____________________*_________________________________*|_________________________________________________________  ==> outplace_sign   
_____________________*_________________________________*_|________________________________________________________  ==> outplace_sin 
... [and the rest of the tests]

left
     3.0000  5.0000 -2.0000
 2.1000  2.2000  3.9000
[torch.FloatTensor of size 2x3]

right
     3.0000  5.0000 -2.0000
 2.1000  2.4000  3.9000
[torch.FloatTensor of size 2x3]

left
     3.0000  5.0000 -2.0000
 2.1000  2.2000  3.9000
[torch.DoubleTensor of size 2x3]

right
     3.0000  5.0000 -2.0000
 2.1000  2.4000  3.9000
[torch.DoubleTensor of size 2x3]

_____________________*_________________________________*_______________________|__________________________________  ==> test_fills 
_____________________*_________________________________*________________________|_________________________________  ==> test_gather
_____________________*_________________________________*_________________________|________________________________  ==> test_gather_narrowed
new wrapper, size 4
new wrapper, size 4
_____________________*_________________________________*__________________________|_______________________________  ==> test_gather_t       
_____________________*_________________________________*___________________________|______________________________  ==> test_get     
_____________________*_________________________________*____________________________|_____________________________  ==> test_indexcopy
_____________________*_________________________________*_____________________________|____________________________  ==> test_indexfill
_____________________*_________________________________*______________________________|___________________________  ==> test_indexselect
_____________________*_________________________________*_______________________________|__________________________  ==> test_intpower   
_____________________*_________________________________*________________________________|_________________________  ==> test_map     
_____________________*_________________________________*_________________________________|________________________  ==> test_map2
_____________________*_________________________________*__________________________________|_______________________  ==> test_matrixwide
_____________________*_________________________________*___________________________________|______________________  ==> test_max1      
_____________________*_________________________________*____________________________________|_____________________  ==> test_max2
_____________________*_________________________________*_____________________________________|____________________  ==> test_mean
_____________________*_________________________________*______________________________________|___________________  ==> test_meanall
THClReduceAll.cl build log: 
"/tmp/OCL8928T284.cl", line 9: warning: variable "in1" was declared but never
          referenced
    float *in1 = &_in1;
           ^

"/tmp/OCL8928T284.cl", line 10: warning: variable "out" was declared but never
          referenced
    float *out = &_out;
           ^

_____________________*_________________________________*_______________________________________|__________________  ==> test_min1   
_____________________*_________________________________*________________________________________|_________________  ==> test_min2
_____________________*_________________________________*_________________________________________|________________  ==> test_neg 
_____________________*_________________________________*__________________________________________|_______________  ==> test_norm
_____________________*_________________________________*___________________________________________|______________  ==> test_perelement
_____________________*_________________________________*____________________________________________|_____________  ==> test_powerofneg
_____________________*_________________________________*_____________________________________________|____________  ==> test_prod      
_____________________*_________________________________*______________________________________________|___________  ==> test_prodall
THClReduceAll.cl build log: 
"/tmp/OCL8928T330.cl", line 9: warning: variable "in1" was declared but never
          referenced
    float *in1 = &_in1;
           ^

"/tmp/OCL8928T330.cl", line 10: warning: variable "out" was declared but never
          referenced
    float *out = &_out;
           ^

_____________________*_________________________________*_______________________________________________|__________  ==> test_reduceAll
_____________________*_________________________________*________________________________________________|_________  ==> test_reshape  
_____________________*_________________________________*_________________________________________________|________  ==> test_save   
_____________________*_________________________________*__________________________________________________|_______  ==> test_scatter
_____________________*_________________________________*___________________________________________________|______  ==> test_scatterFill
_____________________*_________________________________*____________________________________________________|_____  ==> test_sub        
_____________________*_________________________________*_____________________________________________________|____  ==> test_sum
_____________________*_________________________________*______________________________________________________|___  ==> test_sum_t
_____________________*_________________________________*_______________________________________________________|__  ==> test_sum_t_offset
_____________________*_________________________________*________________________________________________________|_  ==> test_sumall      
_____________________*_________________________________*_________________________________________________________|  ==> test_sumallt
_____________________*_________________________________*__________________________________________________________  ==> Done        

Completed 219 asserts in 114 tests with 2 errors

--------------------------------------------------------------------------------
inplace_sigmoid
 Function call failed 
[string "a:sigmoid()"]:1: attempt to call method 'sigmoid' (a nil value)
stack traceback:
    [string "a:sigmoid()"]:1: in main chunk
    ...adeh/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:244: in function 'v'
    ...adeh/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1031: in function <...adeh/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1029>
    [C]: in function 'xpcall'
    .../shaijzadeh/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    .../shaijzadeh/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    .../shaijzadeh/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    ...adeh/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1041: in function 'test'
    .../shaijzadeh/torch/install/share/lua/5.1/cltorch/Test.lua:15: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670

--------------------------------------------------------------------------------
outplace_sigmoid
 Function call failed 
[string "res_cpu = torch.sigmoid(c)"]:1: attempt to call field 'sigmoid' (a nil value)
stack traceback:
    [string "res_cpu = torch.sigmoid(c)"]:1: in main chunk
    ...adeh/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:257: in function 'v'
    ...adeh/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1031: in function <...adeh/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1029>
    [C]: in function 'xpcall'
    .../shaijzadeh/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    .../shaijzadeh/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    .../shaijzadeh/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    ...adeh/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1041: in function 'test'
    .../shaijzadeh/torch/install/share/lua/5.1/cltorch/Test.lua:15: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670

--------------------------------------------------------------------------------
luajit: .../shaijzadeh/torch/install/share/lua/5.1/cltorch/Test.lua:16: assertion failed!
stack traceback:
    [C]: in function 'assert'
    .../shaijzadeh/torch/install/share/lua/5.1/cltorch/Test.lua:16: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670

clinfo output:

Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 2.0 AMD-APP (1729.3)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 

  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               3
  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    ATI FirePro V (FireGL V) Graphics Adapter       
  Device Topology:               PCI[ B#5, D#0, F#0 ]
  Max compute units:                 8
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           16
  Preferred vector width short:          8
  Preferred vector width int:            4
  Preferred vector width long:           2
  Preferred vector width float:          4
  Preferred vector width double:         2
  Native vector width char:          16
  Native vector width short:             8
  Native vector width int:           4
  Native vector width long:          2
  Native vector width float:             4
  Native vector width double:            2
  Max clock frequency:               600Mhz
  Address bits:                  32
  Max memory allocation:             536870912
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    None
  Cache line size:               0
  Cache size:                    0
  Global memory size:                1623195648
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Max pipe arguments:                0
  Max pipe active reservations:          0
  Max pipe packet size:              0
  Max global variable size:          0
  Max global variable preferred total size:  0
  Max read/write image args:             0
  Max on device events:              0
  Queue on device max size:          0
  Max on device queues:              0
  Queue on device preferred size:        0
  SVM capabilities:              
    Coarse grain buffer:             No
    Fine grain buffer:               No
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                No
    Profiling :                  No
  Platform ID:                   0x7fce371b78f0
  Name:                      Cayman
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1729.3 (VM)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1729.3)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event 

  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    
  Device Topology:               PCI[ B#5, D#0, F#0 ]
  Max compute units:                 8
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           16
  Preferred vector width short:          8
  Preferred vector width int:            4
  Preferred vector width long:           2
  Preferred vector width float:          4
  Preferred vector width double:         2
  Native vector width char:          16
  Native vector width short:             8
  Native vector width int:           4
  Native vector width long:          2
  Native vector width float:             4
  Native vector width double:            2
  Max clock frequency:               600Mhz
  Address bits:                  32
  Max memory allocation:             536870912
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    None
  Cache line size:               0
  Cache size:                    0
  Global memory size:                2110783488
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Max pipe arguments:                0
  Max pipe active reservations:          0
  Max pipe packet size:              0
  Max global variable size:          0
  Max global variable preferred total size:  0
  Max read/write image args:             0
  Max on device events:              0
  Queue on device max size:          0
  Max on device queues:              0
  Queue on device preferred size:        0
  SVM capabilities:              
    Coarse grain buffer:             No
    Fine grain buffer:               No
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                No
    Profiling :                  No
  Platform ID:                   0x7fce371b78f0
  Name:                      Cayman
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1729.3 (VM)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1729.3)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event 

  Device Type:                   CL_DEVICE_TYPE_CPU
  Vendor ID:                     1002h
  Board name:                    
  Max compute units:                 16
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               1024
  Max work group size:               1024
  Preferred vector width char:           16
  Preferred vector width short:          8
  Preferred vector width int:            4
  Preferred vector width long:           2
  Preferred vector width float:          8
  Preferred vector width double:         4
  Native vector width char:          16
  Native vector width short:             8
  Native vector width int:           4
  Native vector width long:          2
  Native vector width float:             8
  Native vector width double:            4
  Max clock frequency:               1199Mhz
  Address bits:                  64
  Max memory allocation:             8414596096
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      64
  Max image 2D width:                8192
  Max image 2D height:               8192
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           4096
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    32768
  Global memory size:                33658384384
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Global
  Local memory size:                 32768
  Max pipe arguments:                16
  Max pipe active reservations:          16
  Max pipe packet size:              4119628800
  Max global variable size:          1879048192
  Max global variable preferred total size:  1879048192
  Max read/write image args:             64
  Max on device events:              0
  Queue on device max size:          0
  Max on device queues:              0
  Queue on device preferred size:        0
  SVM capabilities:              
    Coarse grain buffer:             No
    Fine grain buffer:               No
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     1
  Error correction support:          0
  Unified memory for Host and Device:        1
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             Yes
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                No
    Profiling :                  No
  Platform ID:                   0x7fce371b78f0
  Name:                      Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
  Vendor:                    GenuineIntel
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1729.3 (sse2,avx)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1729.3)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event 

tigerneil commented 8 years ago

Today I found this excellent rock that uses OpenCL. After I installed it, the test gave the same two error messages that @siamak-h came across:

 Function call failed
[string "a:sigmoid()"]:1: attempt to call method 'sigmoid' (a nil value)
stack traceback:
    [string "a:sigmoid()"]:1: in main chunk
    ...aohu/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:244: in function 'v'
    ...aohu/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1031: in function <...aohu/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1029>
    [C]: in function 'xpcall'
    ...s/zhuxiaohu/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    ...s/zhuxiaohu/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    ...s/zhuxiaohu/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    ...aohu/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1041: in function 'test'
    ...s/zhuxiaohu/torch/install/share/lua/5.1/cltorch/Test.lua:15: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x01082e2bc0

--------------------------------------------------------------------------------
outplace_sigmoid
 Function call failed
[string "res_cpu = torch.sigmoid(c)"]:1: attempt to call field 'sigmoid' (a nil value)
stack traceback:
    [string "res_cpu = torch.sigmoid(c)"]:1: in main chunk
    ...aohu/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:257: in function 'v'
    ...aohu/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1031: in function <...aohu/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1029>
    [C]: in function 'xpcall'
    ...s/zhuxiaohu/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    ...s/zhuxiaohu/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    ...s/zhuxiaohu/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    ...aohu/torch/install/share/lua/5.1/cltorch/unit_tensor.lua:1041: in function 'test'
    ...s/zhuxiaohu/torch/install/share/lua/5.1/cltorch/Test.lua:15: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x01082e2bc0

hughperkins commented 8 years ago

OK. This is because you need the latest version of both nn and clnn. Could you please run the following, then retry?

luarocks install nn
luarocks install clnn

hughperkins commented 8 years ago

Sorry, correction, should be:

luarocks install torch
luarocks install cltorch
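
To confirm the result after reinstalling, it may help to save the output of cltorch.test() to a file and check it mechanically instead of scanning the progress bars by eye. The following is only a rough Python sketch: the function names `parse_summaries` and `failed_tests` are made up for illustration, and the patterns assume the log looks exactly like the output pasted earlier in this thread (a "Completed N asserts in M tests with K errors" line per suite, and a test name on the line directly above "Function call failed").

```python
import re

# Assumed summary format, taken from the log quoted in this thread, e.g.
# "Completed 219 asserts in 114 tests with 2 errors".
SUMMARY_RE = re.compile(r"Completed (\d+) asserts in (\d+) tests with (\d+) errors?")


def parse_summaries(log_text):
    """Return one (asserts, tests, errors) tuple per summary line found."""
    return [tuple(int(g) for g in m.groups()) for m in SUMMARY_RE.finditer(log_text)]


def failed_tests(log_text):
    """Return test names that appear directly above a 'Function call failed' line."""
    lines = [line.strip() for line in log_text.splitlines()]
    names = []
    for prev, cur in zip(lines, lines[1:]):
        # Skip blank lines and the dashed separators between failure reports.
        if cur == "Function call failed" and prev and not prev.startswith("-"):
            names.append(prev)
    return names


# Shortened excerpt of the log from the first comment.
sample = """Completed 15 asserts in 2 tests with 0 errors
Completed 219 asserts in 114 tests with 2 errors
--------------------------------------------------------------------------------
inplace_sigmoid
 Function call failed
[string "a:sigmoid()"]:1: attempt to call method 'sigmoid' (a nil value)
--------------------------------------------------------------------------------
outplace_sigmoid
 Function call failed
[string "res_cpu = torch.sigmoid(c)"]:1: attempt to call field 'sigmoid' (a nil value)
"""

print(parse_summaries(sample))
print(failed_tests(sample))
```

If the reinstall worked, every tuple from `parse_summaries` should end in 0 and `failed_tests` should return an empty list. If a different Tester version prints its reports in another layout, the patterns would need adjusting.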