viennacl / viennacl-dev

Developer repository for ViennaCL. Visit http://viennacl.sourceforge.net/ for the latest releases.

qr_method-test-opencl failing #70

Closed d-meiser closed 10 years ago

d-meiser commented 10 years ago

I get the following output:

[dmeiser@ivyamd tests]$ ./qr_method-test-opencl 
Reading...
Testing matrix of size 4-by-4
Calculation...
Verification...
[[OK]] [4x4]  ../examples/testdata/eigen/nsm1.example time = 0.4352
tridiagonal = 1, hessenberg = 1 prod-diff = 0.000001 eigen-diff = 0.000000
Reading...
Testing matrix of size 10-by-10
Calculation...
Verification...
[FAIL] [10x10]  ../examples/testdata/eigen/nsm2.example time = 0.0100
tridiagonal = 1, hessenberg = 1 prod-diff = 1.659658 eigen-diff = 0.000000

All other tests pass. Let me know if I can provide any other info to help with this. Cheers, Dominic

karlrupp commented 10 years ago

Interesting, none of our nightly test machines shows these issues. Could you please provide some information about the machine? The simplest way of doing so is to paste the output of examples/tutorial/viennacl-info. Thanks, Dominic!

d-meiser commented 10 years ago

The output of viennacl-info is as follows:

[dmeiser@ivyamd tutorial]$ ./viennacl-info 
# =========================================
#         Platform Information             
# =========================================
#
# Vendor and version: Advanced Micro Devices, Inc.: OpenCL 1.2 AMD-APP (1268.1)
#
# ViennaCL uses this OpenCL platform by default.
# 
# Available Devices: 
# 

  -----------------------------------------
Address Bits:                  32
Available:                     1
Compiler Available:            1
Endian Little:                 1
Error Correction Support:      0
Execution Capabilities:        CL_EXEC_KERNEL 
Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer 
Global Mem Cache Size:         16384 Bytes
Global Mem Cache Type:         CL_READ_WRITE_CACHE 
Global Mem Cacheline Size:     64 Bytes
Global Mem Size:               3221225472 Bytes
Host Unified Memory:           0
Image Support:                 1
Image2D Max Height:            16384
Image2D Max Width:             16384
Image3D Max Depth:             2048
Image3D Max Height:            2048
Image3D Max Width:             2048
Local Mem Size:                32768 Bytes
Local Mem Type:                CL_LOCAL 
Max Clock Frequency:           975 MHz
Max Compute Units:             32
Max Constant Args:             8
Max Constant Buffer Size:      65536 Bytes
Max Mem Alloc Size:            1073741824 Bytes
Max Parameter Size:            1024 Bytes
Max Read Image Args:           128
Max Samplers:                  16
Max Work Group Size:           256
Max Work Item Dimensions:      3
Max Work Item Sizes:           256 256 256 
Max Write Image Args:          8
Mem Base Addr Align:           2048
Min Data Type Align Size:      128 Bytes
Name:                          Tahiti
Native Vector Width char:      4
Native Vector Width short:     2
Native Vector Width int:       1
Native Vector Width long:      1
Native Vector Width float:     1
Native Vector Width double:    1
Native Vector Width half:      1
OpenCL C Version:              OpenCL C 1.2 
Platform:                      0x7f90d809d540
Preferred Vector Width char:   4
Preferred Vector Width short:  2
Preferred Vector Width int:    1
Preferred Vector Width long:   1
Preferred Vector Width float:  1
Preferred Vector Width double: 1
Preferred Vector Width half:   1
Profile:                       FULL_PROFILE
Profiling Timer Resolution:    1 ns
Queue Properties:              CL_QUEUE_PROFILING_ENABLE 
Single FP Config:              CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA 
Type:                          GPU 
Vendor:                        Advanced Micro Devices, Inc.
Vendor ID:                     4098
Version:                       OpenCL 1.2 AMD-APP (1268.1)
Driver Version:                1268.1 (VM)
  -----------------------------------------

  -----------------------------------------
Address Bits:                  64
Available:                     1
Compiler Available:            1
Endian Little:                 1
Error Correction Support:      0
Execution Capabilities:        CL_EXEC_KERNEL CL_EXEC_NATIVE_KERNEL 
Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt 
Global Mem Cache Size:         32768 Bytes
Global Mem Cache Type:         CL_READ_WRITE_CACHE 
Global Mem Cacheline Size:     64 Bytes
Global Mem Size:               33607430144 Bytes
Host Unified Memory:           1
Image Support:                 1
Image2D Max Height:            8192
Image2D Max Width:             8192
Image3D Max Depth:             2048
Image3D Max Height:            2048
Image3D Max Width:             2048
Local Mem Size:                32768 Bytes
Local Mem Type:                CL_GLOBAL 
Max Clock Frequency:           3701 MHz
Max Compute Units:             8
Max Constant Args:             8
Max Constant Buffer Size:      65536 Bytes
Max Mem Alloc Size:            8401857536 Bytes
Max Parameter Size:            4096 Bytes
Max Read Image Args:           128
Max Samplers:                  16
Max Work Group Size:           1024
Max Work Item Dimensions:      3
Max Work Item Sizes:           1024 1024 1024 
Max Write Image Args:          8
Mem Base Addr Align:           1024
Min Data Type Align Size:      128 Bytes
Name:                          Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz
Native Vector Width char:      16
Native Vector Width short:     8
Native Vector Width int:       4
Native Vector Width long:      2
Native Vector Width float:     8
Native Vector Width double:    4
Native Vector Width half:      4
OpenCL C Version:              OpenCL C 1.2 
Platform:                      0x7f90d809d540
Preferred Vector Width char:   16
Preferred Vector Width short:  8
Preferred Vector Width int:    4
Preferred Vector Width long:   2
Preferred Vector Width float:  8
Preferred Vector Width double: 4
Preferred Vector Width half:   4
Profile:                       FULL_PROFILE
Profiling Timer Resolution:    1 ns
Queue Properties:              CL_QUEUE_PROFILING_ENABLE 
Single FP Config:              CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA 
Type:                          CPU 
Vendor:                        GenuineIntel
Vendor ID:                     4098
Version:                       OpenCL 1.2 AMD-APP (1268.1)
Driver Version:                1268.1 (sse2,avx)
  -----------------------------------------

###########################################

The OS is CentOS 6.3:

[dmeiser@ivyamd tutorial]$ uname -a
Linux ivyamd.txcorp.com 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 18:37:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

And I'm using gcc 4.4.7.

d-meiser commented 10 years ago

Does the test executable provide a way to control which device the test is being run on? Perhaps an environment variable? I could try to run on the CPU.

ptillet commented 10 years ago

Hi !

Thanks for the report :)

Unfortunately we don't provide (yet) any device selection for the tests/examples. Adding console arguments for that is an important TODO. We should also have enough test GPUs to cover the 3 major vendors.

I ran the test on a similar GPU (Hawaii), and it failed there as well. It also fails for bigger matrices (271 x 271). I don't know much about how the QR method is implemented in ViennaCL, but my guess is that an initialization is missing somewhere. In that case the result would also be wrong for 4 x 4 matrices, with the difference simply staying below the test threshold. I'll search for any possible missing initialization.
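
To illustrate how a wrong result can hide behind a tolerance on a small input, here is a minimal, self-contained sketch. It is not the ViennaCL test code: `max_abs_diff`, `passes`, and the tolerance of `1e-4` are hypothetical stand-ins, and the comparison simply takes the largest elementwise difference between a reference matrix and a computed one.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Largest elementwise difference between two equally sized matrices,
// stored as flat arrays. Hypothetical helper, not part of ViennaCL.
double max_abs_diff(const std::vector<double>& reference,
                    const std::vector<double>& computed)
{
    if (reference.size() != computed.size())
        throw std::invalid_argument("matrices must have the same size");

    double diff = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i)
        diff = std::max(diff, std::fabs(reference[i] - computed[i]));
    return diff;
}

// A result is accepted when the difference stays below the tolerance.
// The default tolerance is an assumed value chosen for illustration.
bool passes(double diff, double tolerance = 1e-4)
{
    return diff < tolerance;
}
```

Under this assumed tolerance, the prod-diff of 0.000001 reported for the 4-by-4 case at the top of this thread would be accepted and the 1.659658 reported for the 10-by-10 case would be rejected. A small but nonzero difference on a tiny matrix therefore does not rule out a defect.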

Philippe


karlrupp commented 10 years ago

Hey, there is of course a method for selecting the device as long as you alter the code. See for example here: https://github.com/viennacl/viennacl-dev/blob/master/examples/benchmarks/blas3.cpp#L65 for the use of

viennacl::ocl::set_context_device_type(0, viennacl::ocl::gpu_tag());

For the qr-method test, simply call

viennacl::ocl::set_context_device_type(0, viennacl::ocl::cpu_tag());

right at the beginning of main() to run on the Intel CPU.
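
Until the tests accept console arguments, one way to avoid editing the source for every run is a small argument parser in front of the calls above. The following is only a sketch under stated assumptions: the `--device=` flag, the `device_choice` enum, and `parse_device_arg` are hypothetical names that do not exist in ViennaCL. Only the two `set_context_device_type` calls shown in the comments come from this thread.

```cpp
#include <string>

// Hypothetical device choice for a test executable.
enum class device_choice { default_device, cpu, gpu };

// Maps a command-line argument such as "--device=cpu" to a choice.
// The flag name is an assumption; anything unrecognized falls back
// to the platform default.
device_choice parse_device_arg(const std::string& arg)
{
    const std::string prefix = "--device=";
    if (arg.compare(0, prefix.size(), prefix) != 0)
        return device_choice::default_device;

    const std::string value = arg.substr(prefix.size());
    if (value == "cpu") return device_choice::cpu;
    if (value == "gpu") return device_choice::gpu;
    return device_choice::default_device;
}

// In main(), before any other ViennaCL call, the choice could then be
// applied with the calls quoted above:
//   cpu: viennacl::ocl::set_context_device_type(0, viennacl::ocl::cpu_tag());
//   gpu: viennacl::ocl::set_context_device_type(0, viennacl::ocl::gpu_tag());
```

The parser is kept separate from the ViennaCL calls so that it can be compiled and checked without an OpenCL installation.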

On a related note: the QR method implementation has become fairly dated in the meantime and should be migrated to a backend-agnostic implementation anyway. Thus, the TODO for this issue is to perform this migration rather than to just 'repair' the OpenCL-only implementation.

d-meiser commented 10 years ago

For what it's worth, when I run the test on the CPU (following Karl's instructions), it passes.

karlrupp commented 10 years ago

The QR method has been largely extended and the tests should now run smoothly: https://github.com/viennacl/viennacl-dev/pull/100