Closed d-meiser closed 10 years ago
Interesting, none of our nightly test machines show these issues. Could you please provide some information about the machine? The simplest way of doing so is to paste the output of examples/tutorial/viennacl-info. Thanks, Dominic!
The output of viennacl-info is as follows:
[dmeiser@ivyamd tutorial]$ ./viennacl-info
# =========================================
# Platform Information
# =========================================
#
# Vendor and version: Advanced Micro Devices, Inc.: OpenCL 1.2 AMD-APP (1268.1)
#
# ViennaCL uses this OpenCL platform by default.
#
# Available Devices:
#
-----------------------------------------
Address Bits: 32
Available: 1
Compiler Available: 1
Endian Little: 1
Error Correction Support: 0
Execution Capabilities: CL_EXEC_KERNEL
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer
Global Mem Cache Size: 16384 Bytes
Global Mem Cache Type: CL_READ_WRITE_CACHE
Global Mem Cacheline Size: 64 Bytes
Global Mem Size: 3221225472 Bytes
Host Unified Memory: 0
Image Support: 1
Image2D Max Height: 16384
Image2D Max Width: 16384
Image3D Max Depth: 2048
Image3D Max Height: 2048
Image3D Max Width: 2048
Local Mem Size: 32768 Bytes
Local Mem Type: CL_LOCAL
Max Clock Frequency: 975 MHz
Max Compute Units: 32
Max Constant Args: 8
Max Constant Buffer Size: 65536 Bytes
Max Mem Alloc Size: 1073741824 Bytes
Max Parameter Size: 1024 Bytes
Max Read Image Args: 128
Max Samplers: 16
Max Work Group Size: 256
Max Work Item Dimensions: 3
Max Work Item Sizes: 256 256 256
Max Write Image Args: 8
Mem Base Addr Align: 2048
Min Data Type Align Size: 128 Bytes
Name: Tahiti
Native Vector Width char: 4
Native Vector Width short: 2
Native Vector Width int: 1
Native Vector Width long: 1
Native Vector Width float: 1
Native Vector Width double: 1
Native Vector Width half: 1
OpenCL C Version: OpenCL C 1.2
Platform: 0x7f90d809d540
Preferred Vector Width char: 4
Preferred Vector Width short: 2
Preferred Vector Width int: 1
Preferred Vector Width long: 1
Preferred Vector Width float: 1
Preferred Vector Width double: 1
Preferred Vector Width half: 1
Profile: FULL_PROFILE
Profiling Timer Resolution: 1 ns
Queue Properties: CL_QUEUE_PROFILING_ENABLE
Single FP Config: CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
Type: GPU
Vendor: Advanced Micro Devices, Inc.
Vendor ID: 4098
Version: OpenCL 1.2 AMD-APP (1268.1)
Driver Version: 1268.1 (VM)
-----------------------------------------
-----------------------------------------
Address Bits: 64
Available: 1
Compiler Available: 1
Endian Little: 1
Error Correction Support: 0
Execution Capabilities: CL_EXEC_KERNEL CL_EXEC_NATIVE_KERNEL
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt
Global Mem Cache Size: 32768 Bytes
Global Mem Cache Type: CL_READ_WRITE_CACHE
Global Mem Cacheline Size: 64 Bytes
Global Mem Size: 33607430144 Bytes
Host Unified Memory: 1
Image Support: 1
Image2D Max Height: 8192
Image2D Max Width: 8192
Image3D Max Depth: 2048
Image3D Max Height: 2048
Image3D Max Width: 2048
Local Mem Size: 32768 Bytes
Local Mem Type: CL_GLOBAL
Max Clock Frequency: 3701 MHz
Max Compute Units: 8
Max Constant Args: 8
Max Constant Buffer Size: 65536 Bytes
Max Mem Alloc Size: 8401857536 Bytes
Max Parameter Size: 4096 Bytes
Max Read Image Args: 128
Max Samplers: 16
Max Work Group Size: 1024
Max Work Item Dimensions: 3
Max Work Item Sizes: 1024 1024 1024
Max Write Image Args: 8
Mem Base Addr Align: 1024
Min Data Type Align Size: 128 Bytes
Name: Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz
Native Vector Width char: 16
Native Vector Width short: 8
Native Vector Width int: 4
Native Vector Width long: 2
Native Vector Width float: 8
Native Vector Width double: 4
Native Vector Width half: 4
OpenCL C Version: OpenCL C 1.2
Platform: 0x7f90d809d540
Preferred Vector Width char: 16
Preferred Vector Width short: 8
Preferred Vector Width int: 4
Preferred Vector Width long: 2
Preferred Vector Width float: 8
Preferred Vector Width double: 4
Preferred Vector Width half: 4
Profile: FULL_PROFILE
Profiling Timer Resolution: 1 ns
Queue Properties: CL_QUEUE_PROFILING_ENABLE
Single FP Config: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
Type: CPU
Vendor: GenuineIntel
Vendor ID: 4098
Version: OpenCL 1.2 AMD-APP (1268.1)
Driver Version: 1268.1 (sse2,avx)
-----------------------------------------
###########################################
The OS is CentOS 6.3:
[dmeiser@ivyamd tutorial]$ uname -a
Linux ivyamd.txcorp.com 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 18:37:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
And I'm using gcc 4.4.7.
Does the test executable provide a way to control which device the test is being run on? Perhaps an environment variable? I could try to run on the CPU.
Hi !
Thanks for the report :)
Unfortunately, we don't yet provide any device selection for the tests and examples. Adding command-line arguments for that is an important TODO. We should also have enough test GPUs to cover the three major vendors.
I ran the test on a similar GPU (Hawaii), and the test failed as well. It also fails for bigger matrices (271 x 271). I don't know much about how the QR methods are implemented in ViennaCL, but my guess is that an initialization is missing somewhere. The result would then also be wrong for 4 x 4 matrices, but the difference would stay below the test threshold. I'll search for any possible missing initialization.
Philippe
Hey, there is of course a method for selecting the device as long as you alter the code. See for example here: https://github.com/viennacl/viennacl-dev/blob/master/examples/benchmarks/blas3.cpp#L65 for the use of
viennacl::ocl::set_context_device_type(0, viennacl::ocl::gpu_tag());
For the qr-method test, simply call
viennacl::ocl::set_context_device_type(0, viennacl::ocl::cpu_tag());
right at the beginning of main() to run on the Intel CPU.
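Since the question above was whether an environment variable could control the device, here is a minimal sketch of how a test could wrap the call. It is only an illustration: the variable name VIENNACL_TEST_DEVICE, the DeviceChoice enum, and both helper functions are hypothetical and not part of ViennaCL. Only viennacl::ocl::set_context_device_type() with cpu_tag/gpu_tag comes from the library, and that part is compiled only when VIENNACL_WITH_OPENCL is defined.

```cpp
#include <cstdlib>
#include <string>

#ifdef VIENNACL_WITH_OPENCL
#include "viennacl/ocl/backend.hpp"
#endif

// Hypothetical helper type, not part of ViennaCL.
enum DeviceChoice { DEVICE_DEFAULT, DEVICE_CPU, DEVICE_GPU };

// Map the text of an environment variable to a device choice.
// Unset or unknown values fall back to the platform default.
inline DeviceChoice parse_device_choice(const char* value)
{
  if (!value)
    return DEVICE_DEFAULT;
  const std::string s(value);
  if (s == "cpu" || s == "CPU")
    return DEVICE_CPU;
  if (s == "gpu" || s == "GPU")
    return DEVICE_GPU;
  return DEVICE_DEFAULT;
}

// Intended to be called right at the beginning of main(),
// before any ViennaCL object is created.
// VIENNACL_TEST_DEVICE is an assumed name, not an existing ViennaCL setting.
inline DeviceChoice select_test_device()
{
  const DeviceChoice choice =
      parse_device_choice(std::getenv("VIENNACL_TEST_DEVICE"));
#ifdef VIENNACL_WITH_OPENCL
  if (choice == DEVICE_CPU)
    viennacl::ocl::set_context_device_type(0, viennacl::ocl::cpu_tag());
  else if (choice == DEVICE_GPU)
    viennacl::ocl::set_context_device_type(0, viennacl::ocl::gpu_tag());
#endif
  return choice;
}
```

With select_test_device() as the first statement of main(), running the test with VIENNACL_TEST_DEVICE=cpu set in the environment would request the CPU device, and leaving the variable unset would keep the default device.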
On a related note: the QR method implementation has become fairly dated in the meantime and should be migrated to a backend-agnostic implementation anyway. Thus, the TODO for this issue is to perform this migration rather than to just 'repair' the OpenCL-only implementation.
For what it's worth, when I run the test on the CPU (following Karl's instructions), the test passes.
The QR method has been largely extended and the tests should now run smoothly: https://github.com/viennacl/viennacl-dev/pull/100
I get the following output:
All other tests pass. Let me know if I can provide any other info to help with this. Cheers, Dominic