Closed Rombur closed 9 years ago
You are right, thank you! Fixed in https://github.com/viennacl/viennacl-dev/commit/50f9d227a502b56d439e8d95a7f2e2a135f3d8d8
No problem. When you have time, could you look at the amg tutorial with CUDA. I now get this error:
-- CG with AMG preconditioner, AG COARSENING (PMIS), AG INTERPOLATION --
I ran the tutorial yesterday before and after pushing the fix for all three backends, but no problems showed up. Does the example work for the CPU backend?
Oh ok that's strange. The CPU backend works fine. I will reclone the repository and try again.
Meanwhile I've run it through cuda-memcheck, without any memory access complaints. Which CUDA arch do you use (default is sm_20)?
I have tried with sm_35 and sm_20 but I got the same results. I tried to use a debugger but I'm not sure how to pass the debugging flag to nvcc. So I don't get a backtrace:
CUDA Exception: Device Illegal Address The exception was triggered in device 0.
Program received signal CUDA_EXCEPTION_10, Device Illegal Address. [Switching focus to CUDA kernel 0, grid 2133, block (103,0,0), thread (103,0,0), device 0, sm 0, warp 29, lane 7] 0x0000000002a1d450 in void viennacl::linalg::cuda::compressed_matrix_gemm_stage3<8u, unsigned int, double>(unsigned int const, unsigned int const, double const, unsigned int, unsigned int const, unsigned int const, double const, unsigned int, unsigned int const, unsigned int_, double*)<<<(256,1,1),(128,1,1)>>> ()
Here is the kind of message, I get when I run cuda-memcheck: ========= Invalid global write of size 4 ========= at 0x00000890 in void viennacl::linalg::cuda::compressed_matrix_gemm_stage3<unsigned int=8, unsigned int, double>(unsigned int const , unsigned int const , double const *, viennacl::linalg::cuda::compressed_matrix_gemm_stage_3<unsigned int=8, unsigned int, double>, unsigned int const , unsigned int const , double const , viennacl::linalg::cuda::compressed_matrix_gemm_stage_3<unsigned int=8, unsigned int, double>, unsigned int const , viennacl::linalg::cuda::compressed_matrix_gemm_stage3<unsigned int=8, unsigned int, double>, unsigned int const **) ========= by thread (110,0,0) in block (103,0,0) ========= Address 0x2054bad5c is out of bounds ========= Saved host backtrace up to driver entry point at kernel launch time ========= Host Frame:/usr/lib64/nvidia/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x150d6d] ========= Host Frame:/software/easybuild/software/CUDA/7.0.28/lib64/libcudart.so.7.0 [0x131b0] ========= Host Frame:/software/easybuild/software/CUDA/7.0.28/lib64/libcudart.so.7.0 (cudaLaunch + 0x143) [0x2d653] ========= Host Frame:./amg-cuda [0x47d5e] ========= Host Frame:./amg-cuda [0x36f76] ========= Host Frame:./amg-cuda [0x37002] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg4cuda30compressed_matrix_gemm_stage_3ILj8EjdEEvPKT0_S5_PKT1_S3_S5_S5_S8_S3_S5_PS3PS6 + 0x5c) [0x4cc12] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg4cuda9prod_implIdLj1EEEvRKNS_17compressed_matrixIT_XT0_EEES7RS5 + 0x153f) [0x6e2d9] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg9prod_implIdEEvRKNS_17compressed_matrixIT_Lj1EEES6RS4 + 0x14d) [0x660ed] ========= Host Frame:./amg-cuda (_ZN8viennacl17compressed_matrixIdLj1EEaSERKNS_17matrix_expressionIKS1_S3_NS_7op_prodEEE + 0xd3) [0x5f279] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg6detail17amg_galerkin_prodIdEEvRNS_17compressed_matrixIT_Lj1EEES6_S6S6 + 0x9a) [0x5904e] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg6detail9amg_setupIdSt6vectorINS1_3amg17amg_level_contextESaIS5_EEEEmRS3_INS_17compressed_matrixIT_Lj1EEESaISA_EESD_SD_RT0_RNS0_7amg_tagE + 0x35d) [0x54ea3] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg11amg_precondINS_17compressed_matrixIdLj1EEEE5setupEv + 0x3f) [0x518c5] ========= Host Frame:./amg-cuda (_Z7run_amgIdEvRN8viennacl6linalg6cg_tagERNS0_6vectorIT_Lj1EEES7_RNS0_17compressed_matrixIS5_Lj1EEESsRNS1_7amg_tagE + 0xd7) [0x4c085] ========= Host Frame:./amg-cuda (main + 0x5ad) [0x32f1a] ========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed5d]
Hmm, this is weird. Which driver version do you use (maybe you can provide the output of viennacl-info)? Which operating system? Is this the default matrix in the amg-cuda example, or do you pass your own matrix?
This is the output of nvidia-smi (I think viennacl-infor requires OpenCL which I don't have)
+------------------------------------------------------+
| NVIDIA-SMI 346.59 Driver Version: 346.59 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20m On | 0000:20:00.0 Off | 0 |
| N/A 20C P8 16W / 225W | 13MiB / 4799MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K20m On | 0000:8B:00.0 Off | 0 |
| N/A 18C P8 16W / 225W | 13MiB / 4799MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+
The OS is centos 6.6. I'm using the default matrix but I get the same error if I pass my own.
Thanks, that helped. I can now reproduce the problem on a remote machine also equipped with a K20m. It's still strange that everything passes on a bunch of different desktop GPUs and CUDA versions, though.
It looks like this is a bug in CUDA, which we have observed for OpenCL before. I've pushed a workaround: https://github.com/viennacl/viennacl-dev/commit/2a6e9153834a0f2071ca5f4bb198a075d8db749d
Yes, it works. Thanks.
well, thank you! :-)
I think that there is a bug in viennacl/linalg/cuda/amg_operations.hpp at line 476. It should be AMG_COARSENING_METHOD_MIS2_AGGREGATION instead of AMG_COARSENING_METHOD_AGGREGATION. Because of that the amg tutorial crash when using CUDA.