viennacl / viennacl-dev

Developer repository for ViennaCL. Visit http://viennacl.sourceforge.net/ for the latest releases.
Other
281 stars 89 forks source link

bug in amg for CUDA #147

Closed Rombur closed 9 years ago

Rombur commented 9 years ago

I think that there is a bug in viennacl/linalg/cuda/amg_operations.hpp at line 476. It should be AMG_COARSENING_METHOD_MIS2_AGGREGATION instead of AMG_COARSENING_METHOD_AGGREGATION. Because of that the amg tutorial crash when using CUDA.

karlrupp commented 9 years ago

You are right, thank you! Fixed in https://github.com/viennacl/viennacl-dev/commit/50f9d227a502b56d439e8d95a7f2e2a135f3d8d8

Rombur commented 9 years ago

No problem. When you have time, could you look at the amg tutorial with CUDA. I now get this error:

-- CG with AMG preconditioner, AG COARSENING (PMIS), AG INTERPOLATION --

karlrupp commented 9 years ago

I ran the tutorial yesterday before and after pushing the fix for all three backends, but no problems showed up. Does the example work for the CPU backend?

Rombur commented 9 years ago

Oh ok that's strange. The CPU backend works fine. I will reclone the repository and try again.

karlrupp commented 9 years ago

Meanwhile I've run it through cuda-memcheck, without any memory access complaints. Which CUDA arch do you use (default is sm_20)?

Rombur commented 9 years ago

I have tried with sm_35 and sm_20 but I got the same results. I tried to use a debugger but I'm not sure how to pass the debugging flag to nvcc. So I don't get a backtrace:

CUDA Exception: Device Illegal Address The exception was triggered in device 0.

Program received signal CUDA_EXCEPTION_10, Device Illegal Address. [Switching focus to CUDA kernel 0, grid 2133, block (103,0,0), thread (103,0,0), device 0, sm 0, warp 29, lane 7] 0x0000000002a1d450 in void viennacl::linalg::cuda::compressed_matrix_gemm_stage3<8u, unsigned int, double>(unsigned int const, unsigned int const, double const, unsigned int, unsigned int const, unsigned int const, double const, unsigned int, unsigned int const, unsigned int_, double*)<<<(256,1,1),(128,1,1)>>> ()

Rombur commented 9 years ago

Here is the kind of message, I get when I run cuda-memcheck: ========= Invalid global write of size 4 ========= at 0x00000890 in void viennacl::linalg::cuda::compressed_matrix_gemm_stage3<unsigned int=8, unsigned int, double>(unsigned int const , unsigned int const , double const *, viennacl::linalg::cuda::compressed_matrix_gemm_stage_3<unsigned int=8, unsigned int, double>, unsigned int const , unsigned int const , double const , viennacl::linalg::cuda::compressed_matrix_gemm_stage_3<unsigned int=8, unsigned int, double>, unsigned int const , viennacl::linalg::cuda::compressed_matrix_gemm_stage3<unsigned int=8, unsigned int, double>, unsigned int const **) ========= by thread (110,0,0) in block (103,0,0) ========= Address 0x2054bad5c is out of bounds ========= Saved host backtrace up to driver entry point at kernel launch time ========= Host Frame:/usr/lib64/nvidia/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x150d6d] ========= Host Frame:/software/easybuild/software/CUDA/7.0.28/lib64/libcudart.so.7.0 [0x131b0] ========= Host Frame:/software/easybuild/software/CUDA/7.0.28/lib64/libcudart.so.7.0 (cudaLaunch + 0x143) [0x2d653] ========= Host Frame:./amg-cuda [0x47d5e] ========= Host Frame:./amg-cuda [0x36f76] ========= Host Frame:./amg-cuda [0x37002] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg4cuda30compressed_matrix_gemm_stage_3ILj8EjdEEvPKT0_S5_PKT1_S3_S5_S5_S8_S3_S5_PS3PS6 + 0x5c) [0x4cc12] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg4cuda9prod_implIdLj1EEEvRKNS_17compressed_matrixIT_XT0_EEES7RS5 + 0x153f) [0x6e2d9] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg9prod_implIdEEvRKNS_17compressed_matrixIT_Lj1EEES6RS4 + 0x14d) [0x660ed] ========= Host Frame:./amg-cuda (_ZN8viennacl17compressed_matrixIdLj1EEaSERKNS_17matrix_expressionIKS1_S3_NS_7op_prodEEE + 0xd3) [0x5f279] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg6detail17amg_galerkin_prodIdEEvRNS_17compressed_matrixIT_Lj1EEES6_S6S6 + 0x9a) [0x5904e] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg6detail9amg_setupIdSt6vectorINS1_3amg17amg_level_contextESaIS5_EEEEmRS3_INS_17compressed_matrixIT_Lj1EEESaISA_EESD_SD_RT0_RNS0_7amg_tagE + 0x35d) [0x54ea3] ========= Host Frame:./amg-cuda (_ZN8viennacl6linalg11amg_precondINS_17compressed_matrixIdLj1EEEE5setupEv + 0x3f) [0x518c5] ========= Host Frame:./amg-cuda (_Z7run_amgIdEvRN8viennacl6linalg6cg_tagERNS0_6vectorIT_Lj1EEES7_RNS0_17compressed_matrixIS5_Lj1EEESsRNS1_7amg_tagE + 0xd7) [0x4c085] ========= Host Frame:./amg-cuda (main + 0x5ad) [0x32f1a] ========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed5d]

========= Host Frame:./amg-cuda [0x32819]

karlrupp commented 9 years ago

Hmm, this is weird. Which driver version do you use (maybe you can provide the output of viennacl-info)? Which operating system? Is this the default matrix in the amg-cuda example, or do you pass your own matrix?

Rombur commented 9 years ago

This is the output of nvidia-smi (I think viennacl-infor requires OpenCL which I don't have) +------------------------------------------------------+
| NVIDIA-SMI 346.59 Driver Version: 346.59 |
|-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 Tesla K20m On | 0000:20:00.0 Off | 0 | | N/A 20C P8 16W / 225W | 13MiB / 4799MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 1 Tesla K20m On | 0000:8B:00.0 Off | 0 | | N/A 18C P8 16W / 225W | 13MiB / 4799MiB | 0% Default | +-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+

The OS is centos 6.6. I'm using the default matrix but I get the same error if I pass my own.

karlrupp commented 9 years ago

Thanks, that helped. I can now reproduce the problem on a remote machine also equipped with a K20m. It's still strange that everything passes on a bunch of different desktop GPUs and CUDA versions, though.

karlrupp commented 9 years ago

It looks like this is a bug in CUDA, which we have observed for OpenCL before. I've pushed a workaround: https://github.com/viennacl/viennacl-dev/commit/2a6e9153834a0f2071ca5f4bb198a075d8db749d

Rombur commented 9 years ago

Yes, it works. Thanks.

karlrupp commented 9 years ago

well, thank you! :-)