pgera closed this issue 5 years ago
Hitting these sorts of errors in my SCC code, but I am able to reproduce them with kcore as well on a small graph.
Graph:
```
# Directed graph (each unordered pair of nodes is saved once): Slashdot0811.txt
# Slashdot Zoo social network from November 6 2008
# Nodes: 5 Edges: 5
# FromNodeId	ToNodeId
0	2
0	3
1	0
2	1
3	4
```
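If it helps anyone reproduce this, the input file can be regenerated with a short script. The filename `foo.txt` matches the command in the log below; the header lines are copied verbatim from the file above:

```python
# Regenerates the 5-node SNAP-format test graph from this report as foo.txt.
lines = [
    "# Directed graph (each unordered pair of nodes is saved once): Slashdot0811.txt",
    "# Slashdot Zoo social network from November 6 2008",
    "# Nodes: 5 Edges: 5",
    "# FromNodeId\tToNodeId",
    "0\t2",
    "0\t3",
    "1\t0",
    "2\t1",
    "3\t4",
]
with open("foo.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```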
```
$ cuda-memcheck ./kcore foo.txt
========= CUDA-MEMCHECK
Graph File: foo Size: 0 MB format: (SNAP)
@File V: 5 E: 5 Structure: Directed avg. deg: 1.0
@User V: 5 E: 10 Structure: Undirected avg. deg: 2.0
100% Directed to Undirected: Removing duplicated edges...COO to CSR... Complete! ne: 10
========= Invalid __global__ read of size 4
=========     at 0x00000200 in void cub::DeviceScanKernel<cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *, int*, cub::ScanTileState<int, bool=1>, cub::Sum, int, int>(int*, cub::Sum, int, int, int, cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *)
=========     by thread (2,0,0) in block (0,0,0)
=========     Address 0x7fdcc4a04a08 is out of bounds
=========     Device Frame:void cub::DeviceScanKernel<cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *, int*, cub::ScanTileState<int, bool=1>, cub::Sum, int, int>(int*, cub::Sum, int, int, int, cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *) (void cub::DeviceScanKernel<cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *, int*, cub::ScanTileState<int, bool=1>, cub::Sum, int, int>(int*, cub::Sum, int, int, int, cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *) : 0x200)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x24f88d]
=========     Host Frame:./kcore [0x93852]
=========     Host Frame:./kcore [0x93a47]
=========     Host Frame:./kcore [0xc7e05]
=========     Host Frame:./kcore [0x50cd1]
=========     Host Frame:./kcore [0x4a241]
=========     Host Frame:./kcore [0x4ae65]
=========     Host Frame:./kcore [0x4b73a]
=========     Host Frame:./kcore [0x361ec]
=========     Host Frame:./kcore [0x716c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20830]
=========     Host Frame:./kcore [0x8919]
=========
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaMemcpyAsync.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x357283]
=========     Host Frame:./kcore [0xcc4e3]
=========     Host Frame:./kcore [0x4a03c]
=========     Host Frame:./kcore [0x4ae65]
=========     Host Frame:./kcore [0x4b73a]
=========     Host Frame:./kcore [0x361ec]
=========     Host Frame:./kcore [0x716c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20830]
=========     Host Frame:./kcore [0x8919]
=========
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaStreamSynchronize.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x357283]
=========     Host Frame:./kcore [0xc802e]
=========     Host Frame:./kcore [0x4a049]
=========     Host Frame:./kcore [0x4ae65]
=========     Host Frame:./kcore [0x4b73a]
=========     Host Frame:./kcore [0x361ec]
=========     Host Frame:./kcore [0x716c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20830]
=========     Host Frame:./kcore [0x8919]
=========
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  trivial_device_copy D->H failed: unspecified launch failure
========= Error: process didn't terminate successfully
========= No CUDA-MEMCHECK results found
```
Note: this fails only under cuda-memcheck, not when run on its own. So it is likely an out-of-bounds read or buffer-overflow type bug that normally goes unnoticed and is only exposed by cuda-memcheck.
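For context on where the invalid read could come from: the failing `cub::DeviceScanKernel` runs during the "COO to CSR" step, which conventionally counts per-vertex degrees and takes a prefix sum to build the offsets array. A host-side sketch of that conversion, using the 5-node test graph above after it is made undirected (10 directed edges); the function and variable names are illustrative, not taken from the kcore source:

```python
# Sketch of a COO -> CSR conversion (the pattern the failing scan implements).
# Inputs mirror the 5-node test graph above, stored in both directions.

def coo_to_csr(num_vertices, src, dst):
    # Count the out-degree of every vertex.
    degree = [0] * num_vertices
    for u in src:
        degree[u] += 1
    # Exclusive prefix sum of the degrees. Note the result needs
    # num_vertices + 1 entries -- sizing this buffer as num_vertices is
    # exactly the kind of off-by-one that a device-side scan would report
    # as an invalid __global__ read only under cuda-memcheck.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    # Scatter destination vertices into the adjacency array.
    adj = [0] * len(src)
    cursor = offsets[:num_vertices]  # per-vertex write positions
    for u, w in zip(src, dst):
        adj[cursor[u]] = w
        cursor[u] += 1
    return offsets, adj

# Undirected version of the test graph: each edge stored both ways.
edges = [(0, 2), (0, 3), (1, 0), (2, 1), (3, 4)]
src = [u for u, w in edges] + [w for u, w in edges]
dst = [w for u, w in edges] + [u for u, w in edges]
offsets, adj = coo_to_csr(5, src, dst)
```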
Closing, as this seems to be resolved.