rapidsai / cuhornet

cuda-memcheck failures with batch ops #19

Closed: pgera closed this issue 5 years ago

pgera commented 5 years ago

I'm hitting errors like the ones below in my SCC code, and I can also reproduce them with kcore on a small graph.

Graph:

# Directed graph (each unordered pair of nodes is saved once): Slashdot0811.txt 
# Slashdot Zoo social network from November 6 2008
# Nodes: 5 Edges: 5                                                             
# FromNodeId    ToNodeId                                                        
0       2                                                                       
0       3                                                                       
1       0                                                                       
2       1                                                                       
3       4

Command and output:

cuda-memcheck ./kcore foo.txt
========= CUDA-MEMCHECK

Graph File: foo                Size: 0 MB        format: (SNAP)

@File    V: 5             E: 5             Structure: Directed     avg. deg: 1.0
@User    V: 5             E: 10            Structure: Undirected   avg. deg: 2.0

   100%
Directed to Undirected: Removing duplicated edges...COO to CSR...   Complete!

ne: 10
========= Invalid __global__ read of size 4
=========     at 0x00000200 in void cub::DeviceScanKernel<cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *, int*, cub::ScanTileState<int, bool=1>, cub::Sum, int, int>(int*, cub::Sum, int, int, int, cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *)
=========     by thread (2,0,0) in block (0,0,0)
=========     Address 0x7fdcc4a04a08 is out of bounds
=========     Device Frame:void cub::DeviceScanKernel<cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *, int*, cub::ScanTileState<int, bool=1>, cub::Sum, int, int>(int*, cub::Sum, int, int, int, cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *) (void cub::DeviceScanKernel<cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *, int*, cub::ScanTileState<int, bool=1>, cub::Sum, int, int>(int*, cub::Sum, int, int, int, cub::DispatchScan<int const *, int*, cub::Sum, int, int>::PtxAgentScanPolicy, int const *) : 0x200)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x24f88d]
=========     Host Frame:./kcore [0x93852]
=========     Host Frame:./kcore [0x93a47]
=========     Host Frame:./kcore [0xc7e05]
=========     Host Frame:./kcore [0x50cd1]
=========     Host Frame:./kcore [0x4a241]
=========     Host Frame:./kcore [0x4ae65]
=========     Host Frame:./kcore [0x4b73a]
=========     Host Frame:./kcore [0x361ec]
=========     Host Frame:./kcore [0x716c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20830]
=========     Host Frame:./kcore [0x8919]
=========
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaMemcpyAsync. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x357283]
=========     Host Frame:./kcore [0xcc4e3]
=========     Host Frame:./kcore [0x4a03c]
=========     Host Frame:./kcore [0x4ae65]
=========     Host Frame:./kcore [0x4b73a]
=========     Host Frame:./kcore [0x361ec]
=========     Host Frame:./kcore [0x716c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20830]
=========     Host Frame:./kcore [0x8919]
=========
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaStreamSynchronize. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x357283]
=========     Host Frame:./kcore [0xc802e]
=========     Host Frame:./kcore [0x4a049]
=========     Host Frame:./kcore [0x4ae65]
=========     Host Frame:./kcore [0x4b73a]
=========     Host Frame:./kcore [0x361ec]
=========     Host Frame:./kcore [0x716c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20830]
=========     Host Frame:./kcore [0x8919]
=========
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  trivial_device_copy D->H failed: unspecified launch failure
========= Error: process didn't terminate successfully
========= No CUDA-MEMCHECK results found

pgera commented 5 years ago

Note: This fails only under cuda-memcheck; the binary runs fine on its own. So it is likely an out-of-bounds access (buffer-overrun type bug) that goes unnoticed in a normal run and is only exposed by cuda-memcheck.
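
For anyone landing here with a similar trace: the report is a 4-byte out-of-bounds global read inside cub's scan kernel on an `int const *` input. The thread never identifies the root cause, so the following is only a guess at one common pattern that produces this kind of report: running a prefix sum over V + 1 items (to get CSR-style offsets) when the input buffer only holds V. Below is a minimal host-side C++ sketch of that failure mode with an explicit length check. `checked_exclusive_scan` is a hypothetical helper for illustration, not part of cuhornet or CUB.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not cuhornet/CUB code): a host-side exclusive prefix
// sum that validates num_items against both buffers before touching them.
// A device scan given a too-large num_items would instead read past the end
// of the input, which is what a memory checker flags as an invalid read.
inline void checked_exclusive_scan(const std::vector<int>& in,
                                   std::vector<int>& out,
                                   std::size_t num_items) {
    if (num_items > in.size() || num_items > out.size()) {
        throw std::out_of_range("scan length exceeds buffer size");
    }
    int running = 0;
    for (std::size_t i = 0; i < num_items; ++i) {
        out[i] = running;   // exclusive: sum of all elements before index i
        running += in[i];
    }
}
```

For the 5-node graph in this report, the undirected degrees are 3, 2, 2, 2, 1 (they sum to the reported ne: 10). Scanning 6 items over that 5-element input throws in this sketch; padding the input with one trailing 0 makes the same call valid and yields the offsets 0, 3, 5, 7, 9, 10. Whether this is what actually happened in kcore is not something the thread confirms.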

pgera commented 5 years ago

Closing, as this seems to be resolved.