gunrock / gunrock

Programmable CUDA/C++ GPU Graph Analytics
https://gunrock.github.io/gunrock/
Apache License 2.0
978 stars, 200 forks

Invalid Device Function error running Louvain Algorithm (possibly) due to cmake settings #560

Open: yzhwang opened this issue 5 years ago

yzhwang commented 5 years ago

Summary: Running the Louvain algorithm with the following command line:

./louvain market ./soc_LiveJournal1.mtx --undirected --max-iters=2 --max-passes=3 --quick=true --device=0

produces the following error:

[/root/dev/gunrock/gunrock/app/louvain/louvain_enactor.cuh, 281 @ gpu 0] error encountered (CUDA error 8: invalid device function)
[/root/dev/gunrock/gunrock/app/enactor_base.cuh, 518 @ gpu 0] error encountered (CUDA error 8: invalid device function)
[/root/dev/gunrock/gunrock/app/louvain/louvain_enactor.cuh, 1045 @ gpu 0] error encountered (CUDA error 8: invalid device function)
[/root/dev/gunrock/gunrock/app/louvain/louvain_app.cu, 88 @ gpu 0] error encountered (CUDA error 8: invalid device function)
[/root/dev/gunrock/examples/louvain/test_louvain.cu, 128 @ gpu 0] error encountered (CUDA error 8: invalid device function)

https://github.com/gunrock/gunrock/blob/master/CMakeLists.txt#L266

Reproducibility: Always

To Reproduce: The bug was caught on a clean build on the head of master.

Related question: While looking for a solution, I saw this: https://github.com/gunrock/gunrock/blob/master/CMakeLists.txt#L266. We compiled with CUDA 9, so does that line mean code generation for cc 6.1 is turned off? I haven't tried setting it to ON yet, since I don't have access to the machine, but my colleague will test different settings today and report back.

System Configuration:

GPU: Tesla P40 (cc 6.1)
CPU: N/A
OS: CentOS 6
Driver version: 396.44
CUDA version: 9
GCC: 4.8.5

yzhwang commented 5 years ago

https://github.com/gunrock/gunrock/blob/master/examples/BaseMakefile.mk#L42 Should we also add an SM target for 6.1 there? I will try that first.

neoblizz commented 5 years ago

Yeah, this is an odd one. It didn't come up during my tests on CUDA 9. If you don't find anything, I will try it on my end. By the way, BaseMakefile is independent of CMakeLists.txt, so you can add the 61 target there, run make for just louvain (in the louvain examples directory), and see whether you still get the error.
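For the Makefile route, the change amounts to asking nvcc to also emit code for compute capability 6.1. Here is a sketch, assuming BaseMakefile.mk collects per-architecture `-gencode` flags into a variable. The names `GEN_SM60`, `GEN_SM61`, `GEN_SM70` and `SM_TARGETS` are illustrative and may not match the real file. Only the `-gencode` syntax itself is standard nvcc:

```make
# Hypothetical excerpt: variable names are illustrative, not taken from
# examples/BaseMakefile.mk.
GEN_SM60 = -gencode=arch=compute_60,code=\"sm_60,compute_60\"
GEN_SM61 = -gencode=arch=compute_61,code=\"sm_61,compute_61\"   # Tesla P40
GEN_SM70 = -gencode=arch=compute_70,code=\"sm_70,compute_70\"

# All targets passed to nvcc when building the example.
SM_TARGETS = $(GEN_SM60) $(GEN_SM61) $(GEN_SM70)
```

Listing both `sm_61` and `compute_61` in `code=` embeds SASS for the P40, plus PTX that newer GPUs can JIT-compile. After editing, run `make` inside the louvain examples directory as suggested above.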

yzhwang commented 5 years ago

Thanks for the reply, Mohammad.

My colleague changed the BaseMakefile for louvain and compiled with CUDA 9 and the compute capability 6.1 setting (on a P40). Now there seems to be a bug in the release phase:

+ ./louvain market ./soc_LiveJournal1.mtx --undirected --max-iters=2 --max-passes=3 --quick=true --device=0
[/root/dev/gunrock/gunrock/oprtr/LB_advance/kernel.cuh, 511 @ gpu 0] cudaStreamSynchronize failed (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/louvain/louvain_enactor.cuh, 204 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/enactor_base.cuh, 518 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/louvain/louvain_enactor.cuh, 1045 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/louvain/louvain_app.cu, 88 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/util/array_utils.cuh, 520 @ gpu 0] current_communities cudaFree failed (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/louvain/louvain_problem.cuh, 201 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/util/array_utils.cuh, 520 @ gpu 0] data_slices[0] cudaFree failed (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/louvain/louvain_problem.cuh, 354 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/examples/louvain/test_louvain.cu, 128 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)

(The top of the call stack is the lowest line above.)

Any ideas? I don't have access to the actual machine, so I can't debug this myself, though I'd be happy to if I did. What do you suggest? Turning on ENABLE_ARRAY_DEBUG? If so, how do we do that?

neoblizz commented 5 years ago

@yzhwang Let's see the output of ENABLE_ARRAY_DEBUG:

https://github.com/gunrock/gunrock/blob/master/gunrock/util/array_utils.cuh#L30

yzhwang commented 5 years ago

Thank you! Looping in my colleague WJ (@sdj222555), who is actually running the tests. He will post his findings later.

sdj222555 commented 5 years ago

Thanks very much! With ENABLE_ARRAY_DEBUG turned on, I got the following:

./louvain market ./soc_LiveJournal1.mtx --undirected --max-iters=2 --max-passes=3 --quick=true
[/root/dev/gunrock/gunrock/util/array_utils.cuh, 964 @ gpu 0] out_counter not deined on HOST (CUDA error 16: invalid host pointer)
[/root/dev/gunrock/gunrock/util/array_utils.cuh, 964 @ gpu 0] in_counter not deined on HOST (CUDA error 16: invalid host pointer)
[/root/dev/gunrock/gunrock/oprtr/LB_advance/kernel.cuh, 511 @ gpu 0] cudaStreamSynchronize failed (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/louvain/louvain_enactor.cuh, 204 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/enactor_base.cuh, 518 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/louvain/louvain_enactor.cuh, 1045 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/louvain/louvain_app.cu, 88 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/util/array_utils.cuh, 520 @ gpu 0] current_communities cudaFree failed (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/louvain/louvain_problem.cuh, 201 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/util/array_utils.cuh, 520 @ gpu 0] data_slices[0] cudaFree failed (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/gunrock/app/louvain/louvain_problem.cuh, 354 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)
[/root/dev/gunrock/examples/louvain/test_louvain.cu, 128 @ gpu 0] error encountered (CUDA error 77: an illegal memory access was encountered)

And this is the log:

Loading Matrix-market coordinate-formatted graph ...
  Reading from ./soc_LiveJournal1.mtx:
  Parsing MARKET COO format (4847571 nodes, 68993773 directed edges)... edge_pairs   allocated on HOST, length =     68993773, size =    551950184 bytes, pointer =  0x7ff63b4f3010
edge_values  allocated on HOST, length =     68993773, size =    551950184 bytes, pointer =  0x7ff61a691010
  Done (26 s).
  Writing meta data into ./soc_LiveJournal1.mtx.meta
  Writting edge pairs in binary into ./soc_LiveJournal1.mtx.coo_edge_pairs
  Assigning 1 to all 68993773 edges
  Substracting 1 from node Ids...
  Edge doubleing: 68993773 -> 137987546 edges
edge_pairs EnsureSize : 68993773 -> 137987546
t_array  allocated on HOST, length =     137987546, size =   1103900368 bytes, pointer =     0x7ff5ad978010
edge_pairs   released on HOST, length =  68993773, pointer =     0x7ff63b4f3010
t_array  unsetted on HOST
edge_values EnsureSize : 68993773 -> 137987546
t_array  allocated on HOST, length =     137987546, size =   1103900368 bytes, pointer =     0x7ff56bcb5010
edge_values  released on HOST, length =  68993773, pointer =     0x7ff61a691010
t_array  unsetted on HOST
Array1D::Sort::temp_array    allocated on HOST, length =     137987546, size =   2207800736 bytes, pointer =     0x7ff4e832f010
omp_sort::table3     allocated on HOST, length =     137987546, size =   2207800736 bytes, pointer =     0x7ff4649a9010
omp_sort::retvals    allocated on HOST, length =     87, size =  348 bytes, pointer =    0x7ff6580008c0
omp_sort::table2     allocated on HOST, length =     30276, size =   484416 bytes, pointer =     0x7ff661f8e010
opm_sort::pivots     allocated on HOST, length =     87, size =  1392 bytes, pointer =   0x7ff658000a30
omp_sort::pivot_pos  allocated on HOST, length =     7569, size =    30276 bytes, pointer =  0x7ff658000fb0
omp_sort::table2     released on HOST, length =  30276, pointer =    0x7ff661f8e010
opm_sort::pivots     released on HOST, length =  87, pointer =   0x7ff658000a30
omp_sort::table3     released on HOST, length =  137987546, pointer =    0x7ff4649a9010
omp_sort::retvals    released on HOST, length =  87, pointer =   0x7ff6580008c0
omp_sort::pivot_pos  released on HOST, length =  7569, pointer =     0x7ff658000fb0
Array1D::Sort::temp_array    released on HOST, length =  137987546, pointer =    0x7ff4e832f010
     allocated on HOST, length =     137987546, size =   1103900368 bytes, pointer =     0x7ff529ff2010
edge_pairs   released on HOST, length =  137987546, pointer =    0x7ff5ad978010
edge_pairs   allocated on HOST, length =     85702474, size =    685619792 bytes, pointer =  0x7ff5c685f010
     allocated on HOST, length =     137987546, size =   1103900368 bytes, pointer =     0x7ff4e832f010
edge_values  released on HOST, length =  137987546, pointer =    0x7ff56bcb5010
edge_values  allocated on HOST, length =     85702474, size =    685619792 bytes, pointer =  0x7ff59da83010
     released on HOST, length =  137987546, pointer =    0x7ff529ff2010
     released on HOST, length =  137987546, pointer =    0x7ff4e832f010
  Removed 52285072 duplicate edges and self circles.
  graph loaded as COO in 37.985458s.
Converting 4847571 vertices, 85702474 directed edges ( ordered tuples) to CSR format...row_offsets   allocated on HOST, length =     4847572, size =     19390288 bytes, pointer =   0x2ae8af0
column_indices   allocated on HOST, length =     85702474, size =    342809896 bytes, pointer =  0x7ff589395010
edge_values  allocated on HOST, length =     85702474, size =    685619792 bytes, pointer =  0x7ff5605b9010
Done (0s).
edge_pairs   released on HOST, length =  85702474, pointer =     0x7ff5c685f010
edge_values  released on HOST, length =  85702474, pointer =     0x7ff59da83010
     allocated on HOST, length =     33, size =  132 bytes, pointer =    0x2ad7280
Degree Histogram (4847571 vertices, 85702474 edges):
    Degree 0: 962 (0.019845 %)
    Degree 2^0: 1057993 (21.825219 %)
    Degree 2^1: 919014 (18.958237 %)
    Degree 2^2: 833607 (17.196386 %)
    Degree 2^3: 742607 (15.319157 %)
    Degree 2^4: 616847 (12.724868 %)
    Degree 2^5: 404231 (8.338836 %)
    Degree 2^6: 184810 (3.812425 %)
    Degree 2^7: 62310 (1.285386 %)
    Degree 2^8: 19429 (0.400799 %)
    Degree 2^9: 4855 (0.100153 %)
    Degree 2^10: 704 (0.014523 %)
    Degree 2^11: 161 (0.003321 %)
    Degree 2^12: 29 (0.000598 %)
    Degree 2^13: 11 (0.000227 %)
    Degree 2^14: 1 (0.000021 %)
     released on HOST, length =  33, pointer =   0x2ad7280

==============================================
64bit-VertexT=false 64bit-SizeT=false 64bit-ValueT=true undirected=true unify-segments=0 advance-mode=LB omp-threads=0 1st-th=0.0001 neighborcomm-th=-1

#threads = 87, 1st-th = 0.000100
row_offsets  allocated on HOST, length =     782236, size =  3128944 bytes, pointer =    0x7ff61e44e460
column_indices   allocated on HOST, length =     11955095, size =    47820380 bytes, pointer =   0x7ff455265010
edge_values  allocated on HOST, length =     11955095, size =    95640760 bytes, pointer =   0x7ff0164ca010
row_offsets  allocated on HOST, length =     42775, size =   171100 bytes, pointer =     0x7ff2718b1720
column_indices   allocated on HOST, length =     1702764, size =     6811056 bytes, pointer =    0x7ff2718db390
edge_values  allocated on HOST, length =     1702764, size =     13622112 bytes, pointer =   0x7ff271f5a150
row_offsets  released on HOST, length =  782236, pointer =   0x7ff61e44e460
column_indices   released on HOST, length =  11955095, pointer =     0x7ff455265010
edge_values  released on HOST, length =  11955095, pointer =     0x7ff0164ca010
row_offsets  allocated on HOST, length =     4446, size =    17784 bytes, pointer =  0x7ff23a1518e0
column_indices   allocated on HOST, length =     99287, size =   397148 bytes, pointer =     0x7ff23a169920
edge_values  allocated on HOST, length =     99287, size =   794296 bytes, pointer =     0x7ff23a1ca890
row_offsets  released on HOST, length =  42775, pointer =    0x7ff2718b1720
column_indices   released on HOST, length =  1702764, pointer =  0x7ff2718db390
edge_values  released on HOST, length =  1702764, pointer =  0x7ff271f5a150
--------------------------
Run 0 elapsed: 4230.287109 ms, q = 0.739184
Community Validity: PASS
Computed: #communities = 4445.000000, modularity = 0.739184
sub_graphs   setted on HOST, size =  1, pointer =    0x7ffc98fa8980, setted =    HOST
data_slices[0]   allocated on HOST, length =     1, size =   2240 bytes, pointer =   0x6e88b68
data_slices[0]   allocated on DEVICE, length =   1, size =   2240 bytes, pointer =   0x7ff5cf400000
current_communities  allocated on DEVICE, length =   4847571, size =     19390284 bytes, pointer =   0x7ff5c4000000
next_communities     allocated on DEVICE, length =   4847571, size =     19390284 bytes, pointer =   0x7ff5c2000000
community_sizes  allocated on DEVICE, length =   4847571, size =     19390284 bytes, pointer =   0x7ff5c0000000
w_v2     allocated on DEVICE, length =   4847571, size =     38780568 bytes, pointer =   0x7ff5bc000000
w_v2self     allocated on DEVICE, length =   4847571, size =     38780568 bytes, pointer =   0x7ff5b8000000
w_c2     allocated on DEVICE, length =   4847571, size =     38780568 bytes, pointer =   0x7ff5b4000000
edge_weights0    allocated on DEVICE, length =   85702475, size =    685619800 bytes, pointer =  0x7ff536000000
edge_weights1    allocated on DEVICE, length =   85702475, size =    685619800 bytes, pointer =  0x7ff50c000000
seg_offsets0     allocated on DEVICE, length =   85702475, size =    342809900 bytes, pointer =  0x7ff4f6000000
seg_offsets1     allocated on DEVICE, length =   85702475, size =    342809900 bytes, pointer =  0x7ff4e0000000
edge_pairs0  allocated on DEVICE, length =   85702475, size =    685619800 bytes, pointer =  0x7ff4b6000000
edge_pairs1  allocated on DEVICE, length =   85702475, size =    685619800 bytes, pointer =  0x7ff48c000000
num_neighbor_comms   allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e8bb00
num_neighbor_comms   allocated on DEVICE, length =   1, size =   4 bytes, pointer =  0x7ff4b4e00000
num_new_comms    allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e8acb0
num_new_comms    allocated on DEVICE, length =   1, size =   4 bytes, pointer =  0x7ff4b4e00200
num_new_edges    allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e8acd0
num_new_edges    allocated on DEVICE, length =   1, size =   4 bytes, pointer =  0x7ff4b4e00400
cub_temp_space   allocated on DEVICE, length =   1, size =   1 bytes, pointer =  0x7ff4b4e00600
gain_bases   allocated on DEVICE, length =   4847571, size =     38780568 bytes, pointer =   0x7ff5b0000000
max_gains    allocated on DEVICE, length =   4847571, size =     38780568 bytes, pointer =   0x7ff5ac000000
iter_gain    allocated on HOST, length =     1, size =   8 bytes, pointer =  0x6e8bb70
iter_gain    allocated on DEVICE, length =   1, size =   8 bytes, pointer =  0x7ff4b4e00800
row_offsets  allocated on DEVICE, length =   4847572, size =     19390288 bytes, pointer =   0x7ff5ae600000
row_offsets Moving from HOST to DEVICE, size = 4847572, offset = 0, stream = 115910384, d_pointer = 0x7ff5ae600000, h_pointer = 0x2ae8af0
column_indices   allocated on DEVICE, length =   85702474, size =    342809896 bytes, pointer =  0x7ff476000000
column_indices Moving from HOST to DEVICE, size = 85702474, offset = 0, stream = 115910384, d_pointer = 0x7ff476000000, h_pointer = 0x7ff589395010
edge_values  allocated on DEVICE, length =   85702474, size =    685619792 bytes, pointer =  0x7ff206000000
edge_values Moving from HOST to DEVICE, size = 85702474, offset = 0, stream = 115910384, d_pointer = 0x7ff206000000, h_pointer = 0x7ff5605b9010
Using advance mode LB
Using filter mode CULL
cuda_props   allocated on HOST, length =     1, size =   680 bytes, pointer =    0x6e04de0
enactor_slices   allocated on HOST, length =     1, size =   1072 bytes, pointer =   0x6e05098
     allocated on HOST, length =     1, size =   1536 bytes, pointer =   0x6e05558
thread_slices    allocated on HOST, length =     1, size =   48 bytes, pointer =     0x6e05ef8
thread_Ids   allocated on HOST, length =     1, size =   8 bytes, pointer =  0x6e05f90
iter_full_queue_time     allocated on HOST, length =     1, size =   24 bytes, pointer =     0x6e002b8
iter_sub_queue_time  allocated on HOST, length =     1, size =   24 bytes, pointer =     0x6e86d48
iter_total_time  allocated on HOST, length =     1, size =   24 bytes, pointer =     0x6e89998
iter_full_queue_nodes_queued     allocated on HOST, length =     1, size =   24 bytes, pointer =     0x6e05ec8
iter_full_queue_edges_queued     allocated on HOST, length =     1, size =   24 bytes, pointer =     0x6e05fb8
nodes_queued     allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e09b90
nodes_queued     allocated on DEVICE, length =   1, size =   4 bytes, pointer =  0x7ff4b4e00a00
edges_queued     allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e09c70
edges_queued     allocated on DEVICE, length =   1, size =   4 bytes, pointer =  0x7ff4b4e00c00
Louvain::frontier[0,0]::queue_types  allocated on HOST, length =     0, size =   0 bytes, pointer =  (nil)
Louvain::frontier[0,0]::queue_map    allocated on HOST, length =     0, size =   0 bytes, pointer =  (nil)
Louvain::frontier[0,0]::num_segments     allocated on HOST, length =     0, size =   0 bytes, pointer =  (nil)
Louvain::frontier[0,0]::num_segments     allocated on DEVICE, length =   0, size =   0 bytes, pointer =  (nil)
Louvain::frontier[0,0]::output_length    allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e0a710
Louvain::frontier[0,0]::output_length    allocated on DEVICE, length =   1, size =   4 bytes, pointer =  0x7ff4b4e00e00
Louvain::frontier[0,0]::queue_offsets    allocated on HOST, length =     0, size =   0 bytes, pointer =  (nil)
Louvain::frontier[0,0]::queue_offsets    allocated on DEVICE, length =   0, size =   0 bytes, pointer =  (nil)
counters     allocated on HOST, length =     7, size =   28 bytes, pointer =     0x6e0a890
counters     allocated on DEVICE, length =   7, size =   28 bytes, pointer =     0x7ff4b4e01000
counters Moving from HOST to DEVICE, size = 7, offset = 0, stream = 0, d_pointer = 0x7ff4b4e01000, h_pointer = 0x6e0a890
in_length[0]     allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e02e10
in_length[1]     allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e05510
in_length_out    allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e05530
in_length_out    allocated on DEVICE, length =   1, size =   4 bytes, pointer =  0x7ff4b4e01200
in_iteration[0]  allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e0a8c0
in_iteration[0]  allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e0a8e0
out_length   allocated on HOST, length =     1, size =   4 bytes, pointer =  0x6e02db0
out_length   allocated on DEVICE, length =   1, size =   4 bytes, pointer =  0x7ff4b4e01400
vertex_associate_orgs    allocated on HOST, length =     0, size =   0 bytes, pointer =  (nil)
vertex_associate_orgs    allocated on DEVICE, length =   0, size =   0 bytes, pointer =  (nil)
value__associate_orgs    allocated on HOST, length =     1, size =   8 bytes, pointer =  0x6e02dd0
value__associate_orgs    allocated on DEVICE, length =   1, size =   8 bytes, pointer =  0x7ff4b4e01600
latency_data     allocated on HOST, length =     122880, size =  491520 bytes, pointer =     0x79917b0
latency_data     allocated on DEVICE, length =   122880, size =  491520 bytes, pointer =     0x7ff48a6ede00
latency_data Moving from HOST to DEVICE, size = 122880, offset = 0, stream = 0, d_pointer = 0x7ff48a6ede00, h_pointer = 0x79917b0
wait_marker  allocated on HOST, length =     2, size =   8 bytes, pointer =  0x6e0e670
stages   allocated on HOST, length =     2, size =   8 bytes, pointer =  0x6e0cc00
to_show  allocated on HOST, length =     2, size =   2 bytes, pointer =  0x6e0cc20
events[]     allocated on HOST, length =     2, size =   16 bytes, pointer =     0x6e0e760
events_set[]     allocated on HOST, length =     2, size =   16 bytes, pointer =     0x6e0e780
events[]     allocated on HOST, length =     2, size =   16 bytes, pointer =     0x6e0f1f0
events_set[]     allocated on HOST, length =     2, size =   16 bytes, pointer =     0x6e0f3b0
events[]     allocated on HOST, length =     2, size =   16 bytes, pointer =     0x6e0ff50
events_set[]     allocated on HOST, length =     2, size =   16 bytes, pointer =     0x6e0ffd0
events[]     allocated on HOST, length =     2, size =   16 bytes, pointer =     0x6e10a60
events_set[]     allocated on HOST, length =     2, size =   16 bytes, pointer =     0x6e10c20
Louvain::frontier[0,0]::output_offsets   allocated on DEVICE, length =   0, size =   0 bytes, pointer =  (nil)
     allocated on DEVICE, length =   2048, size =    8192 bytes, pointer =   0x7ff22f000000
     allocated on DEVICE, length =   2048, size =    8192 bytes, pointer =   0x7ff22f002000
data_slices[0] Moving from HOST to DEVICE, size = 1, offset = 0, stream = 0, d_pointer = 0x7ff5cf400000, h_pointer = 0x6e88b68
nodes_queued Moving from HOST to DEVICE, size = 1, offset = 0, stream = 0, d_pointer = 0x7ff4b4e00a00, h_pointer = 0x6e09b90
edges_queued Moving from HOST to DEVICE, size = 1, offset = 0, stream = 0, d_pointer = 0x7ff4b4e00c00, h_pointer = 0x6e09c70
counters Moving from HOST to DEVICE, size = 7, offset = 0, stream = 0, d_pointer = 0x7ff4b4e01000, h_pointer = 0x6e0a890
__________________________
-1   -1  -1  scanned_edges   oversize :  0 ->    3
Louvain::frontier[0,0]::output_offsets   releasing on DEVICE, length =   0, pointer = (nil)
Louvain::frontier[0,0]::output_offsets   allocated on DEVICE, length =   3, size =   12 bytes, pointer =     0x7ff4b4e01800
Louvain::frontier[0,0]::output_length Moving from DEVICE to HOST, size = 1, offset = 0, stream = 115368240, d_pointer = 0x7ff4b4e00e00, h_pointer = 0x6e0a710
Louvain::frontier[0,0]::output_length Moving from DEVICE to HOST, size = 1, offset = 0, stream = 115368240, d_pointer = 0x7ff4b4e00e00, h_pointer = 0x6e0a710
data_slices[0]   released on HOST, length =  1, pointer =    0x6e88b68
current_communities  releasing on DEVICE, length =   4847571, pointer = 0x7ff5c4000000
data_slices[0]   releasing on DEVICE, length =   1, pointer = 0x7ff5cf400000
##########User command end.

It seems that in_counter and out_counter are null pointers. Maybe they are used somewhere after being released.

neoblizz commented 5 years ago

@sdj222555 did you build just the louvain application or the whole library? Can I see the output of your cmake or make? My cmake output looks like this:

-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda-10.1 (found suitable version "10.1", minimum required is "7.5")
-- Boost version: 1.65.1
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   system
--   filesystem
--   timer
--   chrono
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found OpenMP
-- Found rapidjson header files in /home/mosama/forks/main/gunrock/externals/rapidjson/include
-- Project Added: bc
-- Project Added: bfs
-- Project Added: color
-- Project Added: geo
-- Project Added: hits
-- Project Added: knn
-- Project Added: louvain
-- Project Added: pr
-- Project Added: pr_nibble
-- Project Added: proj
-- Project Added: rw
-- Project Added: sage
-- Project Added: sm
-- Project Added: ss
-- Project Added: sssp
-- Project Added: tc
-- Project Added: vn
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mosama/forks/main/gunrock/build

neoblizz commented 5 years ago

I was able to reproduce this on our end, and @Laurawly is running into the same issue on SM. The bug appears with --advance-mode=LB or any other mode except --advance-mode=ALL_EDGES, so the command that should work for now is:

louvain market /path/to/dataset/soc-LiveJournal1.mtx --undirected --max-iters=2 --max-passes=3 --quick=true --advance-mode=ALL_EDGES

I think it may be caused by the moderngpu update. I will look into it more when I get some time. @yzhwang @sdj222555

neoblizz commented 5 years ago

CUDA Exception: Warp Illegal Address

Thread 1 "louvain_main_10" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 4, block (160,0,0), thread (448,0,0), device 0, sm 0, warp 46, lane 0]
0x000000000b49d860 in gunrock::oprtr::GetEdgeCounts<gunrock::graph::Csr<unsigned int, unsigned int, double, 272u, 0u, true>, unsigned int><<<(320,1,1),(512,1,1)>>> (graph=..., keys_in=0x0, num_elements=4847571,
    edge_counts=0x7ffe72e01800, flag=1) at /home/mosama/forks/laurawly/gunrock/examples/louvain/../../gunrock/oprtr/advance/advance_base.cuh:163

This seems to be the issue.

sdj222555 commented 5 years ago

Hi Mohammad, thank you very much! With "--advance-mode=ALL_EDGES" it runs successfully. I think you are right. I remember running louvain successfully on the "pre-release" branch of gunrock about a month ago, and everything was fine then. Maybe something has changed in moderngpu.

neoblizz commented 5 years ago

Yeah, I added support for CUDA 10.0 in moderngpu. I am not sure what that might have broken, so I will have to investigate a bit more.

neoblizz commented 5 years ago

I tried this with the old dev-refactor branch that is still in my fork (https://github.com/neoblizz/gunrock/tree/dev-refactor), and the issue persists with any advance mode other than ALL_EDGES. It would be helpful if @sgpyc, who wrote this app, could comment on its memory usage and on the issue in general.