The aggregation repository contains a set of algorithms for grouping vertices of DAGs coming from loop-carried dependencies. For more information, see the Sympiler website.
Bug description

The `SpTrSv_LL_Tree_HDAGG` case in the `SpTRSV_runtime.cpp` example fails with an error for certain input matrices:

Running LL HDAGG with BIN Code with #core: 4 - The runtime:7.7951e-05
core: 4; bin: 1
Compute DAG
terminate called after throwing an instance of 'std::length_error'
what(): cannot create std::vector larger than max_size()
Aborted (core dumped)

This happens, for example, with the input matrix `A = random_square_sparse(200, 0.5, 1.0, 2U);` and with Metis reordering disabled. Depending on the random parameters, some matrices fail while others run fine.

Locating bug

The error occurs inside the `HDAGG` function, at the following call, which receives the invalid vector size `nlevels = -1`: https://github.com/sympiler/aggregation/blob/da293bb1d1060bc390ad785978f8452943a8909c/src/hdagg/hdagg.cpp#L663

The invalid value `-1` is returned by the `build_levelSet_CSC` function at line 44: https://github.com/sympiler/aggregation/blob/da293bb1d1060bc390ad785978f8452943a8909c/src/hdagg/hdagg.cpp#L41-L64

When the error occurs, the variable state is `cur_level == n == 2`, and `-1` is returned. Moreover, the code in lines 45 to 64 is placed after the `return`, so it never gets executed. I guess some logic must be wrong here. Should the `assert` at line 42 be enabled?

Steps to reproduce

Modify `SpTRSV_runtime.cpp` with the following input matrix:

A = random_square_sparse(200, 0.5, 1.0, 2U);

I can also send a draft PR to show the bug, and a parameterized unit test to sweep over various input matrices (update: see #10).

Then, build and run the example.

Deeper look with GDB:

gdb ./build_debug/example/Hdagg_SpTRSV
(check state of `n` before entering loop)
break src/hdagg/hdagg.cpp:28
(check state of `cur_level` before returning -1)
break src/hdagg/hdagg.cpp:44
(the first break point is inside `SpTrSv_LL_HDAGG` case, not yet to `SpTrSv_LL_Tree_HDAGG`, so no bug)
run
print n
(n equals 200 for this input)
(the second break point is inside the buggy `SpTrSv_LL_Tree_HDAGG`)
c
print n
(n equals 2 because ngroups is used as first argument of HDAGG() function)
(the third break point is just before returning -1)
c
print cur_level
(cur_level equals 2, so -1 is returned as result of build_levelSet_CSC() function)
c
(crashes when creating a vector of size -1)
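
The crash mechanism can be reproduced in isolation. The sketch below is not the repository's code: `build_levels_stub`, `unchecked_alloc_throws`, and `make_level_ptr` are hypothetical names, and the stub only mimics the reported behavior (returning `-1` when `cur_level == n`). It shows why a `-1` level count turns into `std::length_error` once it is used as a vector size, and how a check at the call site would surface the failure earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a level-set builder that signals failure with -1.
// It only mimics the behavior described in this report (cur_level == n).
int build_levels_stub(int n, int cur_level) {
  if (cur_level == n) return -1;  // failure sentinel
  return cur_level;               // number of levels found
}

// Uses the level count as a vector size without checking it.
// A value of -1 converts to the largest std::size_t, which exceeds
// max_size(), so the constructor throws std::length_error.
bool unchecked_alloc_throws(int nlevels) {
  try {
    std::vector<int> level_ptr(static_cast<std::size_t>(nlevels));
    (void)level_ptr.size();
    return false;
  } catch (const std::length_error&) {
    return true;
  }
}

// Checked variant (an assumption about a possible fix, not the actual one):
// reject the sentinel before allocating.
std::vector<int> make_level_ptr(int nlevels) {
  if (nlevels < 0) {
    throw std::runtime_error("level-set construction failed");
  }
  return std::vector<int>(static_cast<std::size_t>(nlevels) + 1, 0);
}
```

Calling `unchecked_alloc_throws(build_levels_stub(2, 2))` follows the same path as the failing run, while `make_level_ptr` reports the invalid level count directly instead of failing inside the allocation.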