sympiler / aggregation

The aggregation repository contains a set of algorithms for grouping vertices of DAGs coming from loop-carried dependencies. For more information see Sympiler website
http://www.sympiler.com/
MIT License
5 stars 5 forks source link

Assertion error inside HDAGG::partialSparsification for Hdagg_SpTRSV example #5

Open learning-chip opened 1 year ago

learning-chip commented 1 year ago

At the latest commit da104fa343672469f2000efa4285dccb32107370, and fixed #4, there is still run-time error for examples/SpTRSV_runtime.cpp. The error occurs during the call to HDAGG::partialSparsification in examples/SpTRSV_runtime.h, in particular this assertion inside partialSparsification:

https://github.com/sympiler/aggregation/blob/da104fa343672469f2000efa4285dccb32107370/src/hdagg/hdagg.cpp#L2083-L2086

Error messages

With debug build, the error message is:

METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:1.25e-05
Running LL Levelset Code with #core: 20 - The runtime:1.58e-05
Hdagg_SpTRSV: /work_dir/aggregation/src/hdagg/hdagg.cpp:2085: void HDAGG::partialSparsification(int, int, const int*, const int*, std::vector<int>&, std::vector<int>&, bool): Assertion `col <= n' failed.
Aborted

The "LL Serial" and "LL Levelset" case work fine because they don't call partialSparsification. The "LL Tree + Levelset" and later cases fail at partialSparsification.

With release build it leads to memory errors, likely due to exceeding array bounds:

METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:1.24e-05
Running LL Levelset Code with #core: 20 - The runtime:1.46e-05
malloc_consolidate(): invalid chunk size
Aborted

The above are using the built-in random matrix generation. Using an external matrix like G2_circuit give a different error, but still memory-related.

Reading matrix /work_dir/matrix_data/G2_circuit/G2_circuit.mtx
METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:0.0009317
Running LL Levelset Code with #core: 2 - The runtime:0.0016047
Hdagg_SpTRSV: malloc.c:4118: _int_malloc: Assertion `chunk_main_arena (fwd)' failed.
Aborted

Steps to reproduce

git clone https://github.com/sympiler/aggregation
cd aggregation

# debug mode
cmake -DCMAKE_BUILD_TYPE=Debug -S . -B build_debug
cmake --build build_debug
./build_debug/example/Hdagg_SpTRSV  # generates random matrice, same error when reading a mtx file

# release mode
cmake -DCMAKE_BUILD_TYPE=Release -S . -B build_release
cmake --build build_release -j 4
./build_release/example/Hdagg_SpTRSV
# read external matrix file and use less threads than default
./build_release/example/Hdagg_SpTRSV /work_dir/matrix_data/G2_circuit/G2_circuit.mtx 2

The system environment can be reproduced using this trivial Dockerfile that installs GCC 11.3 and CMake 3.22:

FROM ubuntu:22.04

RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    git wget vim \
    gcc g++ gfortran \
    libnuma-dev \
    libhwloc-dev \
    libmetis-dev \
    libomp-dev \
    make cmake \
    && rm -rf /var/lib/apt/lists/*
cheshmi commented 1 year ago

Thanks, can you please send a PR?

learning-chip commented 1 year ago

I haven't diagnosed the root cause of error, but simply skipped the cases that calls partialSparsification in order to compile that example program. For example SpTrSv_LL_HDAGG doesn't call sparsification so still runs, but SpTrSv_LL_Tree_HDAGG breaks. However the inspection time of HDAGG is many times slower than Tree_HDAGG, presumably due to lack of sparsification.

learning-chip commented 1 year ago

One clue: The same SpTRSV_runtime.cpp example from the older repo https://github.com/BehroozZare/HDagg-benchmark (as of commit 948b38f) runs fine without array-bound errors. So it must be due to some updates in between.

cheshmi commented 1 year ago

Got it, thanks. I think HDagg repo is more updated than this for the HDagg algorithm.

learning-chip commented 1 year ago

Hi @cheshmi I found the problem with this repo --

The find_package(OpenMP) in the top-level cmake file is disabled:

https://github.com/sympiler/aggregation/blob/da104fa343672469f2000efa4285dccb32107370/CMakeLists.txt#L59-L72

This is probably fine when compiled as a subproject of Sympiler (where OpenMP is found one level higher); but when compiling this aggregation repo as a standalone project, the ENABLE_OPENMP macro and OpenMP flags won't get defined. Then, inside the HDAGG::partialSparsification function:

https://github.com/sympiler/aggregation/blob/da104fa343672469f2000efa4285dccb32107370/src/hdagg/hdagg.cpp#L2036-L2042

Both bins and tid will be set to 1, causing a bug when the user wants parallel partitioning.

Uncommenting find_package(OpenMP) fixes the bug. Or maybe using the HAS_PARENT condition to optionally include dependencies.

One the other hand, in the older HDagg-benchmark repo, there is no ENABLE_OPENMP switch, and bins is always set to omp_get_num_threads(), so no such bug.

learning-chip commented 1 year ago

More unit tests for the HDagg procedure (sparsification, grouping, partitioning, ungrouping...) might help prevent future bugs? In the current Catch_tests there seems to be only one test for the simple levelset sptrsv, but not more advanced partitionings like LBC or HDagg. And the test is disabled in cmake file.

learning-chip commented 1 year ago

The current GitHub workflow runs lbc_demo and sptrsv_demo, but not Hdagg_SpTRSV (the executable for SpTRSV_runtime.cpp).

https://github.com/sympiler/aggregation/blob/da104fa343672469f2000efa4285dccb32107370/.github/workflows/cmakeUbuntu.yml#L37-L45

If you add lines like

cmake --build ${{github.workspace}}/build --target Hdagg_SpTRSV
${{github.workspace}}/build/example/Hdagg_SpTRSV`, 

you should see the error message in GitHub actions.

learning-chip commented 1 year ago

OK I saw that find_package(OpenMP) was disabled by the commit https://github.com/sympiler/aggregation/commit/5b8830961acdf173f48f69a080ceb64bd3f63278 , presumably to ensure compatibility with the parent Sympiler directory.

May using

if(LBC_IS_TOPLEVEL)
    find_package(OpenMP)
endif()

to avoid duplication. Although I think duplicately calling find_package(OpenMP) is also fine...