weifengliu-ssslab / Benchmark_SpMV_using_CSR5

CSR5-based SpMV on CPUs, GPUs and Xeon Phi
MIT License

Result not pass #7

Open phreer opened 4 years ago

phreer commented 4 years ago

Hi, thanks for your awesome work and I am very interested in the CSR5 format.

But when I tested the AVX2 version of the code in this repository, the program reported that the result is incorrect. I tested it with webbase-1M.mtx and some other matrices downloaded from the SuiteSparse Matrix Collection. This is the output:

------------------------------------------------------
PRECISION = 64-bit Double Precision
------------------------------------------------------
--------------../webbase-1M/webbase-1M.mtx--------------
 ( 1000005, 1000005 ) nnz = 3105536
cpu sequential time = 10.758 ms. Bandwidth = 6.8889 GB/s. GFlops = 0.577344 GFlops.

omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 4.077000 ms
CSR->CSR5 tile_ptr time = 29.580000 ms
CSR->CSR5 tile_desc time = 9.116000 ms
CSR->CSR5 transpose time = 5.595000 ms
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 0.144000 ms
CSR->CSR5 tile_ptr time = 1.335000 ms
CSR->CSR5 tile_desc time = 4.970000 ms
CSR->CSR5 transpose time = 3.740000 ms
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 0.126000 ms
CSR->CSR5 tile_ptr time = 1.243000 ms
CSR->CSR5 tile_desc time = 4.799000 ms
CSR->CSR5 transpose time = 3.132000 ms
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 0.122000 ms
CSR->CSR5 tile_ptr time = 1.229000 ms
CSR->CSR5 tile_desc time = 4.324000 ms
CSR->CSR5 transpose time = 3.812000 ms
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 0.109000 ms
CSR->CSR5 tile_ptr time = 1.020000 ms
CSR->CSR5 tile_desc time = 4.126000 ms
CSR->CSR5 transpose time = 2.930000 ms
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 0.107000 ms
CSR->CSR5 tile_ptr time = 0.926000 ms
CSR->CSR5 tile_desc time = 3.673000 ms
CSR->CSR5 transpose time = 3.280000 ms
CSR->CSR5 time = 8.053 ms.
CSR5-based SpMV time = 1.08296 ms. Bandwidth = 68.4337 GB/s. GFlops = 5.73529 GFlops.
Check... NO PASS! #Error = 3701 out of 1000005 entries.
------------------------------------------------------
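For readers checking the throughput figures in the log, the GFlops values are consistent with counting two floating-point operations (one multiply, one add) per stored nonzero. The bandwidth model below is only inferred by fitting the printed numbers, assuming 4-byte indices and 8-byte values; it is not taken from the benchmark's source, and the function names are hypothetical. A minimal sketch:

```python
# Figures copied from the log above for webbase-1M.mtx.
ROWS = 1000005
NNZ = 3105536
SEQ_MS = 10.758      # "cpu sequential time"
CSR5_MS = 1.08296    # "CSR5-based SpMV time"


def gflops(nnz, time_ms):
    """Two flops (multiply + add) per stored nonzero, divided by the run time."""
    return 2.0 * nnz / (time_ms * 1e-3) / 1e9


def csr_bandwidth_gbs(rows, nnz, time_ms, index_bytes=4, value_bytes=8):
    """Assumed traffic model (inferred from the log, not from the source code):
    row pointers + column indices + matrix values + one x read per nonzero
    + one y write per row."""
    bytes_moved = (
        (rows + 1) * index_bytes   # row_ptr
        + nnz * index_bytes        # col_idx
        + nnz * value_bytes        # values
        + nnz * value_bytes        # x[col_idx[k]] gathered per nonzero
        + rows * value_bytes       # y written once per row
    )
    return bytes_moved / (time_ms * 1e-3) / 1e9


seq_gflops = gflops(NNZ, SEQ_MS)
csr5_gflops = gflops(NNZ, CSR5_MS)
seq_bw = csr_bandwidth_gbs(ROWS, NNZ, SEQ_MS)
csr5_bw = csr_bandwidth_gbs(ROWS, NNZ, CSR5_MS)

print(f"sequential: {seq_gflops:.6f} GFlops, {seq_bw:.4f} GB/s")
print(f"CSR5:       {csr5_gflops:.5f} GFlops, {csr5_bw:.4f} GB/s")
```

Under these assumptions the computed values agree with the logged ones to within the rounding of the printed times, so the speedup reported in the log is internally consistent even though the correctness check fails.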

The platform where I ran the code is a CentOS 7 server with an Intel Xeon E5-2680 v4, and the program was built with icc (ICC) 19.1.2.254 20200623 (shipped with Intel System Studio 2020 Update 2). I am wondering whether it is caused by the compiler.
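To make the "Check... NO PASS! #Error = ..." line concrete, here is a minimal sketch of how such a check could work: compute a reference result with a plain sequential CSR SpMV and count the entries where the tested result differs beyond a tolerance. The function names and tolerance values are hypothetical, and the benchmark's actual check may differ.

```python
import math


def csr_spmv(row_ptr, col_idx, values, x):
    """Reference sequential y = A * x for a matrix stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def count_mismatches(y_ref, y_test, rel_tol=1e-9, abs_tol=1e-12):
    """Count entries that differ beyond the (assumed) tolerances."""
    return sum(
        1
        for a, b in zip(y_ref, y_test)
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    )


# Small example matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y_ref = csr_spmv(row_ptr, col_idx, values, x)
y_bad = [y_ref[0], y_ref[1], y_ref[2] + 0.5]  # deliberately corrupt one entry

print("reference:", y_ref)
print("#Error =", count_mismatches(y_ref, y_bad), "out of", len(y_ref), "entries")
```

When reproducing the problem, dumping the indices and values of the mismatching rows (rather than only their count) would help distinguish small rounding differences from genuinely wrong entries.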

Hope to get some hints.

weifengliu-ssslab commented 4 years ago

Thanks for the message! We'll have a look at this!

Weifeng Liu weifeng.liu@cup.edu.cn

phreer commented 4 years ago

It seems that this error was indeed caused by the icc version: I tried the 2018 version of icc and the error no longer occurs.
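The thread does not establish why the newer icc produces different results. One possible mechanism, stated here purely as an assumption, is that different compiler versions vectorize or reorder floating-point reductions differently, and floating-point addition is not associative. The sketch below only illustrates that general effect; it is not a diagnosis of this issue.

```python
# Three terms whose exact sum is 1.0.
terms = [1e16, 1.0, -1e16]

# Left-to-right order: the 1.0 is lost when added to 1e16, because the
# spacing between adjacent doubles near 1e16 is 2.
left_to_right = (terms[0] + terms[1]) + terms[2]

# Reordered: the two large terms cancel first, so the 1.0 survives.
reordered = (terms[0] + terms[2]) + terms[1]

print("left to right:", left_to_right)
print("reordered:    ", reordered)
```

If reordering were the cause, the mismatching entries would typically differ from the reference by small relative amounts. Large differences would instead point to a genuine miscompilation or a bug exposed by the newer compiler.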