owensgroup / merge-spmm

Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
Apache License 2.0

Questions about merge-path algo! #11

Open dwwcqu opened 1 year ago

dwwcqu commented 1 year ago

After porting merge-spmm from CUDA 8.0 to CUDA 11+ and compiling it successfully, I get fatal memory access errors when running gspmm --mode=mergepath XXX.mtx. The following is partial output from the compute-sanitizer tool:

========= Invalid __shared__ read of size 4 bytes
=========     at 0x12a0 in /merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:50:void mgpu::BinarySearchIt<(mgpu::MgpuBounds)1, int, const int *, int, mgpu::less<int>>(T3, int &, int &, T4, int, T5)
=========     by thread (0,0,0) in block (2,0,0)
=========     Address 0xf58 is out of bounds
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:85:int mgpu::BinarySearch<(mgpu::MgpuBounds)1, int, const int *, mgpu::less<int>>(T3, int, T2, T4) [0x1280]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:92:int mgpu::DeviceExpandCsrRows<(int)128, (int)1>(int, int, const int *, int, int, int *, int *) [0xf80]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:166:mgpu::SegReduceTerms mgpu::DeviceSegReducePrepareSpmm<(int)128, (int)1>(const int *, int *, int, int, int, int, bool, int *, int *) [0xf80]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/kernels/spmvcsr.cuh:203:void mgpu::KernelSpmmCsr<(int)32, (int)128, (bool)0, (bool)1, float *, int *, int *, const int *, float *, float *, float, mgpu::multiplies<float>, mgpu::plus<float>>(T5, T6, int, T7, T8, T9, T7, T10, T11 *, T11, T12, T13, int) [0xf20]
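The innermost frame in the trace above is a binary search over a shared-memory array, called from DeviceExpandCsrRows. One way to narrow down a fault like this is to check whether the count handed to the search is larger than the buffer it reads from. The following is a minimal host-side C++ sketch of that check, assuming (MgpuBounds)1 selects the upper-bound variant. The helper name upper_bound_checked and its capacity parameter are hypothetical and are not part of moderngpu; this is not the library's implementation.

```cpp
#include <cstddef>

// Hypothetical helper, not part of moderngpu: upper-bound binary search over
// data[0, count), with every read checked against the real buffer capacity.
// Returns the first index whose element is greater than key, or -1 if the
// search would have read outside [0, capacity).
int upper_bound_checked(const int* data, int count, int capacity, int key) {
    int begin = 0;
    int end = count;
    while (begin < end) {
        int mid = begin + (end - begin) / 2;
        if (mid < 0 || mid >= capacity) {
            return -1;  // the kind of read a memory checker would flag
        }
        if (!(key < data[mid])) {
            begin = mid + 1;  // data[mid] <= key: answer lies to the right
        } else {
            end = mid;        // data[mid] > key: answer is mid or to the left
        }
    }
    return begin;
}
```

With made-up CSR row offsets {0, 2, 2, 5, 7} and a matching count of 5, searching for nonzero index 3 returns 3, so the nonzero belongs to row 2. Passing a count of 64 for the same 5-element buffer returns -1 on the first probe, which mirrors an out-of-bounds read caused by an oversized count rather than by the search logic itself.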

The source code I modified covers the cuSPARSE API calls, the warp shuffle functions (__shfl* changed to __shfl_sync*), and __any changed to __any_sync.

Any suggestions would be appreciated. Thanks for your help!

dwwcqu commented 1 year ago

The split row algorithm and the cuSPARSE API tests work fine and pass on the datasets.