When I ported merge-spmm from CUDA 8.0 to CUDA 11+ and compiled it successfully, I got fatal memory access errors when running `gspmm --mode=mergepath XXX.mtx`.
The following is partial output from the compute-sanitizer tool:
========= Invalid __shared__ read of size 4 bytes
========= at 0x12a0 in /merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:50:void mgpu::BinarySearchIt<(mgpu::MgpuBounds)1, int, const int *, int, mgpu::less<int>>(T3, int &, int &, T4, int, T5)
========= by thread (0,0,0) in block (2,0,0)
========= Address 0xf58 is out of bounds
========= Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:85:int mgpu::BinarySearch<(mgpu::MgpuBounds)1, int, const int *, mgpu::less<int>>(T3, int, T2, T4) [0x1280]
========= Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:92:int mgpu::DeviceExpandCsrRows<(int)128, (int)1>(int, int, const int *, int, int, int *, int *) [0xf80]
========= Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:166:mgpu::SegReduceTerms mgpu::DeviceSegReducePrepareSpmm<(int)128, (int)1>(const int *, int *, int, int, int, int, bool, int *, int *) [0xf80]
========= Device Frame:/merge-spmm/ext/moderngpu/include/kernels/spmvcsr.cuh:203:void mgpu::KernelSpmmCsr<(int)32, (int)128, (bool)0, (bool)1, float *, int *, int *, const int *, float *, float *, float, mgpu::multiplies<float>, mgpu::plus<float>>(T5, T6, int, T7, T8, T9, T7, T10, T11 *, T11, T12, T13, int) [0xf20]
The source code I modified includes the cuSPARSE API calls and the warp-level intrinsics: `__shfl*` changed to `__shfl_sync*`, and `__any` changed to `__any_sync`.
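For context, here is a minimal sketch of what this kind of intrinsic change typically looks like. It is not taken from merge-spmm or moderngpu: the names `kFullMask`, `warpReduceSum`, `demoKernel`, and `runWarpDemo` are hypothetical, and it assumes all 32 lanes of the warp reach each call, which is why the full mask `0xffffffff` is used. If the original code calls these intrinsics inside divergent branches, a full mask may not be appropriate.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Assumption: every lane of the warp participates, so the full mask is valid.
constexpr unsigned kFullMask = 0xffffffffu;

// Hypothetical helper: warp-level sum with the CUDA 9+ synchronized shuffle.
// The pre-CUDA 9 form would have been __shfl_down(val, offset).
__device__ int warpReduceSum(int val) {
  for (int offset = 16; offset > 0; offset >>= 1) {
    val += __shfl_down_sync(kFullMask, val, offset);
  }
  return val;  // lane 0 holds the sum over the whole warp
}

// Hypothetical kernel: one warp reduces 32 inputs and votes on a predicate.
__global__ void demoKernel(const int* in, int* out) {
  int lane = threadIdx.x;
  int v = in[lane];
  int total = warpReduceSum(v);
  // The pre-CUDA 9 form would have been __any(v > 30).
  int anyLarge = __any_sync(kFullMask, v > 30);
  if (lane == 0) {
    out[0] = total;
    out[1] = anyLarge;
  }
}

// Hypothetical host wrapper: runs the kernel on the inputs 0..31.
cudaError_t runWarpDemo(int* sum, int* anyFlag) {
  assert(sum != nullptr && anyFlag != nullptr);
  int hostIn[32];
  for (int i = 0; i < 32; ++i) hostIn[i] = i;

  int* devIn = nullptr;
  int* devOut = nullptr;
  cudaError_t err = cudaMalloc(&devIn, sizeof(hostIn));
  if (err != cudaSuccess) return err;
  err = cudaMalloc(&devOut, 2 * sizeof(int));
  if (err != cudaSuccess) {
    cudaFree(devIn);
    return err;
  }

  err = cudaMemcpy(devIn, hostIn, sizeof(hostIn), cudaMemcpyHostToDevice);
  int hostOut[2] = {0, 0};
  if (err == cudaSuccess) {
    demoKernel<<<1, 32>>>(devIn, devOut);
    // This blocking copy also surfaces errors from the kernel launch.
    err = cudaMemcpy(hostOut, devOut, sizeof(hostOut), cudaMemcpyDeviceToHost);
  }

  cudaFree(devIn);
  cudaFree(devOut);
  *sum = hostOut[0];
  *anyFlag = hostOut[1];
  return err;
}
```

Calling `runWarpDemo(&sum, &anyFlag)` from a host `main` and compiling with `nvcc` is enough to try the sketch on its own, separately from the merge-spmm build.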
If you have any suggestions, I would appreciate them. Thanks for your help!