__syncthreads needed in apsp_cuda.cu pass 2?

leVirve / All-Pair-Shortest-Path-CUDA

Accelerate algorithm block all-pair shortest path through CUDA

MIT License

I have been reading the Katz and Kider 2008 paper and believe you are implementing the algorithm that they describe.

They only discuss synchronization for the first pass of their algorithm, but I believe the threads also need to synchronize in the second pass. (If you think that is unnecessary, could you explain why?)

This would mean you need `__syncthreads()` calls beneath lines 60 and 66 in apsp_cuda.cu (and in the corresponding positions in the other two implementations). I believe the missing barriers may explain the inaccurate results you describe in your report.pdf.
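To make the dependency concrete, here is a minimal sketch of a second-pass update for a single tile in the pivot's block row. It is not taken from your code: the kernel name `phase2_row`, the helper `count_mismatches`, the tile width `B = 16`, and the random test data are all assumptions for illustration. In step `k`, thread `(i, j)` reads `c[k][j]`, which is written by thread `(k, j)`, so each step has to be finished block-wide before the next one starts.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int B = 16;  // assumed tile width: one thread per element, 8 warps per block

// Hypothetical second-pass kernel for one tile in the pivot's block row:
//   c[i][j] = min(c[i][j], p[i][k] + c[k][j])   for k = 0 .. B-1
// p is the pivot tile from pass 1, c is the tile being updated.
__global__ void phase2_row(const int* pivot, int* tile) {
    __shared__ int p[B][B];
    __shared__ int c[B][B];
    const int i = threadIdx.y, j = threadIdx.x;
    p[i][j] = pivot[i * B + j];
    c[i][j] = tile[i * B + j];
    __syncthreads();  // every element must be loaded before any thread reads a neighbour's
    for (int k = 0; k < B; ++k) {
        const int via = p[i][k] + c[k][j];  // c[k][j] belongs to thread (k, j)
        if (via < c[i][j]) c[i][j] = via;
        __syncthreads();  // step k must be complete before step k+1 reads c[k+1][j]
    }
    tile[i * B + j] = c[i][j];
}

// Runs the kernel on random data and compares it with a sequential reference.
// Returns the number of differing elements, or -1 on a CUDA error.
int count_mismatches() {
    std::vector<int> p(B * B), c(B * B), gpu(B * B);
    srand(1);
    for (int i = 0; i < B; ++i)
        for (int j = 0; j < B; ++j) {
            p[i * B + j] = (i == j) ? 0 : rand() % 100 + 1;  // zero diagonal, as after pass 1
            c[i * B + j] = rand() % 100 + 1;
        }
    const size_t bytes = B * B * sizeof(int);
    int *dp = nullptr, *dc = nullptr;
    if (cudaMalloc(&dp, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&dc, bytes) != cudaSuccess) { cudaFree(dp); return -1; }
    cudaMemcpy(dp, p.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dc, c.data(), bytes, cudaMemcpyHostToDevice);
    phase2_row<<<1, dim3(B, B)>>>(dp, dc);
    const cudaError_t err = cudaMemcpy(gpu.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dp);
    cudaFree(dc);
    if (err != cudaSuccess) return -1;

    // Sequential reference: the same recurrence with k as the outermost loop.
    for (int k = 0; k < B; ++k)
        for (int i = 0; i < B; ++i)
            for (int j = 0; j < B; ++j) {
                const int via = p[i * B + k] + c[k * B + j];
                if (via < c[i * B + j]) c[i * B + j] = via;
            }
    int bad = 0;
    for (int n = 0; n < B * B; ++n) bad += (gpu[n] != c[n]);
    return bad;
}

int main() {
    const int bad = count_mismatches();
    printf("mismatches: %d\n", bad);
    return bad != 0;
}
```

A single barrier per step is enough in this sketch only because the pivot diagonal is zero, so thread `(k, j)` never changes `c[k][j]` during step `k` itself. If the barrier inside the loop is removed, threads in different warps can read stale values of `c[k][j]`; whether that shows up as wrong output depends on scheduling, so a run without mismatches would not prove the barrier is unnecessary.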

__syncthreads needed in apsp_cuda.cu pass 2? #2