Open aleaverfay opened 2 years ago
Wow, thanks for your reminder! I don't know the paper All-Pairs-Shortest-Paths for Large Graphs on the GPU when doing this assignment. 😊
I agree with you that it might need some synchronization barriers after modifying the shared memory like the first pass. However, I'm not working on CUDA programming for a long time. If I have time, I'd love to reproduce the issue, and thank you for the insight.
I have been reading the Katz and Kider 2008 paper and believe you are implementing the algorithm that they describe.
They only talk about synchronization for the first pass of their algorithm, but I believe there also needs to be synchronization between threads in the second pass, too. (If you don't think so, could you explain to me why you wouldn't?)
This would mean you need some
__syncthread
calls beneath lines 60, 66 in apsp_cuda.cu (and in similar positions in the other two implementations). I believe this may explain the inaccurate results you were getting in your report.pdf.