socnetv / app

Social Network Analysis and Visualization software application.
https://socnetv.org
Other
214 stars 27 forks source link

Mismatch on the number of shortest paths #123

Closed popsynoger closed 3 years ago

popsynoger commented 3 years ago

Hello,

I've been recently studying centrality measures by trying to understand them and coding a working routine. SocNetV is a reference to me, so I am using it to validate the results and I'm also using your code to help me learn. It's been really great and I'm really grateful for this, so I'd like to thank you very much for your work.

However, I've been struggling with the geodesics (number of shortest paths) matrix. This is weird since the distance matrix is ok and, codewise, it shouldn't be so tricky to get it right from this point. So I've ran some tests and I believe that there might be something wrong with SocNetV's algorithm - even though I'm not sure.

I've spotted a pattern on the differences between our results, but the graph was a bit complex (11 x 11, weighted with inverse weights). So I decided to run a smaller graph to analyze the differences between the two results. I've ran this adjacency matrix:

    1   2   3   4   5
1   0   1   2   3   3
2   11  0   189 1   2
3   11  189 0   289 1
4   11  40  289 0   1
5   56  11  22  89  0

And we obtained the same distances matrix, which was already expected:

    1   2   3   4   5
1   0   1   2   2   3
2   11  0   13  1   2
3   11  12  0   13  1
4   11  12  13  0   1
5   22  11  22  12  0

However, we still had differences on our geodesics (number of shortest paths) matrix. I've obtained:

    1   2   3   4   5
1   1   1   1   1   4
2   1   1   1   1   2
3   1   2   1   2   1
4   1   2   1   1   1
5   1   1   1   1   1

While SocNetV obtained:

        1   2   3   4   5
1   1   1   1   1   5
2   1   1   1   1   2
3   1   2   1   3   1
4   1   2   1   1   1
5   1   1   1   1   1

So the differences were on s(1,5) and s(3,4). I've tried to solve these by hand in order to understand what was going on. I can see exactly four shortest paths from node 1 to 5, namely:

1>5 -> d(1,5) = 3
1>2>5 -> d(1,2,5) = 1 + 2 =  3
1>2>4>5 -> d(1,2,4,5) = 1 + 1 + 1 = 3
1>3>5 -> d(1,3,5) = 2 + 1

Note that 1>4>5 (d(1,4) = 2, w(4,5) = 1) is a shortest path. However, it is only a shortest path if you go from node 1 to node 4 through node 2. Otherwise, if you go straight from node 1 to 4, the distance d(1,4) = 3 and d(1,5) = 3 + 1 = 4, which is not a shortest path. It may be seen from the fact that s(1,4) = 1, which refers to the path that goes through node 2.

The same happens for s(3,4):

3>1>2>4 -> d(3,1,2,4) = 11 + 1 +1 = 13
3>5>2>4 -> d(3,5,2,4) = 1 + 11 + 1 = 13

Once again, 3>2>4 is the shortest path between nodes 3 and 4, but only if you go through node 1 or 5 when you're travelling between nodes 3 and 2. If you go straight from node 3 to node 2, the distance 3>2>4 becomes much higher as the original distance between those two is 189.

This is the same pattern that I've spotted on the previous graph - the 11 x 11 with inverted weights. Whenever the path goes through more than one intermediate node, our results didn't match. Since the algorithm updates d(src,u) and keeps calculating shortest paths, is it possible that your code is counting both 1>4>5 and 1>2>4>5 (as well as both 3>2>4 and 3>1>2>4/3>5>2>4) as shortest paths when it shouldn't or am I missing something?

If this is true, maybe 1>4>5 (and 3>2>4) is being visited again after d(1,4) (and d(3,2)) value is updated. I think it maybe would be the case if the priority queue allowed duplicates. In this case, the same node could be called twice in a row (if, for example, it is duplicated with close priority values) and the number of shortest paths would be wrongly increased twice. If that's not the case, I apologize for taking your time in what might be just something that I am not currently spotting.

Regards,

oxy86 commented 3 years ago

Hello,

Thanks for the kind words, and thanks for reporting this! Your example and explanation makes sense. It's probably a bug in my code. I will investigate it, try to triage it and get back to you as soon as possible (perhaps by next week if I have enough spare time over the weekend).

Cheers, and stay safe.

On Fri, Apr 9, 2021 at 11:55 PM popsynoger @.***> wrote:

Hello,

I've been recently studying centrality measures by trying to understand them and coding a working routine. SocNetV is a reference to me, so I am using it to validate the results and I'm also using your code to help me learn. It's been really great and I'm really grateful for this, so I'd like to thank you very much for your work.

However, I've been struggling with the geodesics (number of shortest paths) matrix. This is weird since the distance matrix is ok and, codewise, it shouldn't be so tricky to get it right from this point. So I've ran some tests and I believe that there might be something wrong with SocNetV's algorithm - even though I'm not sure.

I've spotted a pattern on the differences between our results, but the graph was a bit complex (11 x 11, weighted with inverse weights). So I decided to run a smaller graph to analyze the differences between the two results. I've ran this adjacency matrix:

1 2 3 4 5 1 0 1 2 3 3 2 11 0 189 1 2 3 11 189 0 289 1 4 11 40 289 0 1 5 56 11 22 89 0

And we obtained the same distances matrix, which was already expected:

1 2 3 4 5 1 0 1 2 2 3 2 11 0 13 1 2 3 11 12 0 13 1 4 11 12 13 0 1 5 22 11 22 12 0

However, we still had differences on our geodesics (number of shortest paths) matrix. I've obtained:

1 2 3 4 5 1 1 1 1 1 4 2 1 1 1 1 2 3 1 2 1 2 1 4 1 2 1 1 1 5 1 1 1 1 1

While SocNetV obtained:

    1 2   3   4   5

1 1 1 1 1 5 2 1 1 1 1 2 3 1 2 1 3 1 4 1 2 1 1 1 5 1 1 1 1 1

So the differences were on s(1,5) and s(3,4). I've tried to solve these by hand in order to understand what was going on. I can see exactly four shortest paths from node 1 to 5, namely:

1>5 -> d(1,5) = 3 1>2>5 -> d(1,2,5) = 1 + 2 = 3 1>2>4>5 -> d(1,2,4,5) = 1 + 1 + 1 = 3 1>3>5 -> d(1,3,5) = 2 + 1

Note that 1>4>5 (d(1,4) = 2, w(4,5) = 1) is a shortest path. However, it is only a shortest path if you go from node 1 to node 4 through node 2. Otherwise, if you go straight from node 1 to 4, the distance d(1,4) = 3 and d(1,5) = 3 + 1 = 4, which is not a shortest path. It may be seen from the fact that s(1,4) = 1, which refers to the path that goes through node 2.

The same happens for s(3,4):

3>1>2>4 -> d(3,1,2,4) = 11 + 1 +1 = 13 3>5>2>4 -> d(3,5,2,4) = 1 + 11 + 1 = 13

Once again, 3>2>4 is the shortest path between nodes 3 and 4, but only if you go through node 1 or 5 when you're travelling between nodes 3 and 2. If you go straight from node 3 to node 2, the distance 3>2>4 becomes much higher as the original distance between those two is 189.

This is the same pattern that I've spotted on the previous graph - the 11 x 11 with inverted weights. Whenever the path goes through more than one intermediate node, our results didn't match. Since the algorithm updates d(src,u) and keeps calculating shortest paths, is it possible that your code is counting both 1>4>5 and 1>2>4>5 (as well as both 3>2>4 and 3>1>2>4/3>5>2>4) as shortest paths when it shouldn't or am I missing something?

If this is true, maybe 1>4>5 (and 3>2>4) is being visited again after d(1,4) (and d(3,2)) value is updated. I think it maybe would be the case if the priority queue allowed duplicates. In this case, the same node could be called twice in a row (if, for example, it is duplicated with close priority values) and the number of shortest paths would be wrongly increased twice. If that's not the case, I apologize for taking your time in what might be just something that I am not currently spotting.

Regards,

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/socnetv/app/issues/123, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAALUSB4ZHB7SGIM7PTCMDTTH5SSTANCNFSM42VVD76A .

-- Dimitris Kalamaras

popsynoger commented 3 years ago

Hello,

sorry for the late reply, I was away for a few days.

Ok, thanks. Let me know if there is something I can do to help you.

Cheers!

oxy86 commented 3 years ago

Hello, This has been fixed in the latest 3.0-rc version. Get it from our continuous pre-releases: https://github.com/socnetv/app/releases/tag/continuous Let me know if that works for you.

popsynoger commented 3 years ago

Hi, @oxy86!

Yes, the latest 3.0-rc version that I just downloaded from this link yields the correct number of shortest paths on this example.

Thank you!!