Closed. Toolicat closed this issue 1 year ago.
Hello @Toolicat and thanks for reaching out to us!
The betweenness centrality metric for a node n is defined as the number of shortest paths between any two nodes that pass through n. Our implementation does not currently support any weight configuration. How would you like such a weighted algorithm to be defined?
It doesn't seem straightforward to me to just configure a relationship weight. If we do that, then the accumulated cost for a shortest path between a and b could be x. Let's say that this path passes through nodes n and m; how much would we increase the betweenness centrality (BC) of n and m? In the unweighted case, we would increase the BC by 1 for both (and any other node on the path). Would we now instead increase it by x? Or by x / length(path)? I'm not sure that results in a good definition of the metric.
I am curious as to how you imagine defining a weighted BC, or if you have read papers or other academic definitions that you could share?
Regards Mats
Thanks for the quick response!
I'm not sure either, but the x / length(path) option looks better.
This paper is the closest to a universal solution, one that would be useful not only to me: https://arxiv.org/pdf/2006.08668.pdf
By the way, I'd like to react to your statement:
I assume that betweenness centrality with weights will increase performance.
I should note that Betweenness Centrality is a particularly expensive algorithm, running in O(n * m), meaning it does not scale well with increasing graph sizes. The GDS implementation offers a sampled variant to deal with this, but using sampling will not give exact results.
I'm not sure if you mean performance as in runtime, or as in quality of results. But I think that BC risks compromising both aspects, because it will likely either be slow or inaccurate, unless your graph is relatively small.
I know about approximation; I mean performance in the sense of quality of results.
In my case, I try to define a community as a node whose connections run through nodes that represent people, and then replace the multiple connections between communities (through people) with a single connection with a weight. In the output I should get the most influential communities.
I deliberately skipped this detail in the explanation of the problem, since I think weighted BC will be useful for many developers, and there is a general weighted BC algorithm.
Hi @Toolicat - we've added your request to our backlog. We'll reach out if we need more information about your requirements, or to alert you when we've added support for weights in Betweenness Centrality.
Thanks Alicia and Mats, I really appreciate that! I'm a big fan of Neo4j; you're making a great product.
Mats, hi again!
In that comment https://github.com/neo4j/graph-data-science/issues/171#issuecomment-1057784002 you say "I'm not sure if that results in a good definition of the metric." Did you mean that if node "n" has weight 30 and node "m" has weight 20, and they lie on a path of length 2, then node "n" gives a score of 15 to "m" while "m" gives only 10 to "n"?
It can ruin the results when local bridges have many direct relationships and accumulate more weight than a global bridge.
Maybe a good decision would be to calculate BC as unweighted, and afterwards multiply the unweighted BC score by a weight computed by the expression "x / length(path)".
What does that look like in practice?
Say we have node "n" with weight 30 and 10 shortest paths going through "n", and node "m" with weight 15 and 20 shortest paths going through "m", out of a total of 100 shortest paths over the network. In the results we get a score of 3 for both nodes, but for different reasons: if weight represents frequency, then node "n" is influential because of the high frequency of signal transmission, while node "m" is influential because of its better position in the transmission of the signal.
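Jack's arithmetic can be made explicit. Here the score is taken to be node_weight × (paths through the node) / (total paths), which is an assumption read from his description, not a formula from GDS:

```python
# hypothetical scoring read from the scenario above:
# score = node_weight * paths_through_node / total_paths
total_paths = 100

score_n = 30 * 10 / total_paths   # node n: weight 30, 10 shortest paths through it
score_m = 15 * 20 / total_paths   # node m: weight 15, 20 shortest paths through it

print(score_n, score_m)  # both come out to 3.0, "for different reasons"
```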
If I'm missing something in the logic, or lack knowledge about the low-level computation of the algorithm, I'd be grateful for suggestions.
Regards Jack
Hi @Toolicat
I was thinking that the weights were on the relationships. For example in this graph:
(a)-[{w: 1}]->(b),
(a)-[{w: 3}]->(c),
(b)-[{w: 1}]->(c),
(b)-[{w: 2}]->(d),
(c)-[{w: 2}]->(e),
(d)-[{w: 1}]->(e)
Then the unweighted shortest paths would be
a->b->d
a->c->e
b->d->e / b->c->e
resulting in BC scores of
a: 0
b: 1
c: 1.5
d: 0.5
e: 0
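For reference, these unweighted scores can be reproduced with a compact pure-Python sketch of Brandes' algorithm (illustrative only, not the GDS implementation); on this particular directed graph the full all-pairs computation agrees with the three listed pairs:

```python
from collections import deque

# Mats' example graph, ignoring the weights (directed)
GRAPH = {'a': ['b', 'c'], 'b': ['c', 'd'], 'c': ['e'], 'd': ['e'], 'e': []}

def betweenness(graph):
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # BFS phase: shortest-path counts (sigma) and predecessors
        sigma = {v: 0 for v in graph}; sigma[s] = 1
        dist = {v: -1 for v in graph}; dist[s] = 0
        preds = {v: [] for v in graph}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulation phase: walk back from the farthest nodes
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

print(betweenness(GRAPH))  # {'a': 0.0, 'b': 1.0, 'c': 1.5, 'd': 0.5, 'e': 0.0}
```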
but if we use the relationship weights, we would get weighted shortest paths of
a->b->d (cost 3)
a->b->d->e / a->b->c->e (both cost 4)
b->d->e / b->c->e (both cost 3)
and one interpretation would be to just count these for the BC score as in the unweighted case, resulting in
a: 0
b: 2
c: 1
d: 1
e: 0
And I guess from this point of view the metric remains defined exactly the same, so definitely well defined.
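The weighted case can be checked mechanically (a sketch, not GDS code): run Dijkstra for the same three endpoint pairs, enumerate all tied shortest paths, and give each interior node a credit of 1/p, where p is the number of tied paths. Note that for the pair from a to e there are two shortest paths of cost 4, a->b->d->e and a->b->c->e, so b, c, and d all receive credit there:

```python
import heapq
from collections import defaultdict

# Mats' example graph with relationship weights (directed)
GRAPH = {
    'a': [('b', 1), ('c', 3)],
    'b': [('c', 1), ('d', 2)],
    'c': [('e', 2)],
    'd': [('e', 1)],
    'e': [],
}

def dijkstra_preds(graph, s):
    # single-source Dijkstra; records every shortest-path predecessor
    dist = {v: float('inf') for v in graph}
    dist[s] = 0
    preds = {v: [] for v in graph}
    pq, done = [(0, s)], set()
    while pq:
        d, v = heapq.heappop(pq)
        if v in done:
            continue
        done.add(v)
        for w, wt in graph[v]:
            nd = d + wt
            if nd < dist[w]:
                dist[w], preds[w] = nd, [v]
                heapq.heappush(pq, (nd, w))
            elif nd == dist[w] and v not in preds[w]:
                preds[w].append(v)   # a tied shortest path
    return preds

def shortest_paths(preds, s, t):
    # expand the predecessor lists into explicit node sequences
    if t == s:
        return [[s]]
    return [p + [t] for v in preds[t] for p in shortest_paths(preds, s, v)]

bc = defaultdict(float)
for s, t in [('a', 'd'), ('a', 'e'), ('b', 'e')]:
    paths = shortest_paths(dijkstra_preds(GRAPH, s), s, t)
    for path in paths:
        for node in path[1:-1]:         # interior nodes only
            bc[node] += 1 / len(paths)  # split credit across tied paths

print(dict(bc))  # {'b': 2.0, 'c': 1.0, 'd': 1.0}
```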
I didn't think the problem through this thoroughly when I wrote my last reply. But what I was thinking was that the accumulated cost of the shortest path would be taken into account for the BC score. This comes out of the observation that unweighted shortest paths contribute with a mass of (length - 1) / p, where p is the number of shortest paths between the same endpoints. That is, the three unweighted paths I listed above all contribute with a mass of 1 to the BC computation, and indeed the sum of the BCs is three (1 + 1.5 + 0.5).
As it turns out, that fact stays the same for the weighted case as well. I thought that perhaps the actual cost of the path, which is now generally not equal to the length, would need to be taken into account. If that is not the idea, then I think my comment about poor definition is not so helpful.
Anyway, as I've learned from my coworkers, the problem for the weighted BC is not so much the definition but the fact that computing weighted shortest paths between all pairs of nodes is very computationally expensive. So the resulting algorithm will not scale very well. But it still is in the cards for us to implement in the future, so stay tuned!
Still not sure that I understand the algorithm correctly. I need some time to digest that info. Anyway, thanks for the explanation. You and Alicia are my personal rock stars :-)
I will try to keep in touch, but I am from Ukraine and may disappear from communication
Hi all,
I am new to Neo4j and would like to join Jack [Toolicat] in thanking Alicia, Mats, and the rest of the Neo4j team for the great product they are creating.
Of course, I found this thread because I had the same request in mind for a different use case. So, thanks to Jack for expressing the question in broad terms.
Please permit me to chime in. I am still a rookie, so please correct me if I misstate any fact. Concerning Mats' last comment, and I quote: "… the problem for the weighted BC is not so much the definition but the fact that computing weighted shortest paths between all pairs of nodes is very computationally expensive. So the resulting algorithm will not scale very well".
Perhaps the algorithm already exists, and it is just a few steps from being implemented as Betweenness Centrality with weighted relationships. All that needs to be done (in my understanding) is to execute the Dijkstra Single-Source Shortest Path algorithm, which already exists for weighted relationships. Then, to achieve Betweenness Centrality on these weighted paths, we count the number of times each node appears on the single-source shortest paths from all the other nodes. These counts are essentially a representation of the Betweenness Centrality on the weighted paths.
The Betweenness Centrality could even be imagined as a histogram of the number of times of occurrence of each node on the Dijkstra Single-Source Shortest Path algorithm of all the other nodes.
In other words, what distinguishes Betweenness Centrality in the weighted case from the unweighted case is the single-source shortest path computation: for the unweighted case, the shortest path counts the least number of hops between each pair of nodes;
whereas for the weighted case, the shortest path is determined by summing the weights along the path between each pair of nodes.
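Paul's outline is essentially Brandes' algorithm with Dijkstra substituted for the breadth-first search, with one refinement: when several shortest paths tie on cost, each interior node should receive fractional credit rather than a raw occurrence count. A stdlib-only sketch under that assumption (not the GDS implementation; parallel relationships are assumed pre-aggregated, and weights must be positive):

```python
import heapq

def weighted_betweenness(graph):
    """Brandes-style accumulation with Dijkstra instead of BFS.
    graph: node -> list of (neighbor, positive_weight), directed."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        dist = {v: float('inf') for v in graph}
        dist[s] = 0
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in graph}  # shortest-path predecessors
        order = []                      # nodes in order of finalization
        pq, done = [(0, s)], set()
        while pq:
            d, v = heapq.heappop(pq)
            if v in done:
                continue
            done.add(v)
            order.append(v)
            for w, wt in graph[v]:
                nd = d + wt
                if nd < dist[w]:        # strictly better path found
                    dist[w], sigma[w], preds[w] = nd, sigma[v], [v]
                    heapq.heappush(pq, (nd, w))
                elif nd == dist[w]:     # tie: another shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # dependency accumulation, farthest node first
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# a diamond: s reaches t by two tied paths (via a and via b),
# so a and b each get credit 0.5 rather than a raw count of 1
diamond = {'s': [('a', 1), ('b', 1)], 'a': [('t', 1)], 'b': [('t', 1)], 't': []}
print(weighted_betweenness(diamond))  # {'s': 0.0, 'a': 0.5, 'b': 0.5, 't': 0.0}
```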
I hope I was clear enough. Please correct me if I am wrong in my understanding.
Thank you, Paul
Hi everyone, I am also interested in this feature. When will it be available? Regards, Ahmadshoh.
The feature exists in other environments; I was able to execute it in R. So I guess it should be achievable in Neo4j as well. Many thanks, Paul
Hi, Mats
You said
Anyway, as I've learned from my coworkers, the problem for the weighted BC is not so much the definition but the fact that computing weighted shortest paths between all pairs of nodes is very computationally expensive. So the resulting algorithm will not scale very well. But it still is in the cards for us to implement in the future, so stay tuned!
But if I collapse paths and count the parallel relationships between a pair of nodes as a weight before running BC, would that decrease the computational intensity? Or does that make no sense, and can a Cypher projection do the same as Collapse Path?
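For what it's worth, the pre-aggregation Jack describes (collapsing parallel relationships into a single weighted one) is simple to express outside the database too; a sketch in plain Python, with hypothetical names:

```python
from collections import Counter

def collapse_parallel(edges):
    # edges: directed (source, target) pairs, one per parallel relationship
    # result: (source, target) -> weight, where weight is the multiplicity
    return dict(Counter(edges))

weighted = collapse_parallel([
    ('community1', 'community2'),  # two people link these two communities
    ('community1', 'community2'),
    ('community2', 'community3'),
])
print(weighted)  # {('community1', 'community2'): 2, ('community2', 'community3'): 1}
```

One caveat: as noted further down the thread, weighted BC treats a higher weight as a higher cost, so a raw multiplicity makes strongly linked communities look farther apart; inverting it (e.g. 1/count) before projection may be closer to the intent here.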
Hi all :wave:
We have added support for relationship weights in Betweenness Centrality.
You can specify a relationshipWeightProperty, which is expected to already exist in your projected graph. That is, the algorithm will not collapse paths or aggregate parallel relationships for you. For that, specify an aggregation when projecting the graph, or use Collapse Path (which also received an upgrade with support for more complex "path templates").
The weighted BC implementation uses Dijkstra to compute the shortest paths, which means it is defined for positive relationship weights.
Weighted BC will be in the upcoming 2.2 release. You can have a look at the preview documentation and try it out. Thanks for your input, let us know what you think!
Hi, jjaderberg!
Thank you so much
As for this
The weighted BC implementation uses Dijkstra to compute the shortest paths, which means it is defined for positive relationship weights.
Does it mean that the higher the value of the weight property, the faster the path? Or does it mean that a path with a higher value is more expensive?
Does it mean that the higher the value of the weight property, the faster the path? Or does it mean that a path with a higher value is more expensive?
Higher weight means path is more expensive (longer).
p1=(a1)-[{weight: 1.0}]->(b)
p2=(a2)-[{weight: 0.5}]->(b)
Traversing the relationship a1->b, with weight 1.0, is more expensive than a2->b, with weight 0.5.
Path p2 is shorter than p1 because the weight on the path is lower.
Thank you, the algorithm gives accurate results. So far I have only tested it on a small graph, and the speed was good.
I think this should be added to the documentation:
Higher weight means path is more expensive (longer).
I am trying to make parallel relationships between nodes count as weights in the betweenness centrality algorithm.
It would be great if it were possible to add weights both in Cypher projections and in native projections. The weight should be settable from the number of parallel relationships between two connected nodes, from the degree of a node, or from any other integer or float.
I considered the GDS influence maximization algorithms as an alternative for solving my task, but I still assume that betweenness centrality with weights will increase performance.