Open bkamins opened 3 years ago
This is not a bug but a performance issue; however, a bug label was automatically added.
I think this might be because diameter works for the general case where an additional distance matrix is used. Internally it then uses dijkstra_shortest_paths instead of gdistances. Maybe we should have a special case for when no distance matrix is used.
In general, using gdistances is also not optimal, I think. igraph is much faster on this graph in the calculation of diameter.
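One possible shape for the special case discussed here, as a sketch: DefaultDistance is LightGraphs' marker type for unit edge weights, but the method below (eccentricity_fast) is illustrative, not the library's actual internals.

```julia
using LightGraphs

# Hypothetical specialization (not LightGraphs' real code): fall back to BFS
# (gdistances) when the caller passed no real distance matrix, and keep
# Dijkstra only for the weighted case.
function eccentricity_fast(g, v, distmx = weights(g))
    if distmx isa LightGraphs.DefaultDistance
        return maximum(gdistances(g, v))    # BFS: all edges have weight 1
    else
        return maximum(dijkstra_shortest_paths(g, v, distmx).dists)
    end
end
```

The diameter would then be the maximum of this eccentricity over all vertices, with the BFS branch taken whenever the graph is unweighted.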
I think that in general either Floyd-Warshall or Johnson's algorithm should be used (depending on the sparsity of the graph), but unfortunately I currently have too much work getting DataFrames.jl to its 1.0 release to dig into the LightGraphs.jl sources - sorry for this. (Or maybe there are special algorithms custom-made for diameter - I have not checked.)
I am not sure Floyd-Warshall is much more performant unless one finds a way to avoid allocating the temporary |V| x |V| distance matrix. Best would probably be to just run some benchmarks with different algorithms and graphs, if someone has time for that.
There might also be the possibility of a parallel diameter algorithm.
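To make the allocation concern concrete, here is a minimal self-contained Floyd-Warshall sketch for an unweighted graph given as an adjacency list: even though only the single maximum (the diameter) is wanted, the full |V| x |V| matrix must be materialized.

```julia
# Floyd-Warshall diameter sketch (assumes a connected graph). The point of
# interest is the fill(...) line: a dense |V| x |V| matrix is allocated even
# though only its maximum entry is ultimately used.
function fw_diameter(adj::Vector{Vector{Int}})
    n = length(adj)
    INF = typemax(Int) ÷ 2          # half of typemax avoids overflow on addition
    D = fill(INF, n, n)             # the temporary |V| x |V| distance matrix
    for v in 1:n
        D[v, v] = 0
        for w in adj[v]
            D[v, w] = 1             # unweighted: every edge has length 1
        end
    end
    for k in 1:n, i in 1:n, j in 1:n
        if D[i, k] + D[k, j] < D[i, j]
            D[i, j] = D[i, k] + D[k, j]
        end
    end
    return maximum(D)
end

# Example: the path 1-2-3-4 has diameter 3.
path4 = [[2], [1, 3], [2, 4], [3]]
fw_diameter(path4)
```

For a graph the size of the one in this issue (37700 vertices), that matrix alone would be over a gigabyte of Ints, which supports the point above about dense graphs being the only case where this is tolerable.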
All-source BFS should be faster than Floyd-Warshall for unweighted graphs. For weighted graphs I think Dijkstra will still beat Floyd-Warshall, but tests should confirm.
And we should definitely specialize distance metrics for unweighted graphs.
igraph does BFS. Floyd-Warshall allocates, but if the graph is dense this should not be a problem, as then |V| cannot be large anyway.
Would something like https://who.rocq.inria.fr/Laurent.Viennot/road/papers/ifub.pdf be overkill for our purposes?
Also, when you talk about distance metrics to specialize, what do you have in mind besides the diameter?
Sorry for the delayed response. In general, for unweighted graphs, where we're currently using Dijkstra, we can simply substitute BFS. There's no need to track weights in an unweighted graph, and BFS is more performant precisely because we skip those weight comparisons at each traversal step.
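The BFS substitution described above can be sketched in a few lines of plain Julia, with the graph as an adjacency list (no priority queue, no weight comparisons — just a FIFO frontier):

```julia
# BFS distances from source s in an unweighted graph given as an adjacency
# list (Vector{Vector{Int}}). Illustrative sketch, not LightGraphs code.
function bfs_distances(adj::Vector{Vector{Int}}, s::Int)
    n = length(adj)
    dist = fill(-1, n)              # -1 marks unvisited
    dist[s] = 0
    queue = [s]
    head = 1
    while head <= length(queue)
        u = queue[head]; head += 1
        for v in adj[u]
            if dist[v] == -1        # first visit is the shortest path in BFS
                dist[v] = dist[u] + 1
                push!(queue, v)
            end
        end
    end
    return dist
end

# Diameter = maximum eccentricity over all sources; O(|V| * (|V| + |E|)) for
# unweighted graphs. Assumes a connected graph.
bfs_diameter(adj) = maximum(maximum(bfs_distances(adj, s)) for s in 1:length(adj))
```

For example, `bfs_diameter([[2], [1, 3], [2, 4], [3]])` (a path on 4 vertices) gives 3.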
I have a graph gh on 37700 vertices and 289003 edges. Computing its diameter with a manual workaround takes around 250 seconds (this is probably not an optimal algorithm for the task, but at least it works).
However, when I run
diameter(gh)
the process takes so long that I did not wait for it to finish.
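The workaround code itself is not shown in this transcript; it was presumably along these lines (a hypothetical reconstruction using LightGraphs' per-source BFS, not the author's exact code):

```julia
using LightGraphs  # assumes the package is installed

# Hypothetical reconstruction of the manual workaround: one BFS per vertex
# via gdistances, taking the maximum distance seen. Assumes a connected graph.
manual_diameter(g) = maximum(maximum(gdistances(g, v)) for v in vertices(g))
```

The contrast the issue reports — this loop finishing in minutes while diameter(gh) does not finish — is consistent with the BFS-vs-Dijkstra discussion above.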