networkx / nx-parallel

A networkx backend that uses joblib to run graph algorithms in parallel.
BSD 3-Clause "New" or "Revised" License

ENH: improved `all_pairs_bellman_ford_path` #49

Closed by Schefflera-Arboricola 8 months ago

Schefflera-Arboricola commented 9 months ago

Builds upon: https://github.com/networkx/nx-parallel/pull/14

Improvements

1. Yielding instead of returning a generator

The updated `all_pairs_bellman_ford_path` yields results as they become available, whereas the old (current) version returns an already-built generator object. The difference is subtle, but the speedups are noticeable in some cases, and yielding also keeps the implementation aligned with networkx's own. A minimal sketch of the distinction is included after the heatmap below.

Speedup heatmap (yielding vs. returning a generator object)

[heatmap image]

* Note: the plot title says it compares the nx-parallel and networkx implementations, but it actually compares the old and new implementations; I forgot to update the title.
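
To make the distinction concrete, here is a minimal sequential sketch (not the PR's actual code) using networkx's `single_source_bellman_ford_path`; the real implementation distributes the per-source work with joblib:

import networkx as nx

def all_pairs_returning_generator(G, weight="weight"):
    # Old style: all shortest paths are computed up front and the caller
    # only receives a generator over the already-materialized mapping.
    paths = {n: nx.single_source_bellman_ford_path(G, n, weight=weight) for n in G}
    return iter(paths.items())

def all_pairs_yielding(G, weight="weight"):
    # New style: a generator function that yields each (source, paths) pair
    # as it is produced, like networkx's own all_pairs_bellman_ford_path.
    for n in G:
        yield n, nx.single_source_bellman_ford_path(G, n, weight=weight)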


2. Added Chunking

I think it's good to add chunking wherever possible, as it gives the user more control. The chunked version of the algorithm gives better speedups than the non-chunked version, and better overall speedups as well (see the updated heatmap in the PR). I've also added a `get_chunks` parameter that lets the user control how nodes are grouped into chunks; a rough sketch of how this can work follows the heatmap below.

Speedup heatmap (default chunking vs. no chunking)

[heatmap image]
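
For reference, a rough sketch of what chunking with a user-supplied `get_chunks` callable can look like; the names, default chunk count, and signature below are illustrative and may differ from the actual nx-parallel code:

import itertools
import networkx as nx
from joblib import Parallel, delayed

def _split_into_chunks(nodes, n_chunks):
    # Split the node list into n_chunks roughly equal consecutive slices.
    nodes = list(nodes)
    size, rem = divmod(len(nodes), n_chunks)
    it = iter(nodes)
    for i in range(n_chunks):
        chunk = list(itertools.islice(it, size + (1 if i < rem else 0)))
        if chunk:
            yield chunk

def all_pairs_bellman_ford_path_chunked(G, get_chunks=None, n_jobs=-1, weight="weight"):
    # One joblib task per chunk of source nodes (instead of one per node),
    # which cuts down on scheduling overhead for larger graphs.
    nodes = list(G.nodes)
    chunks = list(get_chunks(nodes)) if get_chunks else list(_split_into_chunks(nodes, 4))

    def paths_for_chunk(chunk):
        return [(n, nx.single_source_bellman_ford_path(G, n, weight=weight)) for n in chunk]

    results = Parallel(n_jobs=n_jobs)(delayed(paths_for_chunk)(c) for c in chunks)
    for chunk_result in results:
        yield from chunk_result

With a custom `get_chunks`, a user can, for example, put fewer high-degree nodes in each chunk so that every parallel task gets a comparable amount of work.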

Timing script used

import time
import random
import types
import networkx as nx
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
import nx_parallel as nxp

heatmapDF = pd.DataFrame()
number_of_nodes_list = [10, 50, 100, 300, 500]
pList = [1, 0.8, 0.6, 0.4, 0.2]
currFun = nx.all_pairs_bellman_ford_path
for p in pList:
    for num in number_of_nodes_list:
        # create original and parallel graphs
        G = nx.fast_gnp_random_graph(num, p, seed=42, directed=False)

        # for weighted graphs
        random.seed(42)
        for u, v in G.edges():
            G[u][v]["weight"] = random.random()

        H = nxp.ParallelGraph(G)

        # time both versions and update heatmapDF
        t1 = time.time()
        c = nxp.all_pairs_bellman_ford_path_new(H)
        # c = nxp.all_pairs_bellman_ford_path_chunk(H) # when timing chunking VS no-chunking
        if isinstance(c, types.GeneratorType):
            d = dict(c)  # materialize the generator so the timed block includes the actual path computation
        t2 = time.time()
        newTime = t2 - t1
        t1 = time.time()
        c = nxp.all_pairs_bellman_ford_path_old(H)
        # c = nxp.all_pairs_bellman_ford_path_no_chunk(H) # when timing chunking VS no-chunking
        if isinstance(c, types.GeneratorType):
            d = dict(c)
        t2 = time.time()
        oldTime = t2 - t1
        timesFaster = oldTime / newTime
        heatmapDF.at[num, p] = timesFaster
        print("Finished " + str(currFun))

plt.figure(figsize=(20, 4))
hm = sns.heatmap(data=heatmapDF.T, annot=True, cmap="Greens", cbar=True)
hm.set_yticklabels(pList)
hm.set_xticklabels(number_of_nodes_list)
plt.xticks(rotation=45)
plt.yticks(rotation=20)
plt.title("Small Scale Demo: Times Speedups of " + currFun.__name__ + " compared to networkx")
plt.xlabel("Number of Vertices")
plt.ylabel("Edge Probability")
print(currFun.__name__)
plt.tight_layout()
plt.savefig("timing/" + "heatmap_" + currFun.__name__ + "_timing.png")