SSSP graph inputs and benchmarking

victorsongyw commented 3 years ago

Find ways to generate SSSP problems of different sizes and varying node degrees. Verify correct answers.

victorsongyw commented 3 years ago

@margaretd11 IO idea: read inputs from a file first line: N, M second line: start index of edges from nth node (N+1 numbers) third line: destination node of edges (M numbers) fourth line: weights of edges (M numbers)

Nodes should range from 0 to N-1 and the source node is node 0 Edge weights should be non-negative and should not overflow

See dijkstra/tests/simple for a dead simple example of two nodes.

Note: this is called the Compressed Sparse Row (CSR) Format

victorsongyw commented 3 years ago

Found a paper which uses tests cases for BellmanFord: https://www.cs.cmu.edu/~jshun/ligra.pdf It mentions graphs such as 3D-grid, Random-local, rMat, and real-world graphs from Twitter and Yahoo. The paper uses unit weights for the real-world graphs. We might want to find some real-world weighted graphs.

Also, given the large data size, I have concerns with getting out of memory issues. We will see.

margaretd11 commented 3 years ago

No worries, standard way to do it is writing a .h file using Python, and linking the c header file with main.cpp will make the CSR arrays available to your functions just as if they are already declared. No I/O needed. I have a script and will modify it for our purposes.

margaretd11 commented 3 years ago

currently using barabasi-albert preferential attachment model to generate a graph following the power law. Todo: R-MAT real world graphs (could be big; I used one before and I could get one if we decide on one)

victorsongyw commented 3 years ago

TODO: find at least one real world graph and compare our results with results in a paper

margaretd11 commented 3 years ago

Graphs used in the paper:

Generated graphs

Average degree = 12

RMAT (irregular), n = 4e6

Us: barabasi-albert model with n nodes, m = 12

Reasonable n ~= 1e5

Random (regular), n = 4e6 Random is a uniformly distributed graph instance created by randomly connecting m pairs of nodes out of a total n nodes

Us: gnm_random_graph with n nodes, m = 12n edges

Real-world graphs

LiveJournal

n = 4,308,451, m = 68,993,773

Patents

n = 1,765,311, m = 10,564,104

For us, we could use

ego-Facebook: 4,039 & 88,234
ego-Twitter: 81,306 & 1,768,149

same category as LiveJournal (Social Network)

cit-HepPh: 34,546 & 421,578

same category as Patents (Citation Network)

margaretd11 commented 3 years ago

Note:

input_graph.h represent each edge twice, so the NUM_EDGES in the file is twice the actual edge count, if the original graph is undirected.

margaretd11 commented 3 years ago

Results on a comprehensive list of small graphs are in this sheet

Next step is to evaluate the best performing ones with larger graphs, to really distinguish their behaviors. Currently the best performing ones have runtime < 1s so may not be very accurate.

victorsongyw commented 3 years ago

Okay so main takeaway of our results: GPU optimizations are a waste of time lol; sequential Dijkstra's is the KING 🥇

victorsongyw / 15418-project