Open victorsongyw opened 3 years ago
@margaretd11 IO idea: read inputs from a file first line: N, M second line: start index of edges from nth node (N+1 numbers) third line: destination node of edges (M numbers) fourth line: weights of edges (M numbers)
Nodes should range from 0 to N-1 and the source node is node 0 Edge weights should be non-negative and should not overflow
See dijkstra/tests/simple for a dead simple example of two nodes.
Note: this is called the Compressed Sparse Row (CSR) Format
Found a paper which uses tests cases for BellmanFord: https://www.cs.cmu.edu/~jshun/ligra.pdf It mentions graphs such as 3D-grid, Random-local, rMat, and real-world graphs from Twitter and Yahoo. The paper uses unit weights for the real-world graphs. We might want to find some real-world weighted graphs.
Also, given the large data size, I have concerns with getting out of memory issues. We will see.
No worries, standard way to do it is writing a .h file using Python, and linking the c header file with main.cpp will make the CSR arrays available to your functions just as if they are already declared. No I/O needed. I have a script and will modify it for our purposes.
currently using barabasi-albert preferential attachment model to generate a graph following the power law. Todo: R-MAT real world graphs (could be big; I used one before and I could get one if we decide on one)
TODO: find at least one real world graph and compare our results with results in a paper
Average degree = 12
Us: barabasi-albert model with n nodes, m = 12
Reasonable n ~= 1e5
Us: gnm_random_graph with n nodes, m = 12n edges
n = 4,308,451, m = 68,993,773
n = 1,765,311, m = 10,564,104
For us, we could use
same category as LiveJournal
(Social Network)
same category as Patents
(Citation Network)
input_graph.h
represent each edge twice, so the NUM_EDGES in the file is twice the actual edge count, if the original graph is undirected.
Results on a comprehensive list of small graphs are in this sheet
Next step is to evaluate the best performing ones with larger graphs, to really distinguish their behaviors. Currently the best performing ones have runtime < 1s so may not be very accurate.
Okay so main takeaway of our results: GPU optimizations are a waste of time lol; sequential Dijkstra's is the KING 🥇
Find ways to generate SSSP problems of different sizes and varying node degrees. Verify correct answers.