iHeartGraph / iSpan

Parallel and distributed computation of strongly connected components (SCCs).
GNU General Public License v3.0

Running with `thread=1` is faster than running with multiple threads #2

Closed txfs19260817 closed 4 years ago

txfs19260817 commented 4 years ago

Hi, thanks for your great work. I tested the program on my own graph dataset with my laptop; the graph contains more than 10 million nodes, and over 90% of them belong to a single huge SCC. Surprisingly, I found that if I decrease the thread parameter from the default 56 to a lower value, even down to 1, while keeping the other parameters unchanged, the average time consumption tends to become shorter.

| #Threads | Time consumption |
| --- | --- |
| 56 | 19.3 s |
| 16 | 5.5 s |
| 8 | 3.0 s |
| 4 | 2.0 s |
| 2 | 1.9 s |
| 1 | 2.1 s |

I was wondering if you could explain this phenomenon. Thanks! My environment:

- alpha = 30, beta = 200, gamma = 10, theta = 0.1, run_times = 10
- MacBook Pro (15-inch, 2018)
- Processor: 2.6 GHz Intel Core i7
- Memory: 16 GB 2400 MHz DDR4
yuedeji commented 4 years ago

Hi, thanks for your interest in our work!

Based on the information you provided, we cannot tell what is going on, but I suggest the following steps to figure out what happened.

  1. Double-check the number of physical cores in your machine. We tested with 56 threads because our server actually has that many physical cores, albeit with hyper-threading. Your machine certainly does not have that many cores; my MacBook Pro, for example, only has 4. If you assign more threads than there are physical cores, the resulting context switching will hurt performance, and I think this explains the performance drop you see from 4 threads to 56 threads. So, if your machine has 4 cores, testing thread counts from 1 to 4 makes sense (the sketch after this list shows a quick way to check the core count).

  2. Memory is another issue. The graph in CSR representation will occupy about 8(|V| + |E|) bytes, and the intermediate data structures will occupy several more multiples of 4|V| bytes. So, I would suggest monitoring the available memory (which should be much smaller than 16 GB) before you test the graph, and watching how it changes during runtime (the sketch after this list includes this back-of-the-envelope estimate).

  3. Could you test other graphs and see if the same observation holds? We have tested many standard real-world graphs collected from the KONECT and SNAP projects, which you can find in our paper. I would suggest starting from smaller graphs (e.g., around or below 1 million nodes).

  4. There are various flags in the code, such as debug and verbose; you can play with them and see what is going on inside the code during runtime.
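If it helps, here is a back-of-the-envelope sketch for items 1 and 2. It is plain Python, not part of iSpan, and the vertex/edge counts and the number of auxiliary arrays are placeholders you would replace with your own numbers.

```python
import os

# Logical CPUs reported by the OS (hyper-threads are counted here);
# on macOS, `sysctl -n hw.physicalcpu` reports the physical core count.
print("logical CPUs:", os.cpu_count())

# Rough memory estimate following the 8(|V| + |E|)-byte CSR figure above,
# plus "several" intermediate arrays of 4|V| bytes each.
num_vertices = 10_000_000   # placeholder: ~10M nodes, as in this issue
num_edges = 100_000_000     # placeholder: substitute your real edge count
num_aux_arrays = 5          # placeholder for "several" 4|V|-byte arrays

csr_bytes = 8 * (num_vertices + num_edges)
aux_bytes = num_aux_arrays * 4 * num_vertices
print(f"CSR  : {csr_bytes / 2**30:.2f} GiB")
print(f"aux  : {aux_bytes / 2**30:.2f} GiB")
print(f"total: {(csr_bytes + aux_bytes) / 2**30:.2f} GiB")
```

With these placeholder numbers the CSR alone is roughly 0.8 GiB, so the graph itself should fit in 16 GB, but the available memory at runtime is still worth watching.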

You are welcome to post more! Good luck!

Best, Yuede

txfs19260817 commented 4 years ago

Hi, thank you for your advice. I have run further tests on other graph datasets and on a modified version of my graph, using my 6-core laptop. The results are shown below.

| #Threads | LiveJournal | Baidu | DBPedia |
| --- | --- | --- | --- |
| 1 | 274.3 ms | 149.9 ms | 633.7 ms |
| 2 | 209.3 ms | 96.6 ms | 422.1 ms |
| 4 | 162.8 ms | 65.4 ms | 306.1 ms |
| 8 | 144.3 ms | 52.7 ms | 236.5 ms |
| 12 | 143.0 ms | 53.2 ms | 227.0 ms |
| 16 | 151.3 ms | 59.2 ms | 245.0 ms |

First, the results on the LiveJournal, Baidu and DBPedia graphs show that there is a particular thread count (here around 8 to 12) at which the program reaches its best performance.

| #Threads | Modified my-graph |
| --- | --- |
| 1 | 47.3 s |
| 2 | 26.7 s |
| 4 | 20.0 s |
| 8 | 11.6 s |
| 12 | 11.9 s |
| 16 | 15.1 s |

Next, I modified my graph so that the largest SCC shrank from over 93% of all vertices to 75%. On this version, the multi-threaded program outperforms the single-threaded one.

In addition, I found that the average time becomes longer, and that the WCC step now takes up most of the total time instead of the FW-BW latency on the largest SCC. Maybe this is because the number of relatively large SCCs has increased.

To sum up, I guess that executing with fewer threads can be faster when the graph has the property that most of its vertices belong to the largest SCC.

txfs19260817 commented 4 years ago
[Screenshot: SCC size distribution chart, 2020-08-13]

Here is a chart illustrating the relationship between SCC size and the number of SCCs. Versions 1 and 2 show the graph before and after the modification, respectively. (The largest SCC of each version has been removed to keep the chart readable.)

yuedeji commented 4 years ago

I think you now have reasonable results. Since your laptop has 6 cores, testing thread counts from 1 to 6 makes sense. As for your chart, you can keep the largest SCC by using a log scale on both the x- and y-axes.
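For the log-log chart, a minimal matplotlib sketch; the (size, count) pairs below are placeholders, not your real distribution:

```python
import matplotlib.pyplot as plt

# Placeholder distribution: scc_sizes[i] vertices per SCC, counts[i] SCCs of
# that size. Substitute the real pairs, largest SCC included.
scc_sizes = [1, 2, 3, 10, 100, 10_000, 9_000_000]
counts = [800_000, 120_000, 40_000, 5_000, 200, 3, 1]

plt.scatter(scc_sizes, counts)
plt.xscale("log")  # log scale on both axes keeps the giant SCC visible
plt.yscale("log")  # without flattening the many tiny SCCs
plt.xlabel("SCC size (#vertices)")
plt.ylabel("number of SCCs")
plt.title("SCC size distribution")
plt.show()
```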

Do you have any more questions? Yuede

txfs19260817 commented 4 years ago

No, I don't have any more questions. Thank you for your assistance!