yakra opened this issue 5 years ago
Multi-threaded subgraph writing is implemented, but there's room to improve parallelism.
There's a lot of "dead space" in the way processing is split up between each graph category. Some cores are idle while we wait for the last graphs in each category to finish, before we prepare for the graphs in the next category. With only 2 multisystem graph sets, 2 cores are idle the entire 3.8s. And EUR-freeways probably finishes quite a bit before USA-national.
A better way is to first prep all categories' graphs and load up the entire graph_vector before writing anything. Then we can plow through the whole thing in one go, no gaps, no stalls.
Add graphs to graph_vector from slowest to fastest. Fill the bathtub up with baseballs before filling it up with marbles, n'at.
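The drain-one-big-queue idea can be sketched like this. All names here are hypothetical stand-ins (`GraphTask`, `write_all`), not the real siteupdate classes: sort the fully loaded graph_vector by estimated cost, then let every thread pull from one shared index with no per-category barriers.

```cpp
// Sketch, assuming a hypothetical GraphTask with a cost estimate.
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

struct GraphTask {            // stand-in for one subgraph to be written
    double est_cost;          // e.g. seconds-per-set from the tables above
};

// Process every task with N threads pulling from one shared index,
// longest-first so the "baseballs" go in before the "marbles".
void write_all(std::vector<GraphTask>& graph_vector, unsigned nthreads) {
    std::sort(graph_vector.begin(), graph_vector.end(),
              [](const GraphTask& a, const GraphTask& b)
              { return a.est_cost > b.est_cost; });
    std::atomic<size_t> next{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t)
        pool.emplace_back([&] {
            // claim the next unwritten graph until none remain
            for (size_t i; (i = next++) < graph_vector.size(); )
                /* write graph_vector[i] here */;
        });
    for (auto& t : pool) t.join();
}
```

With longest-first ordering, the ~2.9s continent-scale graphs start immediately, and the 0.03s regional graphs backfill the idle cores at the end.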
Task | Time (s) | Sets | Seconds per set |
---|---|---|---|
Writing continent graphs. | 17.4 | 6 | 2.90 |
Writing multisystem graphs. | 4.2 | 2 | 2.10 |
Writing system data graphs. | 4.6 | 4 | 1.15 |
Writing country graphs. | 10.8 | 14 | 0.77 |
Writing area data graphs. | 8.5 | 25 | 0.34 |
Writing multiregion graphs. | 1.1 | 4 | 0.27 |
Writing regional data graphs. | 10.9 | 342 | 0.03 |
^ Tackle https://github.com/TravelMapping/DataProcessing/issues/223 first. This may alter where the area graphs sit in the speed hierarchy.
Task | Time (s) | Sets | Seconds per set |
---|---|---|---|
Creating continent graphs. | 17.3 | 6 | 2.88 |
Creating multisystem graphs. | 4.3 | 2 | 2.15 |
Creating system data graphs. | 4.5 | 4 | 1.13 |
Creating country graphs. | 10.6 | 14 | 0.76 |
Creating multiregion graphs. | 1.1 | 4 | 0.28 |
Creating area data graphs. | 1.2 | 26 | 0.05 |
Creating regional data graphs. | 11.0 | 329 | 0.03 |
Lab2 is able to take advantage of its greater number of cores, and outperforms lab1 in most multi-threaded tasks.
```shell
yakra@BiggaTomato:~/TravelMapping/devel/cpuscale/20200318$
sum=0
# per-task difference (lab1 minus lab2) in peak multi-threaded times
deltas=$( \
  for task in ReadWpt NmpSearch NmpMerged ReadList \
              ConcAug CompStats UserLog Subgraph; do
    # grab each box's peak time for this task...
    vals=$( \
      for BOX in lab1 lab2; do
        ./taskscale.sh $BOX _insert $task | grep 'peak'
      done | sed 's~.*is \(.*\)s at.*~\1~'
    )
    # ...and build a "lab1time-lab2time" expression for bc to evaluate
    eq=$(echo $vals | tr ' ' '-')
    echo -n "$eq="
    echo "scale=3; $eq" | bc
  done | cut -f2 -d'=' \
)
# total the per-task deltas
for d in $deltas; do sum=$sum+$d; done
echo "scale=3; $sum" | bc
```
All combined, lab2 is faster by 1.89s in the multi-threaded tasks. Lab1 still wins overall because it's faster in the single-threaded bits.
TBD: How much total time is taken up by each machine:
To find average time for a given single-threaded task on a given machine:
```shell
yakra@BiggaTomato:~/TravelMapping/devel/cpuscale/20200318/sulogs/cpp$
count=$(ls siteupdate_insert-lab2* | wc -l)
sum=0
for file in $(ls siteupdate_insert-lab2*); do
  sum=$sum+$( \
    # the two log timestamps bracketing the task, later one first (tac)
    delta=$( \
      tac $file | grep '^\[[0-9]\+\.[0-9]\]' \
      | grep -B 1 unique \
      | sed 's~^\[\([0-9]\+\.[0-9]\).*~\1~'
    )
    # end minus start, evaluated by bc
    echo "scale=1; $(echo $delta | tr ' ' '-')" | bc \
  )
done
# average over all runs
echo "scale=8; ($sum) / $count" | bc
```
250a2fa6e17d82e9dce74e9f79537af1b2b98aea
1.1s
b07f73b1a6531aad7ca8e0f9cc5ce81d2be0a45e
Done.
2dd3b28b54165ab2f1745ad2befed0c612c3b8b2
4d988b326b2c2069a71b43177c4c1721859ac0dd
3.3s
Need a mutex to:
• check for duplicate edges (reorder code in the 1st ctor).
• add to adjacency lists.
Just one should be good, but see below...
~1.3s
• 1 for every vertex = 22,686,480 B
• 1 for every quadtree node (faster) = 4,916,336 B
• 1 for every quadtree node (slower) = 1,526,120 B
• 1 for everything = 40 B. I can get away with this, right?
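A minimal sketch of the "1 for everything" option, under my own assumption (these are stand-in types, not the real HighwayGraph code) that the duplicate-edge check and the adjacency-list appends share one critical section:

```cpp
// Sketch: one mutex guards both the duplicate check and the adjacency
// appends. HGVertex / HighwayGraph here are minimal stand-ins.
#include <algorithm>
#include <mutex>
#include <set>
#include <thread>
#include <utility>
#include <vector>

struct HGVertex {
    std::vector<size_t> adj;                   // indices of neighbors
};

struct HighwayGraph {
    std::vector<HGVertex> v;
    std::set<std::pair<size_t,size_t>> edges;  // for duplicate detection
    std::mutex mtx;                            // the single ~40 B lock

    // check for a duplicate edge and add to both adjacency lists,
    // all under one critical section
    void add_edge(size_t a, size_t b) {
        auto key = std::minmax(a, b);          // normalize edge direction
        std::lock_guard<std::mutex> lock(mtx);
        if (!edges.insert({key.first, key.second}).second)
            return;                            // duplicate: already present
        v[a].adj.push_back(b);
        v[b].adj.push_back(a);
    }
};
```

Whether one global lock is "good enough" depends on contention: with this design every thread serializes on every edge insertion, which is why the per-vertex and per-quadtree-node options above trade memory for parallelism.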
96be2a38300bf8ea63c23ddd17cf80441669fd9f
4879245fdfa727e57f90cd33b110a7ddd14e9bd6
Route::read_wpt, to take advantage of idle CPU time while waiting for I/O.
9.7s
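A sketch of what threading this might look like (hypothetical helpers `read_wpt` / `read_all`, not the real Route::read_wpt signature): while one worker is blocked reading its file, the others keep parsing theirs, so the I/O wait overlaps with CPU work.

```cpp
// Sketch: N threads drain one shared list of .wpt files.
#include <atomic>
#include <fstream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// read one file into memory, then count its nonempty lines
// (a stand-in for real waypoint parsing)
static size_t read_wpt(const std::string& path) {
    std::ifstream f(path);        // where a thread may stall on I/O
    std::ostringstream buf;
    buf << f.rdbuf();
    std::istringstream in(buf.str());
    size_t waypoints = 0;
    for (std::string line; std::getline(in, line); )
        if (!line.empty()) ++waypoints;
    return waypoints;
}

// workers claim files via one shared atomic index
std::vector<size_t> read_all(const std::vector<std::string>& files,
                             unsigned nthreads) {
    std::vector<size_t> counts(files.size());
    std::atomic<size_t> next{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t)
        pool.emplace_back([&] {
            for (size_t i; (i = next++) < files.size(); )
                counts[i] = read_wpt(files[i]);
        });
    for (auto& t : pool) t.join();
    return counts;
}
```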
bc964258e733abafe6ff2550f16fa12788a51439