yakra / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project

ConcAugThread, L3 cache & CompStatsThread #133

Closed: yakra closed this 4 years ago

yakra commented 4 years ago

Would a concurrencies.log solution that's gentler on L3 cache avoid the slowdown in Computing stats we saw in 8a0b5fe9d7cc6664252dded823cb743fc5f91df1? Instead of storing the whole string, we could store a tuple of the 3 pointers we need to recreate it. Not counting container overhead, we're dealing with 14.2 MB rather than 59.7 MB, and each entry is 24 B rather than >4x that size.
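A minimal sketch of the pointer-tuple idea, under stated assumptions. The issue doesn't say which 3 pointers are involved; here we assume a traveler and two segments. `Traveler`, `Segment`, `AugEntry`, `make_line`, `make_entry` and the line format are all hypothetical stand-ins, not the project's actual classes. A plain struct is used instead of `std::tuple` so the fields are named; either works. Three pointers of equal size and alignment occupy, in practice, 3 × 8 = 24 B on a 64-bit build, whereas each stored `std::string` costs its own object size plus, for lines too long for the small-string buffer, a separate heap allocation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; the issue does not name the classes behind the 3 pointers.
struct Traveler { std::string name; };
struct Segment  { std::string label; };

// Deferred log entry: only the pointers needed to rebuild the line later.
// In practice this is 3 * sizeof(void*), i.e. 24 B on a typical 64-bit build.
struct AugEntry {
    const Traveler* traveler;
    const Segment*  augmented;  // segment credited to the traveler
    const Segment*  based_on;   // segment the credit was derived from
};

// Old approach: materialize and keep the whole line now (hypothetical format).
std::string make_line(const Traveler& t, const Segment& aug, const Segment& src) {
    return t.name + ": [" + aug.label + "] based on [" + src.label + "]";
}

// New approach: record only the pointers; the text is rebuilt at write time.
AugEntry make_entry(const Traveler& t, const Segment& aug, const Segment& src) {
    return AugEntry{&t, &aug, &src};
}
```

The trade-off is that the text must be rebuilt from these pointers when the log is written, which moves that work to wherever the write happens.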

yakra commented 4 years ago
  • Not much time to save in ConcAugThread itself; even constructing no list elements whatsoever saved only a very small amount.

I was either not looking very closely, or looking at too little data, incomplete data, or possibly even the wrong data. Initial results from lab1 with 1 and 2 threads are actually quite promising.

yakra commented 4 years ago

Writing to concurrencies.log

How much additional time would be wasted dereferencing our pointers and individually inserting them into the ofstream, all single-threaded?

box      before (s)  after (s)  slowdown (s)
BT         0.40        1.38        0.98
lab1       0.21        1.12        0.91
lab1.5     0.16        0.68        0.53
lab2       0.23        1.32        1.09
lab3       0.36        1.79        1.43

...Will this cause the same damage to L3 cache?
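A sketch of the single-threaded write step asked about in this section, assuming entries that hold three pointers. `Traveler`, `Segment`, `AugEntry`, `write_log` and `write_log_file` are hypothetical, and the line format is made up. Each stored pointer is dereferenced and every piece is inserted into the stream separately, which is where the extra pointer chasing and the larger number of `<<` insertions come from.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <sstream>  // std::ostringstream can stand in for the ofstream, e.g. in tests
#include <string>
#include <vector>

// Hypothetical stand-ins for the real classes (assumption, not the project's API).
struct Traveler { std::string name; };
struct Segment  { std::string label; };
struct AugEntry { const Traveler* traveler; const Segment* augmented; const Segment* based_on; };

// Single-threaded writer: every line is rebuilt here by dereferencing the stored
// pointers and inserting each piece separately, so all formatting runs on one thread.
void write_log(std::ostream& out, const std::vector<AugEntry>& entries) {
    for (const AugEntry& e : entries) {
        assert(e.traveler && e.augmented && e.based_on);
        out << e.traveler->name << ": [" << e.augmented->label
            << "] based on [" << e.based_on->label << "]\n";
    }
}

// Convenience wrapper for writing a file such as concurrencies.log.
bool write_log_file(const std::string& path, const std::vector<AugEntry>& entries) {
    std::ofstream out(path);
    if (!out) return false;
    write_log(out, entries);
    return static_cast<bool>(out);
}
```

In the real code, producing a segment's text may itself construct a new string on each call, which adds to the cost of this serial loop.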

CompStatsThread

CompStatsThread results?

Savings are < 0.2 s, except with very few threads on most machines.

yakra commented 4 years ago

The results from lab3 aren't in yet, but this appears to be a dead end. We have very slight gains in CompStatsThread and actually pretty good gains in ConcAugThread, but they're outweighed by the extra time spent writing concurrencies.log. Costs include dereferencing a bunch of pointers (FWIW), inserting a greater number of items into the ofstream, and, probably the biggest one, creating string objects for HighwaySegments. Because the new approach doesn't construct the concurrencies.log lines in the threads, it's faster single-threaded. But the old approach constructs those strings in the threads, so its speed scales up as threads are added. End result: with >1 thread, the new approach is slower on all machines.
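For contrast, a sketch of why threaded string construction scales: if each worker builds complete lines for its own slice of the entries, the expensive string building (including each segment's text) is spread across threads, and only the final ordered write is serial. This illustrates threaded string construction in general, not the project's exact code path; the names, the `Segment::str()` format and the even-slice partitioning are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>  // std::ostringstream can stand in for the ofstream, e.g. in tests
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins (not the project's classes).
struct Traveler { std::string name; };
struct Segment {
    std::string route, begin, end;
    // Builds a fresh string on every call, like creating a segment's text on demand.
    std::string str() const { return route + " " + begin + " " + end; }
};
struct AugEntry { const Traveler* traveler; const Segment* augmented; const Segment* based_on; };

std::string format_line(const AugEntry& e) {
    assert(e.traveler && e.augmented && e.based_on);
    return e.traveler->name + ": [" + e.augmented->str() + "] based on [" + e.based_on->str() + "]";
}

// Each worker formats its own contiguous slice into its own vector, so no locking
// is needed and the formatting cost shrinks as threads are added. The final write
// is single-threaded and preserves the original order.
void write_log_parallel(std::ostream& out, const std::vector<AugEntry>& entries, unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    std::vector<std::vector<std::string>> chunks(num_threads);
    std::vector<std::thread> workers;
    const std::size_t n = entries.size();
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t first = n * t / num_threads;
        const std::size_t last  = n * (t + 1) / num_threads;
        workers.emplace_back([&entries, &chunks, t, first, last] {
            for (std::size_t i = first; i < last; ++i)
                chunks[t].push_back(format_line(entries[i]));
        });
    }
    for (std::thread& w : workers) w.join();
    for (const auto& chunk : chunks)
        for (const std::string& line : chunk)
            out << line << '\n';
}
```

With one thread this does the same formatting work as a deferred single-threaded writer; the difference only appears as the thread count grows, which is consistent with the pointer-based version winning single-threaded but losing with >1 thread.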