Open yakra opened 1 year ago
3 different flavors for each of the checkboxes above:
Pointer, String, Elements.
For each, 2 variations, keeping `augment_list` as a `std::list<whatever>`, or converting to a `std::vector`: List, Vector.
Yielding 6 build alternatives to evaluate:
pl, pv, sl, sv, el, ev.
The vector versions performed better across the board, so I added in one more alternative to evaluate, v, which is the same as no-build except for the list->vector conversion.
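For anyone skimming, here is a minimal sketch of what the list->vector change amounts to. `AugmentEntry`, `collect_list`, `collect_vector` and `render` are hypothetical names made up for illustration, not the actual siteupdate types; the real `augment_list` element type differs per flavor. The output is identical either way; the only difference is that the vector keeps its entries contiguous instead of in one heap node each, which is presumably where the cache behavior changes.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for one augment_list entry (not the siteupdate type).
struct AugmentEntry {
    std::string traveler;
    std::string segment;
};

// "no-build"-style storage: std::list, one separately allocated node per entry.
std::list<AugmentEntry> collect_list(const std::vector<std::string>& segs,
                                     const std::string& traveler) {
    std::list<AugmentEntry> out;
    for (const auto& s : segs) out.push_back(AugmentEntry{traveler, s});
    return out;
}

// "v"-style storage: std::vector, entries stored contiguously.
std::vector<AugmentEntry> collect_vector(const std::vector<std::string>& segs,
                                         const std::string& traveler) {
    std::vector<AugmentEntry> out;
    out.reserve(segs.size());  // size is known up front, so allocate once
    for (const auto& s : segs) out.push_back(AugmentEntry{traveler, s});
    assert(out.size() == segs.size());
    return out;
}

// The consumer is container-agnostic: it only needs forward iteration.
template <class Container>
std::string render(const Container& entries) {
    std::string text;
    for (const auto& e : entries) {
        text += e.traveler;
        text += ' ';
        text += e.segment;
        text += '\n';
    }
    return text;
}
```

Since `render` only iterates, swapping the container is a local change; nothing downstream needs to know which one was used.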
For all 4 tasks combined, v is a clear winner at all but the lowest thread counts (and of course we want to optimize for more threads) on every machine except BiggaTomato, which has the least cache & slowest RAM. There, it still outperforms no-build across the board. At 4 threads, it's in 3rd place, only 0.05 s behind the winner, sv. Wherever v is not first place, it still beats no-build, no exceptions.
TLDR, v is our winner.
But wait. Can we still improve on things? Let's eliminate the list options from consideration & take a deeper dive. Considering v, pv, sv, & ev...
… `HighwaySegment::str()` strings.
… `augment_lists[t]` & print one string. Boom. Done.
… `augment_lists[t]` & print 7 strings, dereferencing 1 pointer along the way.
… `Route::readable_name()` strings.
… `HighwaySegment::str()` strings.
… `ofstream` insertions, dereferencing & string construction, as well as find a solution that performs well in ConcAugThread and minimizes cache misses when computing stats. Can't have all of these things at once though.

3 more alternatives:
… `augment_list` entry …, instead inserting it as a `const char*` into the `ofstream`.
… `route->readable_name()` in favor of its constituent parts, taking a couple more references along the way.

Narrow lead of v2 notwithstanding, I'll just implement v for now, which does need to happen at a minimum, and take another look at this after the region.php rankings bugfix (and maybe sequential TravelerList objects & `TMBitset<TravelerList> clinched_by`) change ConcAugThread operation & CompStatsThread iteration.
branches on BiggaTomato

branch | commit
---|---
y238r2 | abd9d238800126af8c53612bc574ccb942c90349
y238r1 | abd9d238800126af8c53612bc574ccb942c90349
y238v2 | fa2918559cbd24d722ce42f862a03387784a7423
y238v | 525c2b6e2a29c6e1b34e500834164b589ce9e5cb
y238ev | 83b06cf2ab00fa2114295178bfcfa4c4c59ec732
y238el | 1aabe3c0df3ef9109be34c35c3e8bf6387eb94e7
y238sv | 1555dce2d38a21510cc01a547a5c7101d672745a
y238sl | 52ecc48c34880123b30681d8cd7cb3b7df756e18
y238pv | 0244cea61d2e9331484b0419439e9d94f6a7e319
y238pl | 0abfc10fb0e62fadfbd0c8575f652e16a8c12613
https://github.com/yakra/DataProcessing/blob/616425bc2fb5df01778f43c2d06ccbe15ae94816/siteupdate/cplusplus/threads/ConcAugThread.cpp#L20

All those allocations and appends probably get a bit expensive.

… `std::tuple<TravelerList,HighwaySegment,HighwaySegment>` and insert the individual components into the ofstream, same as in single-threaded operation.
… `HighwaySegment::str()` too expensive now that it's single-threaded?
… `std::tuple<TravelerList,std::string,std::string>` and construct the strings in the multi-threaded bit, or
… `HighwaySegment::str()` into the ofstream
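
To make the trade-off concrete, here is a rough sketch of the two tuple layouts side by side. Everything in it is an assumption for illustration: `Traveler` and `Segment` are simplified stand-ins for `TravelerList` and `HighwaySegment`, the tuples hold pointers (my guess at the intent), and the line format is made up. It writes to a generic `std::ostream` so it can be checked against a `std::ostringstream`; an `std::ofstream` works the same way.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-ins; not the siteupdate classes.
struct Traveler { std::string name; };
struct Segment {
    std::string route, wp1, wp2;
    // Builds a temporary std::string: allocation plus several appends.
    std::string str() const { return route + " " + wp1 + " " + wp2; }
};

// Layout 1: keep pointers, defer all formatting to write time.
using PtrEntry = std::tuple<const Traveler*, const Segment*, const Segment*>;
// Layout 2: build the strings earlier (e.g. in the multi-threaded part).
using StrEntry = std::tuple<const Traveler*, std::string, std::string>;

// Insert a segment's parts directly; no temporary string is constructed.
inline void put(std::ostream& os, const Segment& s) {
    os << s.route << ' ' << s.wp1 << ' ' << s.wp2;
}

void write_components(std::ostream& os, const std::vector<PtrEntry>& entries) {
    for (const auto& e : entries) {
        const Traveler* t = std::get<0>(e);
        const Segment* a = std::get<1>(e);
        const Segment* b = std::get<2>(e);
        assert(t && a && b);
        os << t->name << " [";
        put(os, *a);
        os << "] [";
        put(os, *b);
        os << "]\n";
    }
}

void write_strings(std::ostream& os, const std::vector<StrEntry>& entries) {
    for (const auto& e : entries) {
        const Traveler* t = std::get<0>(e);
        assert(t);
        os << t->name << " [" << std::get<1>(e) << "] [" << std::get<2>(e) << "]\n";
    }
}
```

Both writers produce the same text. The first does more stream insertions and pointer dereferences in the single-threaded write loop; the second moves the string construction cost to wherever the entries are created and makes each entry larger.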