yakra / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project
0 stars 0 forks source link

`TMBitset` checklist #245

Closed yakra closed 1 year ago

yakra commented 1 year ago

Benchmark:

Try out:

Kinda both:

Clean up:

yakra commented 1 year ago

find crashes via

for file in sulogs/2023-04-06/siteupdate_{a423,4448,8fba,8d55,7828,96e9,2d1b,5c76}*; do
  echo -e "$file\t" `tail -n 1 $file \
  | sed 's~\[[0-9.]*\]~~' \
  | tr -d .`; done \
| grep -v 'Total run time:' \
| sed -r 's~.*siteupdate_(....)-.*~\1~' \
| uniq
yakra commented 1 year ago

dead branches

yakra commented 1 year ago

commits

c incl? descr comments
a5a7 yes contig RAM Prerequisite for everything else
ecff yes TMB 1st draft The main point of all this :)
d704 yes t_l via TMB Small speedup. Saves RAM. Potential for improvements via iteration and |= optimizations.
a009 yes t_l list->vec Not slower. The Right Thing To Do. Saves RAM. Locality. Is index used in future commits? Only in eaa6, which won't be used as it hurts performance.
What if traveler_lists is nixed altogether, and we just iterate traveler_set one more time at the end? Or, what if, instead of doing reallocations/copies, we just reserve(TravelerList::allusers.size()) after declaring?
Re-examine nixing in future.
91a0 yes !extr include Housekeeping. Rebase earlier?
7ce7 yes userlog warn Unrelated, and doesn't appear to affect performance any, fnord. Rebase earlier?
f977 no t_l as ref Pass a bool instead; see 4504.
Nix extraneous traveler_lists creation for master graph.
eaa6 no nix tl::tn Hurts performance. See 9fdc et al instead.
4e05 no avoid iterate Child of eaa6. See 96e9 even though it's the same diff. :P
6922 no ctor / init Any changes to performance? See next commit. No: See 7415.
5c76 no integral type Userlogs slightly slower? Test again once dust settles? No: See 4a64.
b4ab no names & order Arg order just makes sense. Name changes are more related to prev commit. No: See c896.
e61a no ++u No change to binary. Done on its own in some future commits, including 47a7.
a423 yes branching CBT Faster userlogs. Rebase earlier?
Maybe try again once HighwaySegments are stored sequentially?
4448 no short unit branch; affects \|= only. Prefer flavors with iterator optimization.
8fba no int unit branch; affects \|= only. Prefer flavors with iterator optimization.
8d55 no long unit branch; affects \|= only. Prefer flavors with iterator optimization.
7828 no no-branch CBC Hurts performance
96e9 yes avoid iterate 4e05 cherrypicked. Reorder additions in for loop; make it look pretty.
2d1b yes inline mv&e Slight speedup, simpler prototypes & includes
f28c no long bugfix unit branch; affects \|= only. Prefer flavors with iterator optimization.
953a no ! unit param Not strictly necessary for pointer-punning \|=, but cleans things up.
No: Different units may be useful for scenarios other than clinched_by. E.g., uint64_t may work well for HGVertex subgraph membership.
03ec no punning \|= Evaluate WRT straight-up unit \|=. Prefer flavors with iterator optimization; 3 are simplified for 16/32-bit units; see below.
7958 no SimItOpt1 SimItOpt2 outperforms this.
3910 no SimItOpt2 32 > 16 > 8-bit for iterating clinched_by. Additionally, allows for better \|= optimization.
1be1 no SIO1 short SimItOpt2 outperforms this.
87e4 no SIO1 int "
2cbb no SIO1 long "
9280 no SIO2 short 32 > 16 > 8-bit for iterating clinched_by. Additionally, allows for better \|= optimization.
9347 no sign bit fix "
acc9 yes unsigned int "
8269 no unsigned long Performs worse than 8/16/32-bit everywhere except epoch.
f8cf no ulong bugfix "
f749 no ComItOpt Evaluate against SimItOpt. All tasks. ~Try with per-unit \|=.~ <--Never mind.
No: Underperforms SIO2 on all lab machines (but not BiggaTomato & epoch though those aren't the target). KISS principle.
9fdc no trav_nums May not do much on its own. Try again after other speedups are in place. See 8be1.
bc9d no TMB it idx Pro: No arithmetic for clinchedby_code. Con: Still adding b.array+index everywhere else (deref).
0b69 no iterator ptr Pro: No arithmetic for dereferencing. Con: CBC subtracts out.
Use 4db4 instead.
^ ^^ ^ ^^ Both are superior to 9fdc, which adds for everything, then subtracts out for CBC.
Test both sans 9fdc. Which gets called more?
• 69% clinched_by in clinchedby_code @ 36,193,165, which wants indices.
• 31% in everything else @ 16,024,183, which want pointers.
(8,000,360 clinched_by in stats & SQL + 23,463 traveler_set in MV&E)
So on paper, bc9d wins. But in practice we're within margin of error, with 0b69 faster on the 4 non-Ubuntu (coincidence?) machines.
If foregoing or at least deferring 9fdc because no clear benefit, it makes sense to use 0b69.
ec4e no 16-bit it opt Wins on epoch, BT & lab1; lags behind SimItOpt2 everywhere else, even before accounting for lookup array setup. Constexpr would be expensive for program & code size. Ugly changes to code, a namespace I wouldn't otherwise need.
16bb no [bits >> 1] Only needed for ComItOpt, which looks like it's not going to pan out. Only performs better on lab2, so within margin of error at best.
2d94 no ternaries 1st place on lab2 by 0.5 ms & lab3 by 0.7; lags behind SimItOpt2 everywhere else except BT, where it's 2nd place behind ec4e, and more arguably epoch, where it's slower than 8 & 16-bit, but faster than 32-bit (which looks like it'll be the selected alternative). But again, BT & epoch aren't target machines. Keep it simple.
fc17 no punning SIO2 Bigger is better.
72e2 no 64-32-16 pun "
c310 no 64-16-bit pun "
47a7 yes 64-32-bit pun Bigger is better. Fewest branches & loops. 32-bit is fastest for iteration too.
4db4 yes iterator ptr Faster.
4504 yes "0" Rebase earlier
7f87 yes trav Better on paper; != op replaced with just sending an existing variable. Rename variable to is_traveled for readability.
05d7 no reserve No definitive diff; no-build may be faster.
8be1 ... TL::nums No apparent diffs. Revisit when more RAM bandwidth optimizations are in place?
c896 yes arg order Measures faster, but probably just noise.
7415 yes ctor / init Keep the initialization part; no diffs to binary from that. Array start & data pointer go from 8 to 24 B apart, increasing probability of being on different cache lines. That's got to explain the drop in UserLog performance. Therefore, reorder to get them consecutive again; see 01d6.
4a64 yes integral type This time, no change to binary from 7415.
6da9 no alignas(32) Increases sizeof(HighwaySegment) from 112 to 128. Does help UserLog performance on all machines, but hurts most other tasks on most machines. Combined, still a slight advantage overall but future commits do better...
01d6 yes reorder membr start and data adjacent again, restoring old same/different cache line probability. Data occasionally on different cache lines appears to be outweighed by not increasing sizeof(HighwaySegment), except for bandwidth-constrained machines BiggaTomato, lab1 & bsdlab. 01e7 makes this a non-issue...
01e7 yes cbt_index Not only avoids reading start (potentially from another cache line), avoids redundant recalculation of the index. A clear advantage over all prev commits on all Linux boxes.
48c8 yes part add_idx add_clinched_by takes index, calculated once per chopped route, in store_traveled_segments.
28af no local_index No clear benefit; more diffs. Keep it simple.
c8b9 yes const fold readability
adcf no 1 index/trav No clear benefit; more diffs. Keep it simple.
6634 yes whitespace
9536 yes constexpr no diff to binary on BiggaTomato
ecef yes ST aug index
yakra commented 1 year ago

Functions, operators, etc.

yakra commented 1 year ago

HTML (resolved)

yakra commented 1 year ago

HTML

yakra commented 1 year ago

hidden/deleted branches

comm branch DoWhat comment
ecef ConstFold delete TMBPreAlpha FFWDed
adcf ElStacko hidden
28af add_index hidden
89bd cbt_index delete TMBPreAlpha FFWDed
01d6 reorder delete merged
6da9 alignas hidden
8be1 thread_local hidden re-examine in future
05d7 reserve hidden
660c [stash] hidden meta-iteration (CompStats)
6a83 [stash] hidden meta-iteration (traveler names)
7f7d nixtl hidden re-examine in future
47a7 64-32 delete parent of TMBPreAlpha & nixtl
c310 64-16 hidden
72e2 64-32-16 delete parent of 64-16
fc17 SIO2pun hidden
2d94 ternaries hidden
16bb bit1 delete parent of ternaries
ec4e ComItOpt16 delete parent of ternaries
0b69 return_p hidden
bc9d idx hidden
9fdc nixon delete parent of return_p
f749 ComItOpt delete parent of ternaries
f8cf SimItOpt2 hidden
2cbb SimItOpt hidden
03ec pun delete parent of ternaries
54fb issue588 hidden use this
f28c unit hidden
2d1b inline delete merged
96e9 switch delete merged
7828 cbycode hidden
a423 cbt_br delete merged
e61a TMBitset hidden Nix TL::tn dropped; the rest redone manually downstream
yakra commented 1 year ago

Alpha ToDo list

Solo items

C Descr Comments
54fb issue 588
91a0 !extr include
7ce7 userlog warn
96e9 avoid iterate
ToDo Timestamp for first task (Get list of travelers in the system). Python Too.
ToDo Constexpr the static edge format constants.
v Extraneous else after continue; parens can be clarified
^ extraneous visibility == 1 check just before that

traveled_tmg_line & traveler_lists cleanup

C Descr Comments
4504 "0" HGEdge, HighwaySegment. HG.cpp: 2 tmg_line, 3 size->travnum
7f87 trav

contig RAM group

C Descr Comments
ToDo Get TMArray sorted. TMArray means threaded construction in place (via placement new) without separate read_list function.
a5a7 contig RAM
ToDo fix comment
ToDo TravelerList dtor, to delete traveler_num

TMB group

C Descr Comments
ecff TMB 1st draft Does ctor init vars in order? Don't include siteupdate.cpp changes; see below.
Nope ~init segments with TravelerList::ids.size()~ Never mind. TMArray<TravelerList>::size.
a423 branching CBT
acc9 unsigned int
47a7 64-32-bit pun
4db4 iterator ptr
c896 arg order
7415 ctor / init
4a64 integral type
01d6 reorder membr
01e7 cbt_index
48c8 part add_idx
c8b9 const fold
6634 whitespace
9536 constexpr
ecef ST aug index
done ~nix () and add_value until needed~ Done as part of c8b9
ToDo Cast (unit)1 before << ...lest the unsigned long bug make a reappearance. See f8cf on SimItOpt2 branch.
ToDo Review TMBitset variable names Also reduce arithmetic in iterator construction.
ToDo Comment for != and |= operators
ToDo segments SQL table: only iterate clinched_by for active/preview systems
ToDo uint8_tetc.
ToDo As of ecff, cmath no longer explicitly needed in HighwayGraph.cpp
ToDo Template
specialization
ToDo Explicit instantiation?

t_l via TMB group

C Descr Comments
d704 t_l via TMB
2d1b inline mv&e Delete that extra line of whitespace from HighwayGraph.h as in 4504
ToDo Don't include TMBitset
4504 "0" GetSubData. HG.cpp: assign traveler numbers
a009 t_l list->vec
yakra commented 1 year ago

partial specialization experiments deleted stash @ 484be5074e68f31db790cc53b34cf2494f6ff114

yakra commented 1 year ago

Benchmark commits:

yakra commented 1 year ago

List

Conc Stat User

Comp Graph Graph_Linux Total ErCh ErCh_zoom

yakra commented 6 months ago

deleted branches (BiggaTomato)

g_db