ianprime0509 / zig-xml

XML parser for Zig
http://ianjohnson.dev/zig-xml/
BSD Zero Clause License

perf: move data out of `Scanner.Token` #26

Closed ianprime0509 closed 1 year ago

ianprime0509 commented 1 year ago

By storing the token data in a separate `Scanner` field and making `Token` merely the token type, we avoid a decent amount of copying when tokens are passed around. This yields considerable speedups in the TokenReader and Reader benchmarks: wall time drops by about 39% and 27%, respectively. The Scanner benchmark is about 3% slower, but that probably has more to do with how that particular benchmark is written, since it previously discarded the token data.
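To illustrate the idea, here is a minimal sketch of the two layouts. It is not the actual zig-xml code: `FatToken`, `Data`, `token_data`, and `emitAttribute` are hypothetical names chosen for the example, and the real `Scanner` tracks considerably more state.

```zig
const std = @import("std");

// Hypothetical "before" layout: every token carries its payload,
// so each copy of a token copies the slices as well.
const FatToken = union(enum) {
    element_start: struct { name: []const u8 },
    attribute: struct { name: []const u8, value: []const u8 },
    text: struct { content: []const u8 },
    element_end,
};

// Hypothetical "after" layout: the token is only a tag.
const Token = enum { element_start, attribute, text, element_end };

// Hypothetical scanner that keeps the payload of the most recent token
// in its own field; callers read it from the scanner when they need it.
const Scanner = struct {
    token_data: Data = .{},

    const Data = struct {
        name: []const u8 = "",
        value: []const u8 = "",
    };

    // Stand-in for real scanning: records the payload and returns the tag.
    fn emitAttribute(self: *Scanner, name: []const u8, value: []const u8) Token {
        self.token_data = .{ .name = name, .value = value };
        return .attribute;
    }
};

pub fn main() void {
    var scanner = Scanner{};
    const token = scanner.emitAttribute("id", "42");
    // The tag-only token is much smaller than the payload-carrying union.
    std.debug.print("FatToken: {d} bytes, Token: {d} bytes\n", .{ @sizeOf(FatToken), @sizeOf(Token) });
    // The payload is read from the scanner rather than from the token.
    std.debug.print("{s} {s}={s}\n", .{ @tagName(token), scanner.token_data.name, scanner.token_data.value });
}
```

The trade-off in a design like this is that the payload is only valid until the scanner produces its next token, so callers that need to keep it must copy it out themselves.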

```
Benchmark 1 (120 runs): zig-out/bin-old/scanner Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          41.6ms ±  470us    40.7ms … 43.5ms          1 ( 1%)        0%
  peak_rss           7.27MB ± 88.0KB    7.08MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          152M  ±  839K      151M  …  158M           3 ( 3%)        0%
  instructions        472M  ± 20.8       472M  …  472M           0 ( 0%)        0%
  cache_references    270K  ±  625K      206K  … 7.03M          10 ( 8%)        0%
  cache_misses       7.95K  ±  260      7.61K  … 9.81K           3 ( 3%)        0%
  branch_misses       511K  ±  631       510K  …  512K          18 (15%)        0%
Benchmark 2 (116 runs): zig-out/bin/scanner Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          43.0ms ±  452us    41.9ms … 44.4ms          4 ( 3%)        💩+  3.3% ±  0.3%
  peak_rss           7.28MB ± 77.9KB    7.08MB … 7.34MB          0 ( 0%)          +  0.2% ±  0.3%
  cpu_cycles          158M  ±  694K      156M  …  159M           0 ( 0%)        💩+  4.0% ±  0.1%
  instructions        527M  ± 19.4       527M  …  527M          27 (23%)        💩+ 11.7% ±  0.0%
  cache_references    234K  ±  265K      207K  … 3.06M          10 ( 9%)          - 13.5% ± 45.6%
  cache_misses       7.93K  ±  435      7.49K  … 11.8K           5 ( 4%)          -  0.3% ±  1.1%
  branch_misses       514K  ±  335       513K  …  515K           1 ( 1%)          +  0.7% ±  0.0%

Benchmark 1 (44 runs): zig-out/bin-old/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           116ms ±  631us     115ms …  117ms          0 ( 0%)        0%
  peak_rss           7.30MB ± 59.0KB    7.21MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          462M  ± 1.91M      459M  …  466M           0 ( 0%)        0%
  instructions       1.14G  ± 21.9      1.14G  … 1.14G           0 ( 0%)        0%
  cache_references    233K  ± 6.77K      226K  …  253K           3 ( 7%)        0%
  cache_misses       9.69K  ± 1.48K     8.05K  … 13.6K           0 ( 0%)        0%
  branch_misses       815K  ± 1.16K      813K  …  817K           0 ( 0%)        0%
Benchmark 2 (72 runs): zig-out/bin/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          70.2ms ±  782us    68.9ms … 75.3ms          2 ( 3%)        ⚡- 39.4% ±  0.2%
  peak_rss           7.29MB ± 63.4KB    7.21MB … 7.34MB          0 ( 0%)          -  0.2% ±  0.3%
  cpu_cycles          271M  ± 2.75M      268M  …  291M           7 (10%)        ⚡- 41.3% ±  0.2%
  instructions        885M  ± 19.2       885M  …  885M          17 (24%)        ⚡- 22.6% ±  0.0%
  cache_references    224K  ± 7.03K      219K  …  263K           7 (10%)        ⚡-  3.9% ±  1.1%
  cache_misses       8.32K  ±  909      7.80K  … 14.4K           6 ( 8%)        ⚡- 14.1% ±  4.5%
  branch_misses       671K  ± 42.3K      664K  … 1.03M           3 ( 4%)        ⚡- 17.6% ±  1.6%

Benchmark 1 (35 runs): zig-out/bin-old/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           145ms ±  857us     143ms …  148ms          2 ( 6%)        0%
  peak_rss           7.29MB ± 65.1KB    7.21MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          582M  ± 3.06M      578M  …  596M           1 ( 3%)        0%
  instructions       1.38G  ± 24.7      1.38G  … 1.38G           0 ( 0%)        0%
  cache_references    758K  ±  196K      513K  … 1.59M           2 ( 6%)        0%
  cache_misses       14.3K  ± 6.84K     11.4K  … 49.2K           4 (11%)        0%
  branch_misses      1.06M  ± 14.0K     1.05M  … 1.11M           3 ( 9%)        0%
Benchmark 2 (48 runs): zig-out/bin/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           105ms ± 1.55ms     104ms …  113ms          1 ( 2%)        ⚡- 27.2% ±  0.4%
  peak_rss           7.27MB ± 93.6KB    7.08MB … 7.34MB          0 ( 0%)          -  0.2% ±  0.5%
  cpu_cycles          419M  ± 6.29M      414M  …  450M           1 ( 2%)        ⚡- 28.1% ±  0.4%
  instructions       1.13G  ± 19.6      1.13G  … 1.13G          11 (23%)        ⚡- 18.1% ±  0.0%
  cache_references    575K  ± 59.7K      490K  …  797K           1 ( 2%)        ⚡- 24.2% ±  7.9%
  cache_misses       12.5K  ±  876      11.4K  … 15.3K           5 (10%)          - 12.3% ± 13.9%
  branch_misses      1.07M  ± 4.22K     1.07M  … 1.09M           8 (17%)          +  1.2% ±  0.4%
```