ianprime0509 / zig-xml

XML parser for Zig
http://ianjohnson.dev/zig-xml/
BSD Zero Clause License

perf: rework decoder interface #22

Closed ianprime0509 closed 1 year ago

ianprime0509 commented 1 year ago

The updated interface decodes codepoints directly from a reader instead of being implemented as a state machine. This turns out to be considerably more efficient than the previous implementation: wall time drops by roughly 26 to 28% and CPU cycles by roughly 29 to 33% on the token_reader and reader benchmarks:

Benchmark 1 (27 runs): zig-out/bin-old/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           188ms ± 14.5ms     168ms …  205ms          0 ( 0%)        0%
  peak_rss           7.31MB ± 58.5KB    7.21MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          688M  ± 4.20M      684M  …  706M           1 ( 4%)        0%
  instructions       1.19G  ± 29.4      1.19G  … 1.19G           0 ( 0%)        0%
  cache_references    412K  ±  763K      239K  … 4.21M           2 ( 7%)        0%
  cache_misses       10.0K  ± 7.40K     7.90K  … 46.8K           2 ( 7%)        0%
  branch_misses       814K  ± 1.37K      813K  …  821K           1 ( 4%)        0%
Benchmark 2 (37 runs): zig-out/bin/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           136ms ± 13.8ms     115ms …  147ms          0 ( 0%)        ⚡- 27.7% ±  3.8%
  peak_rss           7.31MB ± 54.7KB    7.21MB … 7.34MB          8 (22%)          +  0.1% ±  0.4%
  cpu_cycles          462M  ± 1.87M      459M  …  466M           0 ( 0%)        ⚡- 32.8% ±  0.2%
  instructions       1.14G  ± 26.6      1.14G  … 1.14G           0 ( 0%)        ⚡-  4.1% ±  0.0%
  cache_references    236K  ± 4.86K      227K  …  244K           0 ( 0%)          - 42.7% ± 60.7%
  cache_misses       9.40K  ± 1.25K     7.88K  … 11.5K           0 ( 0%)          -  6.5% ± 24.6%
  branch_misses       815K  ± 1.01K      813K  …  817K           0 ( 0%)          +  0.1% ±  0.1%
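
As a reading aid for the delta column: it appears to be the change in the mean relative to the first (old) benchmark. The sketch below recomputes two of the token_reader deltas from the rounded means printed above. The helper name `relative_delta` is made up for illustration, and because the printed means are rounded, values recomputed this way can differ slightly from the tool's own figures.

```python
def relative_delta(old_mean: float, new_mean: float) -> float:
    """Return the change from old_mean to new_mean as a percentage of old_mean."""
    return (new_mean - old_mean) / old_mean * 100.0


# Means copied from the token_reader output above (rounded as printed).
wall_time = relative_delta(188.0, 136.0)   # milliseconds
cpu_cycles = relative_delta(688.0, 462.0)  # millions of cycles

print(f"wall_time:  {wall_time:+.1f}%")
print(f"cpu_cycles: {cpu_cycles:+.1f}%")
```
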
Benchmark 1 (23 runs): zig-out/bin-old/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           225ms ± 14.2ms     199ms …  249ms          0 ( 0%)        0%
  peak_rss           7.25MB ±  100KB    7.08MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          823M  ± 12.2M      813M  …  847M           0 ( 0%)        0%
  instructions       1.43G  ± 23.0      1.43G  … 1.43G           0 ( 0%)        0%
  cache_references    757K  ±  129K      635K  … 1.07M           1 ( 4%)        0%
  cache_misses       13.7K  ± 1.18K     12.5K  … 17.2K           2 ( 9%)        0%
  branch_misses      1.43M  ± 3.35K     1.42M  … 1.43M           0 ( 0%)        0%
Benchmark 2 (31 runs): zig-out/bin/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           166ms ± 13.9ms     144ms …  175ms          0 ( 0%)        ⚡- 26.5% ±  3.4%
  peak_rss           7.27MB ± 81.8KB    7.08MB … 7.34MB          0 ( 0%)          +  0.3% ±  0.7%
  cpu_cycles          581M  ± 1.54M      579M  …  584M           0 ( 0%)        ⚡- 29.4% ±  0.5%
  instructions       1.38G  ± 16.0      1.38G  … 1.38G           9 (29%)        ⚡-  3.8% ±  0.0%
  cache_references    715K  ±  219K      563K  … 1.71M           3 (10%)          -  5.5% ± 13.6%
  cache_misses       13.5K  ± 1.31K     11.4K  … 16.5K           2 ( 6%)          -  1.2% ±  5.1%
  branch_misses      1.07M  ± 20.3K     1.05M  … 1.11M           5 (16%)        ⚡- 25.3% ±  0.6%
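
To make the interface change described at the top more concrete, here is a rough sketch of the two decoder shapes. It is written in Python for brevity and is not zig-xml's actual API: `PushDecoder`, `feed`, and `read_codepoint` are hypothetical names, only UTF-8 is handled, and validation is minimal (overlong encodings and surrogates are not rejected). The state-machine shape shown is just one common way to structure such a decoder, assumed here for contrast.

```python
import io
from typing import BinaryIO, Optional


class PushDecoder:
    """State-machine style: the caller feeds one byte at a time."""

    def __init__(self) -> None:
        self.codepoint = 0  # bits accumulated so far
        self.remaining = 0  # continuation bytes still expected

    def feed(self, byte: int) -> Optional[int]:
        """Return a codepoint once a sequence completes, else None."""
        if self.remaining == 0:
            if byte < 0x80:
                return byte
            if (byte & 0xE0) == 0xC0:
                self.codepoint, self.remaining = byte & 0x1F, 1
            elif (byte & 0xF0) == 0xE0:
                self.codepoint, self.remaining = byte & 0x0F, 2
            elif (byte & 0xF8) == 0xF0:
                self.codepoint, self.remaining = byte & 0x07, 3
            else:
                raise ValueError("invalid UTF-8 lead byte")
            return None
        if (byte & 0xC0) != 0x80:
            raise ValueError("invalid UTF-8 continuation byte")
        self.codepoint = (self.codepoint << 6) | (byte & 0x3F)
        self.remaining -= 1
        return self.codepoint if self.remaining == 0 else None


def read_codepoint(reader: BinaryIO) -> Optional[int]:
    """Pull style: read exactly one codepoint from the reader, or None at EOF."""
    first = reader.read(1)
    if not first:
        return None
    lead = first[0]
    if lead < 0x80:
        return lead
    if (lead & 0xE0) == 0xC0:
        codepoint, extra = lead & 0x1F, 1
    elif (lead & 0xF0) == 0xE0:
        codepoint, extra = lead & 0x0F, 2
    elif (lead & 0xF8) == 0xF0:
        codepoint, extra = lead & 0x07, 3
    else:
        raise ValueError("invalid UTF-8 lead byte")
    # The whole sequence is consumed in one call, so no state survives it.
    tail = reader.read(extra)
    if len(tail) != extra:
        raise ValueError("truncated UTF-8 sequence")
    for byte in tail:
        if (byte & 0xC0) != 0x80:
            raise ValueError("invalid UTF-8 continuation byte")
        codepoint = (codepoint << 6) | (byte & 0x3F)
    return codepoint


text = "<a>é€</a>"
data = text.encode("utf-8")

# Push style: the caller drives the loop and checks for a result per byte.
decoder = PushDecoder()
pushed = [cp for b in data if (cp := decoder.feed(b)) is not None]

# Pull style: each call returns a complete codepoint until EOF.
source = io.BytesIO(data)
pulled = list(iter(lambda: read_codepoint(source), None))

assert pushed == pulled == [ord(c) for c in text]
```

In the push version every input byte goes through a call that has to check and update stored state, while the pull version keeps the partially decoded codepoint in local variables for the duration of one call. That difference is one plausible contributor to the cycle counts above, though the sketch itself makes no performance claim.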