hohav / peppi

Rust parser for Slippi SSBM replay files
MIT License
37 stars 9 forks source link

Use Vec::resize instead of Vec::push to save on allocations #24

Closed NickCondron closed 1 year ago

NickCondron commented 1 year ago

This is a small performance win we get from avoiding repeated Vec allocations. I attached some flamegraphs -- they are from my project repo, but you can ignore that part. I ran on ~1.8k replays as a test. If you download them to your computer and open in your browser you will get a better interactive version of the svg (idk why github's hosting removes that functionality). Ideally we'd have proper benchmarks... From my testing this provides a ~2.5% speedup.

Before: flamegraph-before

After: flamegraph-resize

hohav commented 1 year ago

Nice! Did your data set contain only 1v1 games, or 2v2 as well? I'd expect the performance benefit to increase with the number of characters (where Nana counts as a character), and I'm curious whether that guess is correct.

Do you have thoughts on creating a benchmark suite? I have plenty of my own replays, but they'd be highly non-representative since I almost exclusively play one character and never in 2v2s.

NickCondron commented 1 year ago

Nice! Did your data set contain only 1v1 games, or 2v2 as well? I'd expect the performance benefit to increase with the number of characters (where Nana counts as a character), and I'm curious whether that guess is correct.

Do you have thoughts on creating a benchmark suite? I have plenty of my own replays, but they'd be highly non-representative since I almost exclusively play one character and never in 2v2s.

Is was all 1v1. Yea this work made me realize a benchmark suite would be a good idea. I don't yet know the best way to set that up. I might try to tackle that. We could also compare peppi to other parsers. I think that's also good advertisement haha!

NickCondron commented 1 year ago

Here's how this looks with my Iai benchmarks going from main to this PR. Seems mixed maybe small win for the new changes.

hbox_llod_timeout_g8_into_game
  Instructions:           253181906 (-0.987175%)
  L1 Accesses:            365123046 (-1.521782%)
  L2 Accesses:               126810 (-60.98670%)
  RAM Accesses:             1236906 (+14.36790%)
  Estimated Cycles:       409048806 (-0.291222%)

ics_ditto_into_game
  Instructions:           163897904 (-1.814596%)
  L1 Accesses:            239742852 (-3.156668%)
  L2 Accesses:               159286 (-49.22055%)
  RAM Accesses:              706126 (+13.30012%)
  Estimated Cycles:       265253692 (-2.098391%)

long_pause_into_game
  Instructions:           852772180 (-0.013898%)
  L1 Accesses:           1235515052 (-0.053340%)
  L2 Accesses:              1368694 (-12.64443%)
  RAM Accesses:             3853874 (+5.053529%)
  Estimated Cycles:      1377244112 (+0.352566%)

mango_zain_netplay_into_game
  Instructions:           142802716 (-1.249649%)
  L1 Accesses:            208421715 (-1.941152%)
  L2 Accesses:               125211 (-61.87798%)
  RAM Accesses:              705634 (+30.99001%)
  Estimated Cycles:       233744960 (+0.300749%)

old_ver_thegang_into_game
  Instructions:           105399381 (-2.986701%)
  L1 Accesses:            155049776 (-4.473304%)
  L2 Accesses:               100441 (-67.46049%)
  RAM Accesses:              700239 (+28.66057%)
  Estimated Cycles:       180060346 (-1.553992%)

short_game_tbh10_into_game
  Instructions:             1145792 (-2.077429%)
  L1 Accesses:              1740241 (-1.831215%)
  L2 Accesses:                 5219 (+0.752896%)
  RAM Accesses:                5760 (-1.285347%)
  Estimated Cycles:         1967936 (-1.742137%)
NickCondron commented 1 year ago

I think the collect logic needs a rework, so this isn't important enough to bother.