Closed jonathanhogg closed 8 months ago
Why is this worth doing? Where I use symbols as state keys in physics systems I tend to combine these with numeric particle identifiers. This immediately turns the vector into a tuple of Python objects and everything to do with them becomes slower. Hashing is the key thing for state keys. For a very heavy example, see:
https://github.com/jonathanhogg/flitter-examples/blob/main/genuary2024/day1.fl
This is significantly faster if all of the symbols are replaced with numbers.
Having just done a trial implementation, it makes more sense to go with my original though of having the symbol numbers be large negative numbers based on a part of the string hash. Directly interpreting hashes as doubles ends up with non-integer numbers and these will be truncated for hash calculations in a bunch of places.
Here's a comparison of running the fireworks example above with current main:
17:28:33.090 13989:.engine.control | INFO: 59.6fps; 3.5/ 8.8/ 1.5ms (run/render/sys); perf 1.12
17:28:38.115 13989:.engine.control | INFO: 45.6fps; 7.0/13.0/ 1.7ms (run/render/sys); perf 0.50
17:28:43.124 13989:.engine.control | INFO: 30.7fps; 10.2/19.5/ 2.8ms (run/render/sys); perf 0.50
17:28:48.133 13989:.engine.control | INFO: 32.1fps; 9.6/18.8/ 2.7ms (run/render/sys); perf 0.50
17:28:53.147 13989:.engine.control | INFO: 30.9fps; 9.8/19.7/ 2.8ms (run/render/sys); perf 0.50
17:28:58.164 13989:.engine.control | INFO: 26.7fps; 11.4/22.8/ 3.2ms (run/render/sys); perf 0.50
17:29:03.167 13989:.engine.control | INFO: 29.8fps; 10.3/20.2/ 2.9ms (run/render/sys); perf 0.50
17:29:08.181 13989:.engine.control | INFO: 33.9fps; 9.0/17.9/ 2.6ms (run/render/sys); perf 0.50
17:29:13.191 13989:.engine.control | INFO: 34.5fps; 8.8/17.5/ 2.5ms (run/render/sys); perf 0.50
17:29:18.203 13989:.engine.control | INFO: 29.7fps; 10.2/20.4/ 3.0ms (run/render/sys); perf 0.50
17:29:23.206 13989:.engine.control | INFO: 26.8fps; 11.4/22.5/ 3.3ms (run/render/sys); perf 0.50
17:29:28.228 13989:.engine.control | INFO: 26.7fps; 11.4/22.6/ 3.3ms (run/render/sys); perf 0.50
17:29:33.249 13989:.engine.control | INFO: 33.1fps; 9.2/18.3/ 2.7ms (run/render/sys); perf 0.50
and with this new branch:
17:23:58.980 13829:.engine.control | INFO: 59.6fps; 3.0/10.3/ 1.6ms (run/render/sys); perf 0.97
17:24:03.988 13829:.engine.control | INFO: 38.3fps; 7.5/16.4/ 2.2ms (run/render/sys); perf 0.50
17:24:08.998 13829:.engine.control | INFO: 34.7fps; 8.1/18.2/ 2.4ms (run/render/sys); perf 0.50
17:24:14.006 13829:.engine.control | INFO: 29.9fps; 9.4/21.2/ 2.8ms (run/render/sys); perf 0.50
17:24:19.016 13829:.engine.control | INFO: 35.5fps; 7.7/18.0/ 2.3ms (run/render/sys); perf 0.50
17:24:24.040 13829:.engine.control | INFO: 39.8fps; 6.9/16.0/ 2.1ms (run/render/sys); perf 0.50
17:24:29.043 13829:.engine.control | INFO: 40.4fps; 6.8/15.8/ 2.1ms (run/render/sys); perf 0.50
17:24:34.044 13829:.engine.control | INFO: 35.0fps; 7.9/18.2/ 2.4ms (run/render/sys); perf 0.50
17:24:39.072 13829:.engine.control | INFO: 30.0fps; 9.3/21.1/ 2.8ms (run/render/sys); perf 0.50
17:24:44.078 13829:.engine.control | INFO: 34.6fps; 7.9/18.5/ 2.4ms (run/render/sys); perf 0.50
17:24:49.101 13829:.engine.control | INFO: 36.6fps; 7.5/17.4/ 2.3ms (run/render/sys); perf 0.50
17:24:54.121 13829:.engine.control | INFO: 34.5fps; 8.0/18.5/ 2.4ms (run/render/sys); perf 0.50
17:24:59.140 13829:.engine.control | INFO: 30.5fps; 9.1/20.9/ 2.8ms (run/render/sys); perf 0.50
Ignoring the first few seconds where there are no fireworks, the average for 60 seconds are respectively 31.7fps; 9.86ms run and 35fps; 8ms run, which is pretty decent for such a simple change to the model – particularly shaving a whole 2ms (19%) off the program execution time.
At the moment a symbol, like
:foo
, is just a synonym of the equivalent string,"foo"
. However, what if we converted symbols into numbers instead?I could have a symbol table that allocates new large/unusual numbers for each new symbol and maintains mappings both ways between these. The parser would replace all symbols with simple number vectors. Then, wherever I'm currently expecting symbol-like strings, e.g, the values of enumerated attributes like
composite=
, I could accept a number if it maps to a string through this table. For compatibility I could haveVector.as_string()
do this mapping automagically.This would have the advantage that state keys incorporating symbols and numbers would become simple numeric vectors, which are much faster to manipulate and – importantly – hash.
The downside is that the internal state would be less representable, but I don't think there are many places where I currently print out state keys. The smartest thing to do might be for
Vector.__repr__()
to look up all numbers in the symbol table in case they match a symbol, then print out the symbol instead.I'd want to ensure that the symbol table uses numbers that are unlikely to collide with normal number usage in a program. Maybe large integer negatives would be smart. Ideally you'd want this symbol table mapping to be stable. So alternatively it could just be the 64-bit string hash (as computed normally in
Vector.hash()
) interpreted as adouble
? This would surely turn up numbers that are unlikely to collide with normal usage of numbers?