Everything is a float and NaN packing

madmalik commented 6 years ago

A lot of scripting languages choose IEEE 754 doubles as the only numeric type and then encode all other possible values in the range of redundant NaN values. There is enough room for pointers in there since current 64 bit architectures only use 48 bits.
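As a rough illustration of the packing described above, here is a minimal Rust sketch. The names `Value`, `TAG_PTR` and the specific tag bit pattern are assumptions made for this example, not taken from any particular runtime. It assumes pointers fit in 48 bits and that real NaNs are canonicalized on entry so they never collide with the tag.

```rust
/// A 64-bit word that is either a plain f64 or a pointer hidden in a NaN payload.
/// Hypothetical layout chosen for this sketch only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Value(u64);

const TAG_MASK: u64 = 0xFFFF_0000_0000_0000; // top 16 bits
const TAG_PTR: u64 = 0x7FFC_0000_0000_0000; // quiet NaN plus one extra tag bit
const PAYLOAD_MASK: u64 = 0x0000_FFFF_FFFF_FFFF; // low 48 bits
const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000;

impl Value {
    /// Store a float. Any NaN is collapsed to one canonical pattern so that
    /// it can never look like a tagged pointer.
    pub fn from_f64(f: f64) -> Value {
        if f.is_nan() {
            Value(CANONICAL_NAN)
        } else {
            Value(f.to_bits())
        }
    }

    /// Store a pointer, or return None if the address needs more than 48 bits
    /// (the portability concern raised in this thread).
    pub fn from_ptr(p: *const u8) -> Option<Value> {
        let addr = p as usize as u64;
        if (addr & !PAYLOAD_MASK) != 0 {
            return None;
        }
        Some(Value(TAG_PTR | addr))
    }

    pub fn is_ptr(self) -> bool {
        (self.0 & TAG_MASK) == TAG_PTR
    }

    pub fn as_f64(self) -> Option<f64> {
        if self.is_ptr() {
            None
        } else {
            Some(f64::from_bits(self.0))
        }
    }

    /// Recover the raw pointer. Dereferencing it would still need `unsafe`.
    pub fn as_ptr(self) -> Option<*const u8> {
        if self.is_ptr() {
            Some((self.0 & PAYLOAD_MASK) as usize as *const u8)
        } else {
            None
        }
    }
}
```

Every value stays register-sized, and the float case needs no decoding at all beyond the tag check.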

There are obvious downsides:

- Floating point math is hard to get right. Floats for everything can be a footgun. Also, people will make fun of our programming language because 0.1 + 0.2 != 0.3.
- At least in theory, there can be portability issues.
- Storing pointers cannot be done in safe Rust (imo not a big deal, but should be noted).
- Other values must fit into the unused NaN range. LuaJIT, for example, uses this approach and therefore cannot implement the i64 datatype Lua 5.3 introduced.

In summary, this is an overreaching performance hack and it totally grosses me out.

But, it works surprisingly well in practice:

- Floats have a precise integer range that is sufficient in practice, and overflow fails gracefully.
- Having all values register-sized has a substantial performance impact.
- Cases where i64/u64s, big ints or rationals are absolutely necessary are rare enough to put them behind a pointer indirection.
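To make the "precise integer range" point concrete: an f64 represents every integer exactly up to a magnitude of 2^53, and beyond that it loses precision rather than wrapping around. The helper name `is_exact_integer` below is hypothetical, just a small sketch of how a runtime might test whether a float is safely usable as an integer.

```rust
/// 2^53: up to this magnitude every integer has an exact f64 representation.
pub const MAX_EXACT_INT: f64 = 9_007_199_254_740_992.0;

/// True if `x` holds an integer value inside the exactly representable range.
pub fn is_exact_integer(x: f64) -> bool {
    x.is_finite() && x.fract() == 0.0 && x.abs() <= MAX_EXACT_INT
}

/// Past 2^53 adjacent integers are no longer distinguishable: adding 1.0
/// rounds back to the same value instead of wrapping like a machine integer.
pub fn loses_precision_at_limit() -> bool {
    MAX_EXACT_INT + 1.0 == MAX_EXACT_INT
}
```

Overflow of the float range itself degrades to infinity instead of wrapping, which is the "fails gracefully" behaviour mentioned above.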

I'm not really sure language design should be driven by a performance hack, but it's used successfully in practice. Imo we should at least discuss whether we should consider such a design.

pliniker commented 6 years ago

I have no problem with this gross performance hack 😀

madmalik commented 6 years ago

> I have no problem with this gross performance hack 😀

I'm using it right now in a little domain-specific language. But I'm thinking about dropping boxed pointers in favour of u32 handles into an arena. Always expecting that pointers never exceed 48 bits in size doesn't seem that portable, and I could get rid of the unsafe bits. The ease of masking off 32-bit values could potentially reduce the performance impact of using handles instead of pointers somewhat.
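A minimal sketch of the handle idea, under assumptions: the `Arena` type, the `TAG_HANDLE` bit pattern, and the choice of `String` as the only heap object type are all made up for illustration. The point is that the low 32 bits are recovered with a plain truncating cast and that no `unsafe` is needed.

```rust
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const TAG_HANDLE: u64 = 0x7FFC_0000_0000_0000; // hypothetical NaN tag for handles

/// Owns all heap objects; values refer to them by index instead of by pointer.
pub struct Arena {
    objects: Vec<String>,
}

impl Arena {
    pub fn new() -> Arena {
        Arena { objects: Vec::new() }
    }

    /// Store an object and return its NaN-packed handle.
    /// This sketch assumes fewer than 2^32 live objects.
    pub fn alloc(&mut self, s: String) -> u64 {
        let handle = self.objects.len() as u32;
        self.objects.push(s);
        TAG_HANDLE | handle as u64
    }

    /// Look up a packed value. Returns None for plain floats and stale indices.
    pub fn get(&self, bits: u64) -> Option<&String> {
        if (bits & TAG_MASK) != TAG_HANDLE {
            return None;
        }
        // Truncating to u32 is the "masking off" step: it keeps the low 32 bits.
        self.objects.get(bits as u32 as usize)
    }
}
```

The extra cost compared to a raw pointer is one bounds-checked index into the arena's vector.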

uazu commented 5 years ago

So you get 52 bits free in a NaN less one value reserved for infinity (assuming actual NaNs are converted to runtime errors). So you could encode a 51-bit integer in there (for actual integers) and a 32-bit handle (for all other values, including larger integers if supported).
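One possible bit layout for this scheme, sketched in Rust. The flag positions and the names `encode_int`, `encode_handle`, `encode_float` and `decode` are assumptions for this example. It relies on the stated premise that real NaNs are turned into runtime errors and therefore never stored.

```rust
const EXP_ALL_ONES: u64 = 0x7FF0_0000_0000_0000; // exponent bits of NaN/infinity
const INT_FLAG: u64 = 1 << 51; // top mantissa bit set: payload is an integer
const INT_MASK: u64 = (1 << 51) - 1; // low 51 mantissa bits
const HANDLE_FLAG: u64 = 1 << 32; // keeps the mantissa non-zero for handle 0

#[derive(Debug, PartialEq)]
pub enum Decoded {
    Float(f64),
    Int(i64),
    Handle(u32),
}

/// Pack a signed integer if it fits in 51 bits (two's complement).
pub fn encode_int(i: i64) -> Option<u64> {
    const MIN: i64 = -(1 << 50);
    const MAX: i64 = (1 << 50) - 1;
    if i < MIN || i > MAX {
        return None; // a runtime would fall back to a boxed big integer here
    }
    Some(EXP_ALL_ONES | INT_FLAG | (i as u64 & INT_MASK))
}

pub fn encode_handle(h: u32) -> u64 {
    EXP_ALL_ONES | HANDLE_FLAG | h as u64
}

/// Real NaNs are rejected so they can never alias an integer or a handle.
pub fn encode_float(f: f64) -> Option<u64> {
    if f.is_nan() {
        None
    } else {
        Some(f.to_bits())
    }
}

pub fn decode(bits: u64) -> Decoded {
    let f = f64::from_bits(bits);
    if !f.is_nan() {
        return Decoded::Float(f); // includes both infinities
    }
    if (bits & INT_FLAG) != 0 {
        // Sign-extend the 51-bit payload to 64 bits.
        Decoded::Int((((bits & INT_MASK) << 13) as i64) >> 13)
    } else {
        Decoded::Handle(bits as u32)
    }
}
```

With this layout a zero mantissa is never produced by `encode_int` or `encode_handle`, so positive infinity keeps its normal meaning.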

madmalik commented 5 years ago

> So you could encode a 51-bit integer in there (for actual integers) and a 32-bit handle (for all other values, including larger integers if supported).

Since f64 has a native precise integer range of 53 bits, I don't know if it's useful to create an additional integer type. You'd gain the overflow into BigInteger (instead of into the non-integer floating point range).

But if there are no other numerical types than float, we could just go ahead with all numerical operations and defer the typechecks of the inputs into a branch that is only taken if the result is NaN. If that single branch is predicted correctly, that could speed up numerical operations considerably.
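The deferred check could look roughly like the sketch below. It assumes every non-number is stored as a tagged NaN (the `TAG_BOXED` pattern here is hypothetical), so any non-numeric operand forces a NaN result and the operand types only need inspecting on that rare path.

```rust
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const TAG_BOXED: u64 = 0x7FFC_0000_0000_0000; // hypothetical tag for non-numbers
const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000;

#[derive(Debug, PartialEq)]
pub enum TypeError {
    NotANumber,
}

fn is_boxed(bits: u64) -> bool {
    (bits & TAG_MASK) == TAG_BOXED
}

/// Add two packed values. The fast path performs no type checks at all.
pub fn add(a: u64, b: u64) -> Result<u64, TypeError> {
    let sum = f64::from_bits(a) + f64::from_bits(b);
    if sum.is_nan() {
        // Rare: either an operand was not a number, or the math itself
        // produced NaN (for example inf + -inf).
        return add_slow(a, b);
    }
    Ok(sum.to_bits())
}

/// Slow path: only now are the operand types inspected.
#[cold]
fn add_slow(a: u64, b: u64) -> Result<u64, TypeError> {
    if is_boxed(a) || is_boxed(b) {
        Err(TypeError::NotANumber)
    } else {
        Ok(CANONICAL_NAN)
    }
}
```

Whether this actually beats checking tags up front would need benchmarking; the sketch only shows that the common case reduces to one float add and one branch.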

uazu commented 5 years ago

I'm playing around with ideas for a Lua-alike but with type-specific ops (int+int, fp+fp, str..str). So, I could keep the intermediate values during a calculation and even some local vars in the native representation for the type. The NaN-packing would just be for efficient longer-term storage. I think integers are too important to keep in FP. The reasons I can think of:

- Array indexing requires conversion from FP to integer each time.
- Whilst bit operations can be hacked on top of the f64 representation (adjusting exponents and then doing the bitop on the mantissas), that's slower than the single machine-op for real integers.
- There's the mental cost of using FP for integers of not knowing if you've accidentally introduced a fraction, and doing defensive coding to deal with that. True, if you only use + - * // (floor division) then everything will stay as integers, but what if a fraction is passed in a function argument? (In Lua 5.3 you now get a crash if you accidentally format a number with a fraction with %d, so that really makes you realise that you never had your numbers totally under control.)
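The first and third points can be seen in what an all-float runtime has to do on every indexing operation. The function name `to_index` is hypothetical; this is only a sketch of the conversion and the defensive fraction check.

```rust
/// Convert a float to an array index, rejecting fractions, negatives,
/// NaN, infinities and out-of-range values.
pub fn to_index(x: f64, len: usize) -> Option<usize> {
    if !x.is_finite() || x.fract() != 0.0 || x < 0.0 {
        return None;
    }
    // Float-to-int `as` casts saturate in Rust, so huge values cannot wrap.
    let i = x as usize;
    if i < len {
        Some(i)
    } else {
        None
    }
}
```

With a real integer type only the bounds check would remain.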

I haven't benchmarked these costs, though (except the mental and debugging load, which I do have experience with). My Lua-alike is at the level of fantasy, just a patchwork of ideas I'm building up based on the defensive style I've ended up using in Lua and improvements for various Lua frustrations. It seems to me that Lua makes too much of a sacrifice for surface simplicity sometimes. However this all does depend on the domain your scripting language is oriented towards. Maybe all-FP is fine for that.

madmalik commented 5 years ago

> Array indexing requires conversion from FP to integer each time

That wouldn't be a big problem if the language has iterator-like constructs and doesn't rely on explicit iteration variables. Explicit integers for indexing could be an implementation detail of the iterator and not exposed to the language.

> Whilst bit operations can be hacked on top of the f64 representation (adjusting exponents and then doing the bitop on the mantissas), that's slower than the single machine-op for real integers

That's definitely a good reason if you want to have baked-in bit operations. Imo they are a niche use case in my interpretation of a scripting language.

> There's the mental cost of using FP for integers [...]

I think that's the best counterargument. If you want to avoid the FP weirdness, I think it'd be better not to center the data representation around floats.
