rust-hosted-langs / runtimes-WG

Working group focused on language runtimes - implementing GC and concurrency in safe Rust APIs

Everything is a float and NaN packing #9

Open madmalik opened 6 years ago

madmalik commented 6 years ago

A lot of scripting languages choose IEEE 754 doubles as the only numeric type and then encode all other possible values in the range of redundant NaN values. There is enough room for pointers in there, since current 64-bit architectures typically use only 48 bits of virtual address space.
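A minimal sketch of the idea in Rust. It assumes one common layout that this thread does not specify: ordinary doubles are stored as their own bit patterns, genuine NaNs are canonicalized, and a "pointer" is a negative quiet NaN whose low 48 bits hold the address. `Value` and the constants are hypothetical, and plain integer addresses stand in for real pointers so no `unsafe` is needed.

```rust
// Hypothetical NaN-boxing layout (one common choice, not taken from this thread):
// - ordinary doubles are stored as their own bit patterns,
// - genuine NaNs are canonicalized to CANONICAL_NAN (sign bit clear),
// - a "pointer" is a negative quiet NaN whose low 48 bits hold the address.
const CANONICAL_NAN: u64 = 0x7ff8_0000_0000_0000;
const PTR_TAG: u64 = 0xfff8_0000_0000_0000; // sign | exponent all ones | quiet bit
const ADDR_MASK: u64 = 0x0000_ffff_ffff_ffff; // low 48 bits

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Value(u64);

impl Value {
    pub fn from_f64(x: f64) -> Value {
        // Canonicalize NaN so a computed NaN can never look like a pointer.
        if x.is_nan() {
            Value(CANONICAL_NAN)
        } else {
            Value(x.to_bits())
        }
    }

    pub fn from_addr(addr: u64) -> Option<Value> {
        // Refuse addresses that need more than 48 bits.
        if addr & !ADDR_MASK != 0 {
            None
        } else {
            Some(Value(PTR_TAG | addr))
        }
    }

    pub fn is_ptr(self) -> bool {
        // Only a negative quiet NaN has all of bits 51..=63 set.
        self.0 & PTR_TAG == PTR_TAG
    }

    pub fn as_f64(self) -> Option<f64> {
        if self.is_ptr() { None } else { Some(f64::from_bits(self.0)) }
    }

    pub fn as_addr(self) -> Option<u64> {
        if self.is_ptr() { Some(self.0 & ADDR_MASK) } else { None }
    }
}
```

The 48-bit assumption is exactly the portability concern: an address needing more bits simply cannot be boxed, which `from_addr` reports as `None`.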

There are obvious downsides:

In summary, this is an overreaching performance hack and it totally grosses me out.

But it works surprisingly well in practice:

I'm not really sure language design should be driven by a performance hack, but it's used successfully in practice. IMO we should at least discuss whether we should consider such a design.

pliniker commented 6 years ago

I have no problem with this gross performance hack 😀

madmalik commented 6 years ago

> I have no problem with this gross performance hack 😀

I'm using it right now in a little domain-specific language. But I'm thinking about dropping boxed pointers in favour of u32 handles into an arena. Always expecting that pointers never exceed 48 bits doesn't seem very portable, and I could get rid of the unsafe bits. The ease of masking off 32-bit values could somewhat reduce the performance cost of using handles instead of pointers.
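A rough sketch of the handle variant, assuming a hypothetical `Arena` that keeps objects in a `Vec`, and a tag layout chosen only for illustration (quiet NaN plus bit 50, with the `u32` index in the low 32 bits). Recovering the handle is a tag check and an `as u32` truncation, and nothing needs `unsafe`.

```rust
// Hypothetical layout: numbers are plain doubles; every object value is a
// quiet NaN with bit 50 also set and a u32 arena index in the low 32 bits.
const HANDLE_TAG: u64 = 0x7ffc_0000_0000_0000; // exponent all ones | quiet bit | bit 50
const CANONICAL_NAN: u64 = 0x7ff8_0000_0000_0000;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Value(u64);

impl Value {
    pub fn number(x: f64) -> Value {
        // Canonical NaN lacks bit 50, so it can never be read as a handle.
        if x.is_nan() { Value(CANONICAL_NAN) } else { Value(x.to_bits()) }
    }
}

#[derive(Debug, PartialEq)]
pub enum Object {
    Str(String),
    List(Vec<Value>),
}

/// A toy arena: objects live in a Vec and are referred to by index.
pub struct Arena {
    objects: Vec<Object>,
}

impl Arena {
    pub fn new() -> Arena {
        Arena { objects: Vec::new() }
    }

    pub fn alloc(&mut self, obj: Object) -> Value {
        assert!(self.objects.len() < u32::MAX as usize, "arena full");
        let index = self.objects.len() as u32;
        self.objects.push(obj);
        Value(HANDLE_TAG | (index as u64))
    }

    pub fn get(&self, v: Value) -> Option<&Object> {
        // A tag check plus truncation to the low 32 bits: no pointer casts, no `unsafe`.
        if v.0 & HANDLE_TAG == HANDLE_TAG {
            self.objects.get(v.0 as u32 as usize)
        } else {
            None
        }
    }
}
```

The cost is an extra indirection through the `Vec` on each object access, in exchange for not depending on how many bits the platform's pointers use.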

uazu commented 5 years ago

So you get 52 bits free in a NaN, less one value reserved for infinity (assuming actual NaNs are converted to runtime errors). Spending one bit to tell the two cases apart, you could encode a 51-bit integer in there (for actual integers) and a 32-bit handle (for all other values, including larger integers if supported).
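One way to read that layout, sketched in Rust with assumed tag bits: bit 51 marks a 51-bit signed integer; otherwise bit 50 marks a `u32` handle. The extra bit 50 keeps the mantissa non-zero for handle 0, so that value cannot collide with infinity. The function names and the `Decoded` enum are hypothetical.

```rust
// Assumed tags (not from the thread): with the exponent all ones,
// bit 51 set => 51-bit signed integer in bits 0..=50;
// bit 51 clear, bit 50 set => u32 handle in bits 0..=31.
// Tagged values keep the sign bit clear; genuine NaNs are rejected.
const EXP_MASK: u64 = 0x7ff0_0000_0000_0000;
const MANT_MASK: u64 = 0x000f_ffff_ffff_ffff; // 52 mantissa bits
const INT_TAG: u64 = 1 << 51;
const HANDLE_TAG: u64 = 1 << 50;
const INT_BITS: u64 = (1 << 51) - 1;
const INT_MIN: i64 = -(1 << 50);
const INT_MAX: i64 = (1 << 50) - 1;

#[derive(Debug, PartialEq)]
pub enum Decoded {
    Float(f64),
    Int(i64),
    Handle(u32),
}

pub fn encode_f64(x: f64) -> Result<u64, &'static str> {
    if x.is_nan() { Err("NaN is a runtime error") } else { Ok(x.to_bits()) }
}

pub fn encode_int(i: i64) -> Option<u64> {
    // Integers outside 51 bits would need a boxed big integer behind a handle.
    if (INT_MIN..=INT_MAX).contains(&i) {
        Some(EXP_MASK | INT_TAG | ((i as u64) & INT_BITS))
    } else {
        None
    }
}

pub fn encode_handle(h: u32) -> u64 {
    // HANDLE_TAG keeps the mantissa non-zero even for handle 0.
    EXP_MASK | HANDLE_TAG | (h as u64)
}

pub fn decode(bits: u64) -> Decoded {
    let nan_pattern = bits & EXP_MASK == EXP_MASK && bits & MANT_MASK != 0;
    if !nan_pattern {
        Decoded::Float(f64::from_bits(bits)) // ordinary number or +/- infinity
    } else if bits & INT_TAG != 0 {
        // Sign-extend the 51-bit field: move bit 50 up to bit 63, then shift back.
        Decoded::Int(((bits << 13) as i64) >> 13)
    } else {
        Decoded::Handle(bits as u32)
    }
}
```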

madmalik commented 5 years ago

> So you could encode a 51-bit integer in there (for actual integers) and a 32-bit handle (for all other values, including larger integers if supported).

Since f64 represents every integer up to 2^53 in magnitude exactly (it has a 53-bit significand), I don't know if it's useful to create an additional integer type. The main gain would be overflowing into a BigInteger instead of into the non-integer floating-point range.
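For reference, an `f64` has a 53-bit significand (52 stored bits plus an implicit leading one), so every integer with magnitude up to 2^53 is exact. A small Rust check of that boundary, with a hypothetical `int_add` showing how a separate integer type could report overflow (the point where a promotion to a big integer would happen) instead of rounding silently:

```rust
/// 2^53: f64 has a 53-bit significand, so every integer with |n| <= 2^53 is exact.
pub const MAX_EXACT: i64 = 1 << 53;

/// True if `n` survives a round trip through f64 unchanged.
/// (`as f64` rounds to the nearest double, ties to even.)
pub fn round_trips(n: i64) -> bool {
    (n as f64) as i64 == n
}

/// "Integer" arithmetic in f64: past 2^53, results are rounded silently.
pub fn float_add(a: f64, b: f64) -> f64 {
    a + b
}

/// Hypothetical dedicated integer add: overflow is detected, so the runtime
/// could promote to a big integer here (the promotion itself is not shown).
pub fn int_add(a: i64, b: i64) -> Result<i64, &'static str> {
    a.checked_add(b).ok_or("overflow: promote to BigInteger")
}
```

For example, `float_add(2^53, 1.0)` gives back 2^53 unchanged, while `int_add` either returns the exact sum or an error.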

But if there are no numerical types other than float, we could just go ahead with every numerical operation and defer the type checks of the inputs to a branch that is only taken if the result is NaN. If that single branch is predicted correctly, that could speed up numerical operations considerably.
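A sketch of that deferred check in Rust. It assumes every non-number value is a quiet NaN (here with bit 50 also set, a hypothetical choice), so IEEE 754 guarantees that an addition touching one yields NaN. The fast path adds the raw doubles without looking at tags; only a NaN result falls through to the slow path, which still has to tell a type error apart from a genuine NaN such as `inf + -inf`.

```rust
// Assumed layout: numbers are stored as their f64 bits; every other value is
// a quiet NaN with bit 50 also set, so arithmetic on it always yields NaN.
const TAG: u64 = 0x7ffc_0000_0000_0000;
const CANONICAL_NAN: u64 = 0x7ff8_0000_0000_0000;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Value(u64);

impl Value {
    pub fn number(x: f64) -> Value {
        if x.is_nan() { Value(CANONICAL_NAN) } else { Value(x.to_bits()) }
    }
    pub fn object(handle: u32) -> Value {
        Value(TAG | (handle as u64))
    }
    pub fn is_number(self) -> bool {
        self.0 & TAG != TAG
    }
}

pub fn add(a: Value, b: Value) -> Result<Value, String> {
    // Fast path: no type checks, just add the raw doubles.
    let sum = f64::from_bits(a.0) + f64::from_bits(b.0);
    if !sum.is_nan() {
        return Ok(Value(sum.to_bits()));
    }
    // The single, usually not-taken branch.
    slow_add(a, b)
}

#[cold]
fn slow_add(a: Value, b: Value) -> Result<Value, String> {
    if a.is_number() && b.is_number() {
        // A genuine NaN produced from numbers, e.g. inf + -inf.
        Ok(Value(CANONICAL_NAN))
    } else {
        Err(format!("cannot add {:?} and {:?}", a, b))
    }
}
```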

uazu commented 5 years ago

I'm playing around with ideas for a Lua-alike but with type-specific ops (int+int, fp+.fp, str..str). So, I could keep the intermediate values during a calculation and even some local vars in the native representation for the type. The NaN-packing would just be for efficient longer-term storage. I think integers are too important to keep in FP. The reasons I can think of:

I haven't benchmarked these costs, though (except the mental and debugging load, which I do have experience with). My Lua-alike is at the level of fantasy, just a patchwork of ideas I'm building up based on the defensive style I've ended up using in Lua and improvements for various Lua frustrations. It seems to me that Lua sometimes sacrifices too much for surface simplicity. However, this all depends on the domain your scripting language is aimed at. Maybe all-FP is fine for that.
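A hypothetical sketch of why type-specific ops help: if each opcode states its operand types (int+int, fp+.fp, str..str), a VM can keep intermediates unboxed in per-type register banks and skip tag checks, packing values (for example into NaNs) only for longer-term storage. `Op` and `Registers` are made up for illustration, and `wrapping_add` merely stands in for whatever integer overflow policy the language picks.

```rust
// Hypothetical typed opcodes: the operand types are known from the opcode,
// so no runtime tag check is needed.
pub enum Op {
    IConst { dst: usize, val: i64 },
    FConst { dst: usize, val: f64 },
    SConst { dst: usize, val: String },
    IAdd { dst: usize, a: usize, b: usize },
    FAdd { dst: usize, a: usize, b: usize },
    Concat { dst: usize, a: usize, b: usize },
}

/// Separate register banks hold each type in its native representation.
pub struct Registers {
    pub ints: Vec<i64>,
    pub floats: Vec<f64>,
    pub strs: Vec<String>,
}

impl Registers {
    pub fn new(n: usize) -> Registers {
        Registers { ints: vec![0; n], floats: vec![0.0; n], strs: vec![String::new(); n] }
    }
}

pub fn run(program: &[Op], regs: &mut Registers) {
    for op in program {
        match op {
            Op::IConst { dst, val } => regs.ints[*dst] = *val,
            Op::FConst { dst, val } => regs.floats[*dst] = *val,
            Op::SConst { dst, val } => regs.strs[*dst] = val.clone(),
            // Placeholder overflow policy: wrap around.
            Op::IAdd { dst, a, b } => regs.ints[*dst] = regs.ints[*a].wrapping_add(regs.ints[*b]),
            Op::FAdd { dst, a, b } => regs.floats[*dst] = regs.floats[*a] + regs.floats[*b],
            Op::Concat { dst, a, b } => {
                let s = format!("{}{}", regs.strs[*a], regs.strs[*b]);
                regs.strs[*dst] = s;
            }
        }
    }
}
```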

madmalik commented 5 years ago

> Array indexing requires conversion from FP to integer each time

That wouldn't be a big problem if the language has iterator-like constructs and doesn't rely on explicit iteration variables. Explicit integers for indexing could be an implementation detail of the iterator and not exposed to the language.
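A small Rust sketch of the difference, with hypothetical function names: indexing with a script-level `f64` has to validate and convert on every access, while an iterator-style construct keeps the `usize` index internal.

```rust
// Explicit indexing with an f64 index: validate and convert on every access.
pub fn index(arr: &[f64], idx: f64) -> Option<f64> {
    // Reject fractional, negative, or out-of-range indices. NaN and
    // infinities are rejected too, because their fract() is NaN.
    if idx.fract() != 0.0 || idx < 0.0 || idx >= arr.len() as f64 {
        return None;
    }
    arr.get(idx as usize).copied()
}

// Iterator-style traversal: the integer index never reaches the script.
pub fn sum_all(arr: &[f64]) -> f64 {
    arr.iter().sum()
}
```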

> Whilst bit operations can be hacked on top of the f64 representation (adjusting exponents and then doing the bitop on the mantissas), that's slower than the single machine-op for real integers

That's definitely a good reason if you want baked-in bit operations. IMO they are a niche use case in my conception of a scripting language.
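For comparison, a sketch of bit operations on integers held as `f64`. Note that it swaps in a plain conversion to `i64` and back rather than the exponent and mantissa manipulation described in the quote; either way it costs more than a single machine AND. The helpers are hypothetical, and operands are restricted to |x| < 2^53 so both conversions stay exact.

```rust
// Hypothetical helpers: bit operations on integers stored as f64, done by
// converting to i64 and back (not by manipulating exponents and mantissas).
const MAX_EXACT: f64 = 9_007_199_254_740_992.0; // 2^53

fn to_int(x: f64) -> Option<i64> {
    // Accept only integral values with |x| < 2^53; NaN and infinities fail
    // the first test because their fract() is NaN.
    if x.fract() == 0.0 && x.abs() < MAX_EXACT {
        Some(x as i64)
    } else {
        None
    }
}

// AND/XOR of two such values stays within [-2^53, 2^53), so converting the
// result back to f64 is exact.
pub fn bit_and(a: f64, b: f64) -> Option<f64> {
    Some((to_int(a)? & to_int(b)?) as f64)
}

pub fn bit_xor(a: f64, b: f64) -> Option<f64> {
    Some((to_int(a)? ^ to_int(b)?) as f64)
}
```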

> There's the mental cost of using FP for integers [...]

I think that's the best counterargument. But if you want to avoid the FP weirdness, I think it'd be better not to center the data representation around floats.