munificent / craftinginterpreters

Repository for the book "Crafting Interpreters"
http://www.craftinginterpreters.com/
Other
8.43k stars 1.01k forks source link

Nan-boxing and 64bit integer type? #1067

Open HallofFamer opened 2 years ago

HallofFamer commented 2 years ago

I plan to split the number type from CLox into int and float, in which int is 64bit long integer and float is 64 bit double-precision floating point number. The latter will be the same as the original representation of Lox's number, but the former gives me a headache. I've read this article and the author suggests that due to the way Nan-boxing works, it is not possible to store 64 bit long integer native type:

https://leonardschuetz.ch/blog/nan-boxing/

So this means I will have to use int32_t instead of int64_t then? Or is there a way that I can somehow manage to get int64_t work with Nan-boxing?

HallofFamer commented 1 year ago

Seems I found a way to use pointer tagging and 63bit integer, with 1 bit reserved as a flag to tell if it is an integer or not. I think smalltalk, Ruby and OCaml has such 31 or 63 bit integers, so it is something that has done before.

mcfriend99 commented 1 year ago

How??

HallofFamer commented 1 year ago

How??

To implement pointer tagging with 63 bit integers, you will just have unboxed integers, while everything else(null, boolean, floats, etc) will be boxed as pointer. The last bit is reserved as a flag that will tell if it is an integer(1) or a pointer(0). This way, you can have the higher 63 bits as the significant bits for integers. It will make integer arithmetics a little more complicated, as this article from OCaml shows:

https://blog.janestreet.com/what-is-gained-and-lost-with-63-bit-integers/

The downside is, you will have to box some atomic values like null, true, false and floats, while Nan boxing allows you to have unboxed 64 bit floating point numbers for "free"(without performance penalty). If you have a statically typed language though, you only pay the penalty for floating point numbers since you can check if a value is null, true or false statically. For a dynamically typed language, it may be worth considering 62 bits integers which allows you to have unboxed boolean values too.

But anyway, you will lose fast floating point arithmetic, as with pointer tagging it is impossible to have both unboxed integers and unboxed floating point values. So it is a tradeoff you will have to decide, whether integer arithmetic or floating point arithmetic is more common for your usecase.

RevengerWizard commented 1 year ago

I was thinking the same about adding 64 bit ints but I'm still not really sure. Maybe another solution could be to check if the number contains an actual int or treating the int as an object, though it would be slower.

HallofFamer commented 1 year ago

Yeah you wouldnt want to store it as an object pointer, it will be much slower, even slower than the original tagged union solution before NAN boxing optimization. I heard that Pharo smalltalk actually managed to mask both small integers and 32 bit floating point numbers, and by doing this their integers are 30 bit. A weird choice indeed, but I suppose it was largely due to supporting legacy 32 bit system. But perhaps, it is actually feasible to have unboxed integers and floating point numbers, provided that you are willing to go for something like 62 or even 61 bit integers. It is not theoretically possible to have unboxed 64 bit integer though, since you wont be able to tell whether it is an int or object.

Edit: Turns out Pharo Smalltalk's integers are actually 29 bits as they also mask characters, but you get the idea.