munificent / craftinginterpreters

Repository for the book "Crafting Interpreters"
http://www.craftinginterpreters.com/

Clarify a misconception on NaN-boxing of pointers #1035

Open markhol opened 2 years ago

markhol commented 2 years ago

Statement in question:

"But in practice, at least on the architectures I’ve tested, everything above the 48th bit in a pointer is always zero."

A minor issue, but this misconception, and the way pointers are currently boxed, may cause the VM to produce or accept invalid pointers on some platforms.

Due to the success and popularity of the book, I think it would help to provide a bit more clarity so that this misconception does not become widespread.

The problem is caused by canonical addressing, which requires the 16 most significant bits of a pointer to be a copy of bit 47. That is, pointers must be < 0x0000800000000000 or >= 0xffff800000000000. On amd64 in particular this is a hardware requirement, and attempting to dereference a non-canonical pointer causes a runtime exception. The pointer 0x0000800000000000 is non-canonical, yet it is accepted as valid by the implementation currently described in the book.
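To make the rule concrete, here is a minimal sketch of a canonicality check (not from the book; the helper name is made up), assuming 48-bit virtual addresses:

#include <stdbool.h>
#include <stdint.h>

// A 64-bit pointer is canonical when bits 48-63 are a copy of bit 47.
static bool is_canonical(uint64_t p) {
    uint64_t top = p >> 47;              // bits 47-63, 17 bits in total
    return top == 0 || top == 0x1FFFF;   // all zero or all one
}

// is_canonical(0x00007fffffffffff) -> true  (top of the lower half)
// is_canonical(0xffff800000000000) -> true  (bottom of the higher half)
// is_canonical(0x0000800000000000) -> false (non-canonical)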

The reason pointers were always <= 0x00007fffffffffff in testing is due to the design of the operating system, not the hardware. Mainstream kernels reserve the higher half of the address space for their own memory and allow the lower half to be used freely by user-mode applications.

This is illustrated and described in more detail on the x86-64 page on Wikipedia.

[Image: diagram of the x86-64 virtual address space, showing the canonical lower and higher halves]

A trivial fix for the book, which should not change the semantics of what is implemented, would be to also mask off bit 47 of the pointer, in addition to the sign and qNaN bits, when the platform is AMD64. No other changes would be necessary. This strictly limits the usable virtual address space to the lower half, but that isn't an issue on a mainstream OS as long as the VM never handles pointers into kernel memory.

#define HIGHER_HALF_BIT ((uint64_t)0x0000800000000000)

// Clear the tag bits and bit 47, forcing the result into the lower half.
#define AS_OBJ(value) \
    ((Obj*)(uintptr_t)((value) & ~(SIGN_BIT | QNAN | HIGHER_HALF_BIT)))
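For example (a worked check using the book's tag values, SIGN_BIT = 0x8000000000000000 and QNAN = 0x7ffc000000000000), the non-canonical pointer called out above would now unbox to NULL instead of slipping through:

// value = SIGN_BIT | QNAN | 0x0000800000000000 = 0xFFFC800000000000
// value & ~(SIGN_BIT | QNAN | HIGHER_HALF_BIT)
//       = 0xFFFC800000000000 & 0x00037FFFFFFFFFFF
//       = 0x0000000000000000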

The more comprehensive fix, which would enable full address space utilization, is to shift left by 16 bits, then arithmetic-shift right by 16 bits. If bit 47 is zero, this leaves bits 48-63 as zero; if bit 47 is one, it leaves bits 48-63 as one.

(0x00007FFFFFFFFFFFL << 16) >> 16; //=0x00007FFFFFFFFFFFL
(0x0000800000000000L << 16) >> 16; //=0xFFFF800000000000L

A C compiler should emit a shift-arithmetic-right instruction for signed integers and a shift-logical-right for unsigned integers, so we need to cast the Value to a signed integer before the right shift. (Whether the cast happens before or after the left shift does not matter, because SAL and SHL are the same operation.)

#define AS_OBJ(value) \
    ((Obj*)(uintptr_t)(uint64_t)((int64_t)((value) << 16) >> 16))

Fortunately, this should cost about the same as the masking strategy. Masking requires two instructions, because the and instruction cannot take an imm64 operand, but mov can.

mov rN, 0x00007fffffffffff
and rax, rN

The shifting strategy likewise requires two instructions, and one fewer register.

shl rax, 16
sar rax, 16

Each of these instructions has the same latency and throughput.

It would also be necessary, though, to mask off the most significant 16 bits before boxing the pointer:

#define OBJ_VAL(obj) \
    (Value)(SIGN_BIT | QNAN | ((uint64_t)(uintptr_t)(obj) & 0x0000FFFFFFFFFFFF))

You could implement this masking in a way that requires no additional instructions.
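Putting the two macros together, here is a self-contained sketch (a hypothetical test harness, not the book's code, with SIGN_BIT and QNAN as defined in the book's NaN-boxing chapter) showing that bit patterns from both halves of the address space round-trip:

#include <assert.h>
#include <stdint.h>

typedef uint64_t Value;
typedef struct Obj Obj;

#define SIGN_BIT ((uint64_t)0x8000000000000000)
#define QNAN     ((uint64_t)0x7ffc000000000000)

// Box: clear bits 48-63 so they never collide with the tag bits.
#define OBJ_VAL(obj) \
    (Value)(SIGN_BIT | QNAN | ((uint64_t)(uintptr_t)(obj) & 0x0000FFFFFFFFFFFF))

// Unbox: shift the tag out, then sign-extend from bit 47 to restore a
// canonical pointer. Relies on arithmetic right shift of signed integers,
// as discussed above.
#define AS_OBJ(value) \
    ((Obj*)(uintptr_t)(uint64_t)((int64_t)((value) << 16) >> 16))

int main(void) {
    // These are illustrative bit patterns, never dereferenced.
    // A lower-half (user-mode) pointer survives the round trip...
    Obj* user = (Obj*)(uintptr_t)0x00007f1234567890;
    assert(AS_OBJ(OBJ_VAL(user)) == user);

    // ...and a higher-half (kernel-style) pointer is sign-extended back
    // to its canonical form instead of being truncated.
    Obj* kernel = (Obj*)(uintptr_t)0xffff812345678900;
    assert(AS_OBJ(OBJ_VAL(kernel)) == kernel);

    return 0;
}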

Kind Regards, Mark

invertego commented 2 years ago

In what way is the "trivial" fix an improvement? It will silently convert a kernel mode address to a canonical user mode address, which doesn't seem preferable to a guaranteed exception.