A minor issue, but this misconception and the way pointers are currently implemented MAY:
Cause a VM-crashing bug
Limit available virtual address space to only half of what the hardware supports
Not work as intended if deployed on more exotic kernels or on bare metal
Due to the success and popularity of the book, I think it would help to provide a bit more clarity so that this misconception does not become widespread.
The problem is caused by canonical addressing, under which the 16 most significant bits of a pointer must be copies of bit 47. That is, pointers must be < 0x0000800000000000 or >= 0xffff800000000000. On AMD64 in particular this is a hardware requirement, and attempting to dereference a non-canonical pointer raises a processor exception at runtime. The pointer 0x0000800000000000 is non-canonical, but it is accepted as valid by the current implementation described in the book.
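For concreteness, the canonicality check falls straight out of those two bounds. A minimal sketch in C (the helper name is hypothetical):

#include <stdbool.h>
#include <stdint.h>

/* True when bits 48-63 are copies of bit 47, i.e. the address is
   canonical under 48-bit (4-level paging) x86-64. */
static bool isCanonical(uint64_t address) {
    return address < 0x0000800000000000ULL ||
           address >= 0xffff800000000000ULL;
}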
The reason pointers always came out <= 0x00007fffffffffff in testing is down to the design of the operating system, not the hardware. Mainstream kernels reserve the higher half of the address space for their own memory and allow the lower half to be used freely by user-mode applications.
This is illustrated and described in more detail on the x86-64 page on Wikipedia.
A trivial fix for the book, which should not change the semantics of what is implemented, would be to also mask bit 47 of the pointer, in addition to the sign and qNaN bits, if the platform is AMD64. No other changes would be necessary. It strictly limits the virtual address space to half, but this isn't an issue on a mainstream OS if the VM never touches kernel memory.
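As a sketch of that trivial fix, assuming the book's SIGN_BIT and QNAN masks (the helper name here is hypothetical), the extra bit simply folds into the unboxing mask:

#include <stdint.h>

typedef uint64_t Value;

#define SIGN_BIT ((uint64_t)0x8000000000000000)
#define QNAN     ((uint64_t)0x7ffc000000000000)
#define BIT_47   ((uint64_t)1 << 47)

/* Clearing bit 47 along with the tag bits guarantees the result
   is a canonical lower-half pointer on AMD64. */
static inline void* unboxPointerLowerHalf(Value value) {
    return (void*)(uintptr_t)(value & ~(SIGN_BIT | QNAN | BIT_47));
}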
The more comprehensive fix, which would enable full address space utilization, would be to perform a shift-left by 16 bits, followed by an arithmetic shift right by 16 bits.
If bit 47 is zero, this leaves bits 48-63 as zero, and if bit 47 is one, this leaves bits 48-63 as one.
A C compiler should emit a shift-arithmetic-right instruction for signed integers and a shift-logical-right for unsigned ones, so the Value needs to be cast to a signed integer first. (Whether this happens before or after the left shift does not matter, because SAL and SHL are the same operation.)
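In C, the whole unboxing step then reduces to the two shifts. A minimal sketch under the same assumptions as above (note that right-shifting a negative signed value is implementation-defined in standard C, though mainstream compilers implement it as an arithmetic shift):

/* Shift the sign and qNaN tag bits out the top, then sign-extend
   from bit 47: the arithmetic right shift refills bits 48-63 with
   copies of bit 47, yielding a canonical pointer for either half. */
static inline void* unboxPointer(Value value) {
    return (void*)(intptr_t)((int64_t)(value << 16) >> 16);
}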
Fortunately, this should cost about the same as the masking strategy. Masking should require two instructions (because AND cannot take an imm64 operand, but MOV can):
mov rN, 0x00007fffffffffff  ; load the pointer mask (bits 0-46) into a scratch register
and rax, rN                 ; clear the sign, qNaN, and bit-47 tag bits
The shifting strategy also requires two instructions, and one fewer register:
shl rax, 16  ; shift bit 47 up into the sign bit, discarding the tag bits
sar rax, 16  ; arithmetic shift copies it back down through bits 48-63
Each of these instructions has the same latency and throughput.
It would also be necessary to mask off the most significant 16 bits of the pointer before boxing it, though.
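A sketch of that boxing side, under the same assumptions as the earlier snippets (PTR_MASK is a hypothetical name for the low-48-bit mask):

#define PTR_MASK ((uint64_t)0x0000ffffffffffff)

/* Drop the sign-extended top 16 bits so they cannot collide with
   the sign and qNaN tag bits; unboxing restores them via the
   arithmetic shift. */
static inline Value boxPointer(void* pointer) {
    return SIGN_BIT | QNAN | ((uint64_t)(uintptr_t)pointer & PTR_MASK);
}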
In what way is the "trivial" fix an improvement? It will silently convert a kernel mode address to a canonical user mode address, which doesn't seem preferable to a guaranteed exception.
Statement in question:
"A trivial fix for the book, which should not change the semantics of what is implemented, would be to also mask bit 47 of the pointer, in addition to the sign and qNaN bits, if the platform is AMD64."
You could implement this in a way which would require no additional instructions.
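Concretely, the bit-47 mask folds into the existing unboxing constant, so the compiled code is the same MOV-and-AND pair as before. A sketch, reusing the hypothetical names above:

/* ~(SIGN_BIT | QNAN | BIT_47) is still a single imm64 constant
   (0x00037fffffffffff), so no extra instructions are emitted. */
#define OBJ_PTR_MASK (~(SIGN_BIT | QNAN | BIT_47))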
Kind Regards, Mark