zilch-lang / nstar

The compiler for N⋆, a statically typed assembly language used as a compiler backend for Zilch
BSD 3-Clause "New" or "Revised" License
30 stars 2 forks source link

Enlarge the instruction set #62

Open Mesabloo opened 2 years ago

Mesabloo commented 2 years ago

Currently, there are too few instructions included in N*.

Here's an exhaustive list:

However, there are quite a lot more useful instructions to have in order to make N⋆ usable as both a programming assembly language and a compiler backend for Zilch. Here are some examples:

Of course this list is absolutely not exhaustive (although it should be sufficient in the beginning). Many more instructions could be added (even for specific things such as SIMD), as long as they can be typed properly inside N⋆. All the above instructions will need to have types, but this can be done while adding them to the language/compiler.

Roadmap

Mesabloo commented 2 years ago

Note on boolean instructions: there are no direct equivalent in modern machine code (x86 does not have such concept of "boolean"). notb %r0 will therefore be compiled to code similar to (on x86)

CMP $0, %rax     # Compare %r0 with 0, set ZF=1 if equal
SETE %al         # set %r0[56:8] to ZF
MOVZX %al, %rax  # Copy %ro[56:8] to %r0, filling %ro[0:56] with zeros

The last two operations could be replaced with

CMOVZ $1, %rax  # set %r0 to 1 if ZF=1
CMOVNZ $0, %rax  # set %r0 to 0 if ZF=0

although performance could be different (because CMOVcc needs to look at the flags every time, whereas only SETE looks at them above).

As there is no notion of "compare flag (CF)" and "zero flag (ZF), the cmp operation cannot exist in N⋆. This explains the need for boolean instructions.


On ARM, we may have these opcodes

cmp r0, #0
ite eq        /* if-then-else  */
moveq r0, #1  /* r0 = 1 iff r0 = 0 */
movne r0, #0  /* r0 = 0 iff r0 = 1 */
uxtb r0       /* 0-extend %r0 */

On ARM64 (AArch64) we have

cmp w0, #0
cset w0, eq
and w0, w0, K  /* where K depends on the size of w0, as long as it is 1-filled (e.g. 0xFFFF) */

The idea is that the and is used to 0-fill the register.


On MIPS, we may have

sltu $0, $0, 1
andi $0, $0, K  # same as above for ARM64 
Mesabloo commented 2 years ago

Conditional moves are also a very good part of the instruction set. They avoid most of branches. See the example above for notb, which uses conditional moves. If they were not available, one would have to define notb as the following:

  cmp %rax, $0
  jz set1
set0:
  mv $0, %rax
  j end_not
set1:
  mv $1, %rax
end_not:
  # ...

This is very heavy and makes quite a lot of jumps… And it's also not quite readable compared to the earlier version.

In N though, this control flow is a lot* heavier:

  # ...
  jz %r0, set1, set0

set0: ...
  mv %r0, 1;
  jmp end_not<...>
set1: ...
  mv %r0, 0;
  jmp end_not<...>

end_not: ...
  # ...  

If we take a look at the omitted type, one will see that the types for both set0 and set1 are exactly the same (they require the continuation in the same space, the same registers, and will return the same registers. end_not on the other end is the same type as for set0 with the added constraint that it must take %r0: uN (which we may not propagate in both set0 and set1). So there is quite a lot of redundancy for that few instructions, which conditional moves actually solve (by integrating them within the type inference rules).


As we have every kind of branching instructions (see above), we will also need all the conditional moves for the same “branchings”.