[Compiler] Add more vector instructions

The PS2 has 128-bit "vector float" registers and instructions for doing math on 4x 32-bit floats. The original GOAL compiler could use the "vector float" features only with inline assembly.

Modern Intel CPUs have "AVX", a set of instructions/registers that are somewhat similar to the PS2's vector floating point instructions. Both can operate on 128-bit registers holding 4 floating point values. The plan is that GOAL vector float code will become AVX in OpenGOAL.

I've already started this. This function adds two vectors using inline assembly for AVX instructions: https://github.com/water111/jak-project/blob/master/goal_src/engine/math/vector-h.gc#L463 It compiles to:

[vector+!]
- [0x10000] vmovaps xmm1, [r15+rsi*1]              mov ivf-6, [igpr-1 + 0]
  [0x10006] vmovaps xmm2, [r15+rdx*1]              mov ivf-7, [igpr-2 + 0]
  [0x1000c] vxorps xmm0, xmm0, xmm0                .xor.vf ivf-4, ivf-4, ivf-4
  [0x10010] vaddps xmm1, xmm1, xmm2                .add.vf ivf-5, ivf-6, ivf-7
  [0x10014] vblendps xmm1, xmm1, xmm0, 0x08        .blend.vf ivf-5, ivf-5, ivf-4, 8
  [0x1001a] vmovaps [r15+rdi*1], xmm1              move [igpr-0 + 0], ivf-5
  [0x10020] mov rax, rdi                           ret igpr-3 igpr-0
  [0x10023] ret

and the MIPS version is:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; .function vector+!
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    vmove.w vf6, vf0
    lqc2 vf4, 0(a1)
    lqc2 vf5, 0(a2)
    vadd.xyz vf6, vf4, vf5
    sqc2 vf6, 0(a0)
    or v0, a0, r0 
    jr ra
    daddu sp, sp, r0
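
To make the x86 listing above easier to follow, here is a scalar C++ model of what it computes. This is only a sketch: the names `Vec4`, `blend`, and `vector_add_model` are made up for illustration and are not part of the compiler, and it models only the x86 listing (it does not try to model the MIPS `vmove.w vf6, vf0`).

```cpp
#include <array>

// Four packed 32-bit floats in x, y, z, w order (element 0 is x).
// Hypothetical type, for illustration only.
using Vec4 = std::array<float, 4>;

// Scalar model of "vblendps dst, a, b, imm8":
// if bit i of the mask is set, element i comes from b, otherwise from a.
Vec4 blend(const Vec4& a, const Vec4& b, unsigned mask) {
  Vec4 out{};
  for (int i = 0; i < 4; i++) {
    out[i] = ((mask >> i) & 1u) ? b[i] : a[i];
  }
  return out;
}

// Scalar model of the x86 listing for vector+!.
Vec4 vector_add_model(const Vec4& a, const Vec4& b) {
  const Vec4 zero{0.f, 0.f, 0.f, 0.f};  // vxorps xmm0, xmm0, xmm0
  Vec4 sum{};
  for (int i = 0; i < 4; i++) {
    sum[i] = a[i] + b[i];               // vaddps xmm1, xmm1, xmm2
  }
  return blend(sum, zero, 0x08);        // vblendps ..., 0x08: w comes from the zeroed register
}
```

With the mask 0x08, only bit 3 is set, so x, y, and z keep the sum and w is taken from the zeroed register.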

Unfortunately, we only support a few AVX instructions at the moment, and we'll probably need more.

Find out which instructions we need

Run the decompiler and look at files like vector-h, vector, and matrix. See which vector instructions are used, then consult the EE manuals to figure out what they do and how we might implement them with AVX instructions.
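
One way to get a quick overview is to count the vector mnemonics in the decompiler output. The sketch below assumes the output looks like the MIPS listing above (one instruction per line, mnemonic first, `;` for comments). It treats any mnemonic starting with `v`, plus `lqc2` and `sqc2`, as a vector instruction. That rule is a heuristic, and `count_vector_ops` is a hypothetical helper, not something in the repo.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Count vector-float mnemonics in a disassembly listing.
// Assumed format: "<mnemonic>[.mask] operands", with ';' starting a comment line.
std::map<std::string, int> count_vector_ops(std::istream& in) {
  std::map<std::string, int> counts;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream tokens(line);
    std::string op;
    if (!(tokens >> op)) continue;  // blank line
    if (op[0] == ';') continue;     // comment line
    // Drop the component mask, so "vadd.xyz" is counted as "vadd".
    op = op.substr(0, op.find('.'));
    const bool is_vector = op[0] == 'v' || op == "lqc2" || op == "sqc2";
    if (is_vector) {
      counts[op]++;
    }
  }
  return counts;
}
```

Sorting the resulting map by count gives a rough priority order for which instructions to support first.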

Instruction Generation

The first step is to add support for the instructions. The Intel manual here https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf explains how instructions are encoded.

The OpenGOAL compiler stores instructions in an Instruction, which has useful methods for common encodings, and stores all the metadata the compiler needs. https://github.com/water111/jak-project/blob/master/goalc/emitter/Instruction.h

It may be useful to look at an existing instruction such as vaddps. This function takes a few Registers and builds an Instruction for vaddps reg, reg, reg: https://github.com/water111/jak-project/blob/master/goalc/emitter/IGen.h#L2208 Generally, for AVX, we'll be using the VEX-encoded version with 128-bit xmm registers. We use the three-register operand form because it's the closest to the PS2's instruction set.

Instruction Test

To test that the instructions are correct, add a test like this: https://github.com/water111/jak-project/blob/master/test/test_emitter_avx.cpp#L172

Usually I like to try each register as both <8 and >=8 (xmm0-xmm7 and xmm8-xmm15) because the encoding is sometimes different in these cases. (x86 instruction encoding is complicated...)
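
The reason 8 is the boundary: the ModRM fields only have 3 bits per register, so for registers 8 and up the fourth bit has to be carried by the prefix. A tiny sketch of that split (`RegBits` and `split_reg` are hypothetical names for illustration, not part of the emitter):

```cpp
#include <cstdint>

// How a register number 0-15 is split when it is encoded in a ModRM field.
struct RegBits {
  uint8_t low3;  // low three bits, stored in the ModRM reg or r/m field
  bool ext;      // fourth bit, carried by the prefix (VEX.R or VEX.B)
};

RegBits split_reg(uint8_t xmm_id) {
  return RegBits{static_cast<uint8_t>(xmm_id & 7), (xmm_id & 8) != 0};
}
```

So xmm7 and xmm15 have the same low bits and differ only in the prefix, which is why a test that only uses low registers can miss a wrong prefix bit.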

Add support in the compiler IR

The compilation process is something like this:

Read and parse s-expressions, stored in goos::Object (goos is the name of the macro system, which is an interpreted Scheme-like language. All GOAL code is first stored as goos objects so it can be manipulated by the goos macro system.)
Iterate through s-expressions depth first, doing macro expansions, type stuff, and generating an "Intermediate Representation" (IR) for each function. This is just a std::vector<IR> in which each IR corresponds to some simple operation like "add" or "load from memory"
Do register allocation and final code generation, converting the std::vector<IR> from IR to Instructions, then finally to executable code.

For the new instructions we'll need IR. For example, look at https://github.com/water111/jak-project/blob/master/goalc/compiler/IR.h#L497, an IR which represents a 3-register AVX instruction. This can likely be expanded for other instructions that are similar. (vmulps for example would fit well here). You can see the implementation of this IR here: https://github.com/water111/jak-project/blob/master/goalc/compiler/IR.cpp#L1277
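
As a rough idea of what "expanding" a 3-register IR could look like, here is a toy version that is parameterized by the kind of operation. This is not the class in IR.h: `VfOp3`, `OpInfo`, `info_for`, and `ToyVf3` are made-up names, and the printed form just imitates the `.add.vf ivf-5, ivf-6, ivf-7` style from the listing above. The opcode bytes are the ones the Intel manual lists for the packed-single forms.

```cpp
#include <cstdint>
#include <string>

// Hypothetical operation kinds for one shared 3-register IR.
enum class VfOp3 { ADD, SUB, MUL, XOR };

struct OpInfo {
  const char* name;  // used when printing the IR
  uint8_t opcode;    // opcode byte following the implied 0F
};

OpInfo info_for(VfOp3 op) {
  switch (op) {
    case VfOp3::ADD: return {"add", 0x58};  // VADDPS
    case VfOp3::SUB: return {"sub", 0x5C};  // VSUBPS
    case VfOp3::MUL: return {"mul", 0x59};  // VMULPS
    case VfOp3::XOR: return {"xor", 0x57};  // VXORPS
  }
  return {"?", 0};  // not reached for valid enum values
}

// Toy IR node: dst = src1 <op> src2, all operands are vector-float registers.
struct ToyVf3 {
  VfOp3 op;
  int dst, src1, src2;  // virtual register ids, as in "ivf-5"

  std::string print() const {
    return std::string(".") + info_for(op).name + ".vf ivf-" + std::to_string(dst) +
           ", ivf-" + std::to_string(src1) + ", ivf-" + std::to_string(src2);
  }
};
```

The point of the design is that adding another same-shaped instruction becomes one more enum value and one more table row, rather than a new IR class.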

Add support in the compiler front end

Write a function like this for the new operation: https://github.com/water111/jak-project/blob/master/goalc/compiler/compilation/Asm.cpp#L322

Then hook it up to a keyword here: https://github.com/water111/jak-project/blob/master/goalc/compiler/compilation/Atoms.cpp#L30

Add a test of the whole thing!

If you can manually disassemble a GOAL function that appears in the game, do that. Otherwise, write your own function to test it. Write a test in GOAL like this: https://github.com/water111/jak-project/blob/master/test/goalc/source_templates/with_game/test-basic-vector-math.gc and add a test case like this: https://github.com/water111/jak-project/blob/master/test/goalc/test_with_game.cpp#L352

Encoding Instructions

Instruction reference: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf See section 3.1 "Interpreting the instruction reference pages"

We typically want the VEX encoding. These have three operands which is closer to the PS2. We want the 128-bit version, not the 256-bit version.

An example of what you might find in the manual:

VEX.128.0F.WIG 59 /r VMULPS xmm1, xmm2, xmm3/m128 Op/En = B

VEX means that the instruction should have a VEX prefix, which the Instruction class can generate for you. The rest of the entry reads as follows:

128 - the 128-bit length should be selected. (default value is 0 = 128-bit, you don't have to worry about it)
0F - means you should set the "implied 0f leading opcode byte". This is VEX3::LeadingBytes::P_0F.
WIG - the W field is ignored.
59 - this is the opcode.
/r - it uses the MODRM byte to indicate a register and a register/memory (r/m) operand. The register/memory operand should be used as a register because this is like the MIPS forms where all arguments are registers.
VMULPS - the instruction mnemonic.
xmm1, xmm2, xmm3/m128 - the three operands. Two are registers, and the third is either a register or 128 bits of memory.

Then you read the operand encoding for B and see:

op1: ModRM:reg - xmm1 goes in the reg field of ModRM
op2: VEX.vvvv - xmm2 goes in the vvvv field of VEX
op3: ModRM:r/m - xmm3 goes in the r/m field of ModRM

There is a function that does this for you, and it automatically uses the shortest VEX encoding:

  void set_vex_modrm_and_rex(uint8_t reg,
                             uint8_t rm,
                             VEX3::LeadingBytes lb,
                             uint8_t vex_reg = 0,
                             bool rex_w = false,
                             VexPrefix prefix = VexPrefix::P_NONE)

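reg is ModRM:reg, the first argument (op1, xmm1 in the example).
rm is ModRM:r/m, the second argument (op3, xmm3 in the example).
lb is the leading bytes (P_0F).
vex_reg is the register that goes in VEX.vvvv (op2, the second register, xmm2 in the example).
The others can be left at their defaults because the manual entry did not ask for them.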
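
To see what ends up in the bytes, here is a standalone sketch that encodes a register-only "VEX.128.0F.WIG opcode /r" instruction by hand. It does not use the Instruction class, and `encode_vex_0f_rrr` is a made-up name. It only handles this one form (three xmm registers, no memory operand, no mandatory prefix), but it shows the inverted R/B/vvvv bits and when the 2-byte VEX prefix can be used.

```cpp
#include <cstdint>
#include <vector>

// Encode "VEX.128.0F.WIG <opcode> /r" with three xmm registers (0-15).
// dst -> ModRM:reg, src1 -> VEX.vvvv, src2 -> ModRM:r/m (operand encoding B).
std::vector<uint8_t> encode_vex_0f_rrr(uint8_t opcode, uint8_t dst, uint8_t src1, uint8_t src2) {
  const unsigned r_inv = (dst < 8) ? 1u : 0u;   // VEX.R, stored inverted
  const unsigned b_inv = (src2 < 8) ? 1u : 0u;  // VEX.B, stored inverted
  const unsigned vvvv = (~src1) & 0xFu;         // VEX.vvvv, stored inverted
  // mod = 11 selects the register-direct form of the r/m operand.
  const unsigned modrm = 0xC0u | ((dst & 7u) << 3) | (src2 & 7u);

  std::vector<uint8_t> out;
  if (b_inv) {
    // 2-byte VEX: C5 [R vvvv L pp]. Only possible when B is not needed;
    // it implies the 0F map and W = 0. L = 0 (128-bit), pp = 00 (no prefix).
    out.push_back(0xC5);
    out.push_back(static_cast<uint8_t>((r_inv << 7) | (vvvv << 3)));
  } else {
    // 3-byte VEX: C4 [R X B mmmmm] [W vvvv L pp], with mmmmm = 00001 for 0F.
    // X is unused for a register operand, so its inverted bit is 1.
    out.push_back(0xC4);
    out.push_back(static_cast<uint8_t>((r_inv << 7) | (1u << 6) | (b_inv << 5) | 0x01u));
    out.push_back(static_cast<uint8_t>(vvvv << 3));
  }
  out.push_back(opcode);
  out.push_back(static_cast<uint8_t>(modrm));
  return out;
}
```

For example, `encode_vex_0f_rrr(0x59, 0, 1, 2)` (vmulps xmm0, xmm1, xmm2) gives `C5 F0 59 C2`, while using xmm10 as the last operand forces the 3-byte prefix: `C4 C1 70 59 C2`. This is the kind of difference the <8 / >=8 register tests are meant to catch.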
