open-goal / jak-project

Reviving the language that brought us the Jak & Daxter Series
https://opengoal.dev
ISC License

[Compiler] Add more vector instructions #201

Closed water111 closed 3 years ago

water111 commented 3 years ago

The PS2 has 128-bit "vector float" registers and instructions for doing math on 4x 32-bit floats. The original GOAL compiler could use the "vector float" features only with inline assembly.

Modern Intel CPUs have "AVX", a set of instructions/registers that are somewhat similar to the PS2's vector floating point instructions. Both can operate on 128-bit registers holding 4 floating point values. The plan is that GOAL vector float code will become AVX in OpenGOAL.

I've already started this, as you can see here: https://github.com/water111/jak-project/blob/master/goal_src/engine/math/vector-h.gc#L463 This function adds two vectors and uses inline assembly for AVX instructions. It compiles to:

[vector+!]
- [0x10000] vmovaps xmm1, [r15+rsi*1]              mov ivf-6, [igpr-1 + 0]
  [0x10006] vmovaps xmm2, [r15+rdx*1]              mov ivf-7, [igpr-2 + 0]
  [0x1000c] vxorps xmm0, xmm0, xmm0                .xor.vf ivf-4, ivf-4, ivf-4
  [0x10010] vaddps xmm1, xmm1, xmm2                .add.vf ivf-5, ivf-6, ivf-7
  [0x10014] vblendps xmm1, xmm1, xmm0, 0x08        .blend.vf ivf-5, ivf-5, ivf-4, 8
  [0x1001a] vmovaps [r15+rdi*1], xmm1              move [igpr-0 + 0], ivf-5
  [0x10020] mov rax, rdi                           ret igpr-3 igpr-0
  [0x10023] ret

and the MIPS version is:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; .function vector+!
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    vmove.w vf6, vf0
    lqc2 vf4, 0(a1)
    lqc2 vf5, 0(a2)
    vadd.xyz vf6, vf4, vf5
    sqc2 vf6, 0(a0)
    or v0, a0, r0 
    jr ra
    daddu sp, sp, r0
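To make the x86 listing above easier to follow, here is a minimal scalar sketch in C++ of what its add-then-blend sequence computes. The names `Vec4`, `add_ps`, `blend_ps`, and `vector_add_xyz` are made up for illustration and are not part of the OpenGOAL code base. The sketch follows the x86 listing only: the x, y, and z lanes are summed and the w lane is taken from the zeroed register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one 128-bit register: lanes are x, y, z, w.
using Vec4 = std::array<float, 4>;

// Models vaddps: lane-wise addition of all four floats.
Vec4 add_ps(const Vec4& a, const Vec4& b) {
  Vec4 out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = a[i] + b[i];
  }
  return out;
}

// Models vblendps: if bit i of mask is set, lane i comes from b, otherwise from a.
Vec4 blend_ps(const Vec4& a, const Vec4& b, std::uint8_t mask) {
  Vec4 out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = ((mask >> i) & 1) ? b[i] : a[i];
  }
  return out;
}

// Mirrors the x86 listing: add all lanes, then replace lane 3 (w) with zero
// using blend mask 0x08.
Vec4 vector_add_xyz(const Vec4& a, const Vec4& b) {
  const Vec4 zero = {0.0f, 0.0f, 0.0f, 0.0f};
  return blend_ps(add_ps(a, b), zero, 0x08);
}
```

Calling `vector_add_xyz` on (1, 2, 3, 4) and (10, 20, 30, 40) gives 11, 22, 33 in the first three lanes, with the w lane coming from the zero vector.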

Unfortunately, we only support a few AVX instructions at the moment, and we'll probably need more.

Find out which instructions we need

Run the decompiler and look at files such as vector-h, vector, and matrix. See which vector instructions are used, then consult the EE manuals to figure out what they do and how we might implement them with AVX instructions.
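One possible way to do this survey is to tally mnemonics in the disassembly text. The C++ sketch below is only a heuristic and rests on assumptions: the input is assumed to look like the MIPS listing earlier in this issue (one instruction per line, `;` for comments), and "vector instruction" is assumed to mean any mnemonic starting with `v`, plus `lqc2` and `sqc2`. The function name `count_vector_ops` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Counts vector-related mnemonics in a block of disassembly text.
// The destination mask is stripped, so "vadd.xyz" is counted as "vadd".
std::map<std::string, int> count_vector_ops(const std::string& disassembly) {
  std::map<std::string, int> counts;
  std::istringstream lines(disassembly);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream words(line);
    std::string mnemonic;
    // Skip blank lines and comment lines (first token begins with ';').
    if (!(words >> mnemonic) || mnemonic[0] == ';') {
      continue;
    }
    // Drop everything from the first '.' onward (the lane mask).
    mnemonic = mnemonic.substr(0, mnemonic.find('.'));
    // Assumed heuristic for what counts as a vector instruction.
    if (!mnemonic.empty() &&
        (mnemonic[0] == 'v' || mnemonic == "lqc2" || mnemonic == "sqc2")) {
      ++counts[mnemonic];
    }
  }
  return counts;
}
```

Run over the `vector+!` listing above, this would report `vmove`, `vadd`, `lqc2`, and `sqc2`, which gives a rough list of what to look up in the EE manuals.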

Instruction Generation

The first step is to add support for the instructions. The Intel manual here https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf explains how instructions are encoded.

The OpenGOAL compiler stores instructions in an Instruction, which has useful methods for common encodings, and stores all the metadata the compiler needs. https://github.com/water111/jak-project/blob/master/goalc/emitter/Instruction.h

It may be useful to look at an example of an existing instruction, like this, which is vaddps. This function takes a few Registers, and builds an Instruction for vaddps reg, reg, reg. https://github.com/water111/jak-project/blob/master/goalc/emitter/IGen.h#L2208 Generally, for AVX, we'll be using the VEX encoded version with 128-bit xmm registers. We use the three register operand form because it's the closest to the PS2's instruction set.

Instruction Test

To test that the instructions are correct, add a test like this: https://github.com/water111/jak-project/blob/master/test/test_emitter_avx.cpp#L172

Usually I like to try each register as both <8 and >=8 because the encoding is sometimes different in these cases. (x86 instruction encoding is complicated...)

Add support in the compiler IR

The compilation process is something like this:

For the new instructions we'll need IR. For example, look at https://github.com/water111/jak-project/blob/master/goalc/compiler/IR.h#L497, an IR which represents a 3-register AVX instruction. This can likely be expanded for other instructions that are similar (vmulps, for example, would fit well here). You can see the implementation of this IR here: https://github.com/water111/jak-project/blob/master/goalc/compiler/IR.cpp#L1277

Add support in the compiler front end

Write a function like this for the new operation: https://github.com/water111/jak-project/blob/master/goalc/compiler/compilation/Asm.cpp#L322

Then hook it up to a keyword here: https://github.com/water111/jak-project/blob/master/goalc/compiler/compilation/Atoms.cpp#L30

Add a test of the whole thing!

If you can manually disassemble a GOAL function that appears in the game, do that. Otherwise, write your own function to test it. Write a test in GOAL like this: https://github.com/water111/jak-project/blob/master/test/goalc/source_templates/with_game/test-basic-vector-math.gc and add a test case like this: https://github.com/water111/jak-project/blob/master/test/goalc/test_with_game.cpp#L352

xTVaser commented 3 years ago

Looking into this.

For the "Add support in the compiler IR" stage, it sounds like that might be something that should wait until IR2 has officially replaced the current IR? In any case, it's not a blocker; it sounds like a later step.

water111 commented 3 years ago

Encoding Instructions

Instructions: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
See section 3.1 "Interpreting the instruction reference pages".

We typically want the VEX encoding. These have three operands which is closer to the PS2. We want the 128-bit version, not the 256-bit version.

An example of what you might find in the manual:

VEX.128.0F.WIG 59 /r    VMULPS xmm1, xmm2, xmm3/m128    Op/En = B

The VEX means that the instruction should have a VEX prefix. The Instruction class can take care of this for you:

Then you read the operand encoding for B and see:

There is a function to do this automatically and it will use the shortest encoding for VEX automatically.

  void set_vex_modrm_and_rex(uint8_t reg,
                             uint8_t rm,
                             VEX3::LeadingBytes lb,
                             uint8_t vex_reg = 0,
                             bool rex_w = false,
                             VexPrefix prefix = VexPrefix::P_NONE)

reg is ModRM:reg, the first argument.
rm is ModRM:rm, the second argument.
lb is the leading bytes (P_0F).
vex_reg is the second register.
The others can be left at default because the manual did not ask for them.
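To show how these fields end up as bytes, here is a standalone C++ sketch of a VEX.128.0F.WIG register-to-register encoder. It is not the project's `Instruction` implementation: the function name `encode_vex128_0f` is hypothetical, the parameter names are borrowed from the declaration above, and it assumes the 0F opcode map, W = 0, L = 0 (128-bit), and no mandatory prefix. It picks the 2-byte VEX form when possible and the 3-byte form when the rm register is 8 or higher.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes `op reg, vex_reg, rm` for xmm0..xmm15.
// VEX stores the register extension bits and vvvv in inverted form.
std::vector<std::uint8_t> encode_vex128_0f(std::uint8_t opcode,
                                           std::uint8_t reg,
                                           std::uint8_t rm,
                                           std::uint8_t vex_reg) {
  assert(reg < 16 && rm < 16 && vex_reg < 16);
  const unsigned not_r = (reg < 8) ? 1u : 0u;  // inverted extension of ModRM:reg
  const unsigned not_b = (rm < 8) ? 1u : 0u;   // inverted extension of ModRM:rm
  const unsigned not_vvvv = (~static_cast<unsigned>(vex_reg)) & 0xFu;

  std::vector<std::uint8_t> out;
  if (not_b == 1u) {
    // 2-byte VEX: C5 [~R vvvv L pp], only valid when rm needs no extension bit.
    out.push_back(0xC5);
    out.push_back(static_cast<std::uint8_t>((not_r << 7) | (not_vvvv << 3)));
  } else {
    // 3-byte VEX: C4 [~R ~X ~B mmmmm] [W vvvv L pp], with mmmmm = 00001 for 0F.
    out.push_back(0xC4);
    out.push_back(static_cast<std::uint8_t>((not_r << 7) | (1u << 6) | (not_b << 5) | 0x01u));
    out.push_back(static_cast<std::uint8_t>(not_vvvv << 3));
  }
  out.push_back(opcode);
  // ModRM with mod = 11 (register direct), low 3 bits of reg and rm.
  out.push_back(static_cast<std::uint8_t>(0xC0u | ((reg & 7u) << 3) | (rm & 7u)));
  return out;
}
```

With opcode 0x58 (vaddps) and reg = 1, vex_reg = 1, rm = 2, this produces the four bytes C5 F0 58 CA, which is consistent with the 4-byte `vaddps xmm1, xmm1, xmm2` in the listing at the top of this issue. Using a register of 8 or higher for rm switches to the longer C4 form, which is why it's worth testing both ranges.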