overdrivenpotato / rust-psp

Rust on PSP. Panic and allocation support. Access PSP system libraries.
Other
638 stars 33 forks source link

Matrices are transposed #112

Closed gardell closed 2 years ago

gardell commented 2 years ago

Hello! First of all, great work on this awesome library! Really easy to use and get started with.

However, I've been trying to do some VFPU stuff and ran into some issues. I made a commit at https://github.com/gardell/rust-psp/commit/273a9afb581e1ae98f142043861be6e3237f0f3c that should show the problem.

I'm loading a ScePspFMatrix4 into the columns C000, C010, C020 and C030 then perform a vtfm4.q on the M100 register. But when I look at the result, it is as if the matrix was transposed. If I use E100 I get the results that I want.

Looking at http://hitmen.c02.at/files/yapspd/psp_doc/chap4.html section 4.5.1 Registers it is explained that C000, C010, C020 and C030 are indeed the columns of the matrix M100 and that E100 is the transpose of M100.

I also set up the example matrices from the above commit in Wolfram Alpha to double check: https://www.wolframalpha.com/input?i=transpose%28%7B%7B1%2C+2%2C+3%2C+4%7D%2C+%7B5%2C+6%2C+7%2C+8%7D%2C+%7B9%2C+10%2C+11%2C+12%7D%2C+%7B13%2C+14%2C+15%2C+16%7D%7D%29+*+%7B%7B17%7D%2C+%7B18%7D%2C+%7B19%7D%2C+%7B20%7D%7D note that Wolfram Alpha uses row-major order, so a transpose() there is needed. To have row-major order with the VFPU I could have used R000, R001, R002, R003 when loading the matrix in my example above..

Doesn't this mean that, in this library the symbols M100 and E100 should swap places?

Thanks, Johan

overdrivenpotato commented 2 years ago

Thanks for the detailed bug report.

I'm not sure I see the error here. Wolfram alpha is returning [538, 612, 686, 760], which seems to be the exact same result that the same code generates when using M000. The results in result_transposed differ from what wolfram returns.

Is this not the expected behaviour?

gardell commented 2 years ago

So I should clarify, the

assert_eq!((result.x, result.y, result.z, result.w), (538.0, 612.0, 686.0, 760.0));
assert_eq!((result_transposed.x, result_transposed.y, result_transposed.z, result_transposed.w), (190.0, 486.0, 782.0, 1078.0));

lines fail.

The output from the first assertion is:

panicked at `assertion failed: `(left == right)`
  left: `(190.0, 486.0, 782.0, 1078.0)`,
right: `(538.0, 612.0, 686.0, 760.0)`, src/main.rs:125:5
overdrivenpotato commented 2 years ago

Ah I see. Interestingly, I fed the same assembly through the original GCC toolchain (which was derived from Sony), and there was no difference in the output.

    ScePspFMatrix4 matrix = {
        { 1.0, 2.0, 3.0, 4.0 },
        { 5.0, 6.0, 7.0, 8.0 },
        { 9.0, 10.0, 11.0, 12.0 },
        { 13.0, 14.0, 15.0, 16.0, },
    };

    ScePspFVector4 vector = { 17.0, 18.0, 19.0, 20.0 };

    ScePspFVector4 result;

    __asm__ volatile (
         "lv.q C000, 0  + %1\n"
         "lv.q C010, 16 + %1\n"
         "lv.q C020, 32 + %1\n"
         "lv.q C030, 48 + %1\n"
         "lv.q C100, %2\n"
         "vtfm4.q C110, M000, C100\n"
         "sv.q C110, %0\n"
         : "=m"(result) : "m"(matrix), "m"(vector) : "memory"
    );

As it turns out, it seems that the transposed registers E___ are designed for column-mode operations. So the assembler generation is actually correct, you just have to use E000 because you are doing a column operation. M000 in this case is actually the transposed matrix.

It may seem counter-intuitive because you are using vtfm4.q with M000 * C100, but this is just a notational trick; you can just as easily do M000 * R100 if you stored vector into R100. The assembler doesn't care, the instruction always assumes the vector operand is a column vector, and thus the matrix operand can be transposed either way.

I wasn't aware of this myself. Should probably add more tests...


BTW: your snippet has some undefined behavior (taking a &mut reference on uninitialized data):

let mut result: psp::sys::ScePspFVector4 = unsafe { core::mem::MaybeUninit::uninit().assume_init() };

You can instead do:

let mut result = core::mem::MaybeUninit::uninit(); // No unsafe necessary

Then you can write into the result with result = in(reg) (result.as_mut_ptr()), and finally .assume_init() after the assembly executes.

gardell commented 2 years ago

Woah that's so confusing! Thanks for taking the time to look into this though!

But this should mean that the documentation at http://hitmen.c02.at/files/yapspd/psp_doc/chap4.html saying that E000 is the transpose matrix is wrong? I can't see how this wouldn't apply to the general case?

Thanks for the MaybeUninit thing, I haven't used it before.

overdrivenpotato commented 2 years ago

Indeed the description is off. It would perhaps be better to say that the E___ registers represent column-major matrices, while M___ registers represent row-major matrices.

It would then make sense that choosing e.g. M000 in vtfm4.q is actually the transpose of the data, because matrix is accessed by E000 as a column-major matrix. (Accessing a column-major matrix in row-major order is a transposition).

Operating under this definition, sv.q / lv.q therefore must work with row-major matrices, which is why you use M___ when loading/storing vectors. The shorthands C___ and R___ are just notation that you use when working with a row-major matrix. R___ is packed tightly, but C___ has a stride. The rule of thumb then should be do everything in row-major order until you need to access it differently in an instruction like vtfm4.q.

This is why using M___ like normal for loading/storing works in your example, but we have to use E___ when actually manipulating the data, to make sure it is interpreted in column-major format. For what it's worth, the only instructions that this matters for in the assembler are vtfm_ and vhtfm_, because nearly every single other operation isn't sensitive to the matrix format (vmidt, vmzero, vmmov).

The one big exception to this rule would be matrix multiplication. If we want to multiply 2 matrices "normally" with column-major order, according to our definition above we would expect to do vmmul.q E000, E100, E200, but instead we do:

; Column-major multiplication
;
; M000 = M100 * M200
vmmul.q M000, M100, M200

The assembler actually flips the M/E bit to make this happen. I'm not sure why Sony did this. I assume it's because logically applying E___ here would be too confusing for unfamiliar devs writing this common operation, so they just special-cased it. Unfortunately this just made things more complex. Here, the original definition of "transposed matrix" from hitmen.c02.at seems to apply.

EDIT: To make this more strange and subtle, only the M/E bit of the middle register is flipped, but the multiplication order of the two matrices is also flipped while interpreting the instruction. This combination of changes still has the same effect as transposing all registers as described above, so we can pretend that M___ gives us column-major multiplication.

I guess it somewhat means that each instruction has its own definition of "normal" vs "transposed", but if you use it with this mental model I think it should work fine. There may be more exceptions to this rule, but after playing around and examining the opcode tables, this seems to work as expected.