Closed gardell closed 2 years ago
Thanks for the detailed bug report.
I'm not sure I see the error here. Wolfram alpha is returning [538, 612, 686, 760]
, which seems to be the exact same result that the same code generates when using M000
. The results in result_transposed
differ from what wolfram returns.
Is this not the expected behaviour?
So I should clarify, the
assert_eq!((result.x, result.y, result.z, result.w), (538.0, 612.0, 686.0, 760.0));
assert_eq!((result_transposed.x, result_transposed.y, result_transposed.z, result_transposed.w), (190.0, 486.0, 782.0, 1078.0));
lines fail.
The output from the first assertion is:
panicked at `assertion failed: `(left == right)`
left: `(190.0, 486.0, 782.0, 1078.0)`,
right: `(538.0, 612.0, 686.0, 760.0)`, src/main.rs:125:5
Ah I see. Interestingly, I fed the same assembly through the original GCC toolchain (which was derived from Sony), and there was no difference in the output.
ScePspFMatrix4 matrix = {
{ 1.0, 2.0, 3.0, 4.0 },
{ 5.0, 6.0, 7.0, 8.0 },
{ 9.0, 10.0, 11.0, 12.0 },
{ 13.0, 14.0, 15.0, 16.0, },
};
ScePspFVector4 vector = { 17.0, 18.0, 19.0, 20.0 };
ScePspFVector4 result;
__asm__ volatile (
"lv.q C000, 0 + %1\n"
"lv.q C010, 16 + %1\n"
"lv.q C020, 32 + %1\n"
"lv.q C030, 48 + %1\n"
"lv.q C100, %2\n"
"vtfm4.q C110, M000, C100\n"
"sv.q C110, %0\n"
: "=m"(result) : "m"(matrix), "m"(vector) : "memory"
);
As it turns out, it seems that the transposed registers E___
are designed for column-mode operations. So the assembler generation is actually correct, you just have to use E000
because you are doing a column operation. M000
in this case is actually the transposed matrix.
It may seem counter-intuitive because you are using vtfm4.q
with M000 * C100
, but this is just a notational trick; you can just as easily do M000 * R100
if you stored vector
into R100
. The assembler doesn't care, the instruction always assumes the vector operand is a column vector, and thus the matrix operand can be transposed either way.
I wasn't aware of this myself. Should probably add more tests...
BTW: your snippet has some undefined behavior (taking a &mut
reference on uninitialized data):
let mut result: psp::sys::ScePspFVector4 = unsafe { core::mem::MaybeUninit::uninit().assume_init() };
You can instead do:
let mut result = core::mem::MaybeUninit::uninit(); // No unsafe necessary
Then you can write into the result with result = in(reg) (result.as_mut_ptr())
, and finally .assume_init()
after the assembly executes.
Woah that's so confusing! Thanks for taking the time to look into this though!
But this should mean that the documentation at http://hitmen.c02.at/files/yapspd/psp_doc/chap4.html saying that E000
is the transpose matrix is wrong? I can't see how this wouldn't apply to the general case?
Thanks for the MaybeUninit
thing, I haven't used it before.
Indeed the description is off. It would perhaps be better to say that the E___
registers represent column-major matrices, while M___
registers represent row-major matrices.
It would then make sense that choosing e.g. M000
in vtfm4.q
is actually the transpose of the data, because matrix
is accessed by E000
as a column-major matrix. (Accessing a column-major matrix in row-major order is a transposition).
Operating under this definition, sv.q
/ lv.q
therefore must work with row-major matrices, which is why you use M___
when loading/storing vectors. The shorthands C___
and R___
are just notation that you use when working with a row-major matrix. R___
is packed tightly, but C___
has a stride. The rule of thumb then should be do everything in row-major order until you need to access it differently in an instruction like vtfm4.q
.
This is why using M___
like normal for loading/storing works in your example, but we have to use E___
when actually manipulating the data, to make sure it is interpreted in column-major format. For what it's worth, the only instructions that this matters for in the assembler are vtfm_
and vhtfm_
, because nearly every single other operation isn't sensitive to the matrix format (vmidt
, vmzero
, vmmov
).
The one big exception to this rule would be matrix multiplication. If we want to multiply 2 matrices "normally" with column-major order, according to our definition above we would expect to do vmmul.q E000, E100, E200
, but instead we do:
; Column-major multiplication
;
; M000 = M100 * M200
vmmul.q M000, M100, M200
The assembler actually flips the M/E bit to make this happen. I'm not sure why Sony did this. I assume it's because logically applying E___
here would be too confusing for unfamiliar devs writing this common operation, so they just special-cased it. Unfortunately this just made things more complex. Here, the original definition of "transposed matrix" from hitmen.c02.at seems to apply.
EDIT: To make this more strange and subtle, only the M/E bit of the middle register is flipped, but the multiplication order of the two matrices is also flipped while interpreting the instruction. This combination of changes still has the same effect as transposing all registers as described above, so we can pretend that
M___
gives us column-major multiplication.
I guess it somewhat means that each instruction has its own definition of "normal" vs "transposed", but if you use it with this mental model I think it should work fine. There may be more exceptions to this rule, but after playing around and examining the opcode tables, this seems to work as expected.
Hello! First of all, great work on this awesome library! Really easy to use and get started with.
However, I've been trying to do some VFPU stuff and ran into some issues. I made a commit at https://github.com/gardell/rust-psp/commit/273a9afb581e1ae98f142043861be6e3237f0f3c that should show the problem.
I'm loading a
ScePspFMatrix4
into the columnsC000
,C010
,C020
andC030
then perform avtfm4.q
on theM100
register. But when I look at the result, it is as if the matrix was transposed. If I useE100
I get the results that I want.Looking at http://hitmen.c02.at/files/yapspd/psp_doc/chap4.html section
4.5.1 Registers
it is explained thatC000
,C010
,C020
andC030
are indeed the columns of the matrixM100
and thatE100
is the transpose ofM100
.I also set up the example matrices from the above commit in Wolfram Alpha to double check: https://www.wolframalpha.com/input?i=transpose%28%7B%7B1%2C+2%2C+3%2C+4%7D%2C+%7B5%2C+6%2C+7%2C+8%7D%2C+%7B9%2C+10%2C+11%2C+12%7D%2C+%7B13%2C+14%2C+15%2C+16%7D%7D%29+*+%7B%7B17%7D%2C+%7B18%7D%2C+%7B19%7D%2C+%7B20%7D%7D note that Wolfram Alpha uses row-major order, so a
transpose()
there is needed. To have row-major order with the VFPU I could have usedR000
,R001
,R002
,R003
when loading the matrix in my example above..Doesn't this mean that, in this library the symbols
M100
andE100
should swap places?Thanks, Johan