shingarov / Pharo-ArchC

Generate binutils from formal spec
MIT License
4 stars 1 forks source link

Incorrect SPR number on PowerPC #35

Open shingarov opened 1 month ago

shingarov commented 1 month ago

On PowerPC, for compatibility with old POWER the encoding of the SPR number in mtspr, mfspr is reversed, see Green Cloth p.385:

…the SPR number coded in assembler language does not appear directly as a 10-bit binary number in the instruction.

This explanation, combined with the pseudocode:

n = spr[5:9] || spr[0:4]

and the diagram of the XFX-form on p.384, is extremely confusing.

Be it as it may, both rs/rt and spr operand modifiers in the PDL are simply %imm and this obviously results in emitting the wrong code. The bug is especially vicious in the presence of extended mnemonics, because the special lr, ctr, xer have the reversed encoding baked in the set_asm, e.g. LR=8:

mtspr.set_asm("mtlr %imm", rs, sprf=0x100);

because 00000 0100001000 00000.

shingarov commented 1 month ago

@janvrany what happens to them on RISC-V when we encode split-fields, is it the "assembly source" or the "value" that gets into these set_asm operand constraints?

janvrany commented 1 month ago

It in most complicated cases it is both. For example, consider J-type instruction where 20bit displacement is encoded in 4 different fields and the encoded binary value is shifted one bit right (i.e, you do not encode last bit because it is always zero). The set_asm statement looks like:

xxx.set_asm("xxx %gpr, %exp(bimm16)", rd, imm1+imm2+imm3+imm4)"

Here bimm16 modifier does the shifting and imm1+imm2+imm3+imm4 does the splitting

IIUC the issue in POWER can be solved by either:

That being said, there's so many issues with POWER PDL that I'm disinclined to put much effort into "fixing" it. We should (re)start thinking of sourcing POWER model from authoritative sources, like we do for RISC-V.

shingarov commented 1 month ago

issue in POWER can be solved by

Well, in practical terms (i.e. using these SPR numbers) it's not a real issue, because the ISA definition guarantees that mfspr and mtspr have extended mnemonics for every defined SPR. Now, SPR numbers are processor- / profile-specific, so realistically we are bound by #34.

What does feel scary to me, is that n = spr[5:9] || spr[0:4] line in the Green Cloth. It suggests to me that we no longer have a clear concept of "instruction". Say, what is mflr? Is it mfspr 8, which is the undoubtedly the canonical assembly for it (and consistent with p.387); or is it mfspr 0x100, which is "mfspr with spr field at 0x100" (consistent with the pseudocode)? It looks like the definitions of assembly is still left completely informal.

We should (re)start thinking of sourcing POWER model from authoritative sources, like we do for RISC-V.

That's the thing. It appears there is no "authoritative source" for a lot of things. Even if we were to parse the BookMaster source of the instruction encoding tables and be extremely successful at it — so we would have "the authoritative instruction encoding" — we are still left with prose explanations of assembly, such as that "Compiler and Assembler Note" in the right margin of p.385.

The Sail definition of RISC-V seems no better, it's a function from a binary program to a behavior, really with no clear definiton of "instruction" as in something like instr = add | lw | sw | ….

janvrany commented 1 month ago

It suggests to me that we no longer have a clear concept of "instruction"

It was never "clear", there are multiple ways how to model instructions in ArchC language and always been. What is defined is assembly at one end and encoding at the other - everything in between can be done in multiple ways. Note also, that as code stands now it assumes there are some other concepts, like opcode field but really, there's no such thing.

The Sail definition of RISC-V seems no better

True, but in case of RISC-V those opcodes files maintained by RISC-V foundation and "more authoritative" that if we just write ArchC PDL by hand. So by "authortitative" sources I meant really "more authoritative" than what we have now. Parsing BookMaster sources would give us that "more". All that w.r.t. encoding / decoding. Poor quality of POWER PDL is one thing that holds Tinyrossa's POWER backend back.

shingarov commented 1 month ago

It was never "clear", there are multiple ways how to model instructions in ArchC language

That's not what I am talking about. The problem is not ArchC. The problem is the Green Cloth (and whatever other "authoritative sources" we have for any number of ISAs):

Parsing BookMaster sources would give us that "more"

Only in the limited example of instruction-encodings, but not for other aspects such as assembly. Case in point: that right margin of p.385, written in the kind of "prophetic English" which I was talking about in the Cambridge-2014 talk ("…don't step on the rake…boom!"). I would go as far as saying that the ArchC language is actually better than the Green Cloth even though it's wrong, because the Green Cloth is informal no matter how you parse it.

shingarov commented 1 month ago

Now, I am not contesting that the current quality of the POWER ArchC PDL is unusable for what we are trying to do. What I am saying is that parsing Green Cloth will not help, no matter how sophisticated parsing technology.