Closed vxgmichel closed 3 years ago
It's possible to remove the `,` terminator by deleting the `instruction` subruledef:
#subruledef opcode {
A => 0x1
B => 0x2
C => 0x3
}
#subruledef condition {
X => 0xa
Y => 0xb
Z => 0xc
}
#ruledef {
{opc: opcode} {val: u8} => opc @ 0xa @ val
{opc: opcode}-{cnd: condition} {val: u8} => opc @ cnd @ val
}
A 51 ; OK: 1a 33
B 51 ; OK: 2a 33
C 51 ; OK: 3a 33
A-X 51 ; OK: 1a 33
A-Y 51 ; OK: 1b 33
A-Z 51 ; OK: 1c 33
B-X 51 ; OK: 2a 33
B-Y 51 ; OK: 2b 33
B-Z 51 ; OK: 2c 33
C-X 51 ; OK: 3a 33
C-Y 51 ; OK: 3b 33
C-Z 51 ; OK: 3c 33
However, I don't think there is any way to remove the `-` separator.
@p-rivero Good catch! Unfortunately it doesn't help much in my case, as applying your suggestion would still double the number of rules needed. That's because the operand format also splits into different patterns that end up tweaking some bits on the left side of the instruction. For more information:
I just ran into a simpler example while writing the ruleset for the full THUMB instruction set.
Consider the `PUSH` instruction, defined as:
#ruledef big_endian
{
PUSH {rlist: u8} => 0b1101010 @ 0b0 @ rlist
PUSH {rlist: u8}, LR => 0b1101010 @ 0b1 @ rlist
}
PUSH 0b00001111 ; OK
PUSH 0b00001111, LR ; OK
It works just fine. Now let's convert those instructions to little endian using the following trick:
#subruledef big_endian
{
PUSH {rlist: u8} => 0b1101010 @ 0b0 @ rlist
PUSH {rlist: u8}, LR => 0b1101010 @ 0b1 @ rlist
}
#ruledef little_endian
{
{val: big_endian} => val[7:0] @ val[15:8]
}
PUSH 0b00001111 ; OK
PUSH 0b00001111, LR ; error: `ambiguous nested ruleset` and `no match for instruction found`
It now fails with both the "ambiguous" and "no match" errors. Note that this trick usually works for the other instructions; it only fails when one instruction is an extended version of another, as here. I had to fix it by adding a discriminating character in front of the failing instructions:
#subruledef big_endian
{
PUSH {rlist: u8} => 0b1101010 @ 0b0 @ rlist
!PUSH {rlist: u8}, LR => 0b1101010 @ 0b1 @ rlist
}
#ruledef little_endian
{
{val: big_endian} => val[7:0] @ val[15:8]
}
PUSH 0b00001111 ; OK
!PUSH 0b00001111, LR ; OK
Because of the little endian use-case, this is also related to issue #24.
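To make the byte order concrete, the workaround above can be annotated with the bytes each line should produce. The values in the comments are worked out by hand from the rules as written in this thread (a 7-bit prefix, a 1-bit LR flag, then the 8-bit register list); they are not captured assembler output.

```asm
#subruledef big_endian
{
    PUSH {rlist: u8}      => 0b1101010 @ 0b0 @ rlist
    !PUSH {rlist: u8}, LR => 0b1101010 @ 0b1 @ rlist
}
#ruledef little_endian
{
    ; Emit the low byte first, then the high byte.
    {val: big_endian} => val[7:0] @ val[15:8]
}
PUSH 0b00001111      ; big-endian value d4 0f, emitted as 0f d4
!PUSH 0b00001111, LR ; big-endian value d5 0f, emitted as 0f d5
```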
Alright, my latest commit should fix all of the above issues! 🎉 All of the attempts described above should work (both the ABC/XYZ and the PUSH attempts). The parser should now see rules and subrules as equal in status, so it shouldn't matter how you split up, nest, or organize your instructions.
Additionally, I've made an empty pattern available for subrules! The syntax looks like this:
#subruledef condition
{
X => 0xa
Y => 0xb
Z => 0xc
{} => 0xa ; Default to X
}
I'll be releasing this fix as a new version soon.
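As a sketch of how the empty pattern could tie back to the original use case, the `condition` subrule above can be combined with the `opcode` subrule from earlier in the thread. This assumes a version that includes the fix and that accepts the concatenated form (`AY`) described in the issue; the encodings in the comments are the intended results worked out by hand, not verified output.

```asm
#subruledef opcode
{
    A => 0x1
    B => 0x2
    C => 0x3
}
#subruledef condition
{
    X  => 0xa
    Y  => 0xb
    Z  => 0xc
    {} => 0xa ; no condition written: default to X
}
#ruledef
{
    ; opcode nibble, condition nibble, then the 8-bit operand
    {opc: opcode}{cnd: condition} {val: u8} => opc @ cnd @ val
}
A 51  ; intended encoding: 1a 33
AY 51 ; intended encoding: 1b 33
CZ 51 ; intended encoding: 3c 33
```

If this behaves as described, a single rule replaces the twelve hand-written ones, and adding an opcode or a condition only touches the matching subrule.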
Use case
Here's a simplified version of the problem I ran into while attempting to define some ARM instructions in customasm. Let's say I have 3 opcodes `A`, `B` and `C` that I can tweak using a modifier corresponding to a condition, say `X`, `Y` and `Z`. Those two elements form an instruction when concatenated, e.g. `AY` or `CZ`. When the condition is omitted, `X` is assumed (meaning `A`, `B` and `C` are valid instructions). Then a full instruction is crafted by combining the instruction defined above with an 8-bit integer (e.g. `AY 51`).
Here's a working implementation in customasm:
The problem here is that all instructions have been crafted manually, which can be tedious and error-prone. Also, the ruleset can quickly grow much bigger with more realistic instruction sets.
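For illustration, a fully hand-written ruleset of this kind might look like the sketch below. It is not the original listing; it assumes the opcode values `0x1`, `0x2`, `0x3` and the condition values `0xa`, `0xb`, `0xc` used in the comments on this issue.

```asm
#ruledef
{
    ; Unconditional forms: condition X (0xa) is assumed.
    A  {val: u8} => 0x1a @ val
    B  {val: u8} => 0x2a @ val
    C  {val: u8} => 0x3a @ val
    ; Conditional forms, one rule per opcode/condition pair.
    AX {val: u8} => 0x1a @ val
    AY {val: u8} => 0x1b @ val
    AZ {val: u8} => 0x1c @ val
    BX {val: u8} => 0x2a @ val
    BY {val: u8} => 0x2b @ val
    BZ {val: u8} => 0x2c @ val
    CX {val: u8} => 0x3a @ val
    CY {val: u8} => 0x3b @ val
    CZ {val: u8} => 0x3c @ val
}
A 51  ; 1a 33
AY 51 ; 1b 33
```

With 3 opcodes and 3 conditions this already takes 12 rules, which is the duplication the attempts below try to factor out.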
Attempt 1 - Use an empty pattern
In order to reduce the amount of redundancy, I first tried to factorize the logic using the `opcode` and `condition` subrules. I naively used an empty pattern to bind a missing condition to `X`, although it is not supported:
This doesn't work but I think it gives a good idea of what a straightforward definition for this use case could look like.
Attempt 2 - Distinct instruction subrules
In an attempt to work around the lack of empty pattern support, I simply split the instructions into two distinct rules:
This would still be a nice way of factorizing the logic, except it doesn't work either (at least for the conditional instructions).
Attempt 3 - Distinct instruction subrules with `-` separators
My third attempt has been to introduce extra separators, as I noticed empirically that they can make a difference in how the instructions are parsed. In particular, I simply added a `-` character between the opcode and the condition:
Unfortunately it doesn't work any better than the previous attempt but it produces a different error, about the ruleset being ambiguous.
Attempt 4 - Distinct instruction subrules with `-` separators and `,` terminators
In order to fix this ambiguity, I then added extra terminators to help the parser differentiate the two rules:
This version actually succeeds at producing the correct binary, although it's a bit verbose.
Question
My overall question is then: would it be possible to bridge the gap between the naive but broken first attempt and the working but verbose fourth attempt?
There might also be other solutions that I overlooked :)