Closed Mitch-Siegel closed 1 year ago
While I've managed to resolve the issue with `%r0` and `$0x1234` to stop them being interpreted as number literals, the issue in general is going to be more difficult to solve.
If you use it like `mov %r0, $1234`, then the tokenizer would see `$1234` as a single token for a hex number literal, because it's a valid hex number. But then `${imm: i32}` would fail to match, since it's expecting at least two tokens: one for `$` and the rest for the expression.
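To make the failure concrete, here is a minimal sketch of a toy tokenizer and matcher. It is not customasm's actual lexer: the token kinds, the `tokenize` and `matches` helpers, and the rule that number literals are tried before punctuation are all assumptions made for illustration.

```python
import re

# Toy lexer (hypothetical, not customasm's real one). Number literals are
# tried before punctuation, so "$1234" is consumed whole as a hex literal.
TOKEN_SPEC = [
    ("number", r"\$[0-9A-Fa-f]+|0x[0-9A-Fa-f]+|[0-9]+"),
    ("ident", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("punct", r"[$%,]"),
    ("space", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Split text into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# Stand-in for the rule `mov %{rd: reg}, ${imm: i32}`: literal tokens carry
# their text, placeholders carry None and accept any token of that kind.
PATTERN = [
    ("ident", "mov"), ("punct", "%"), ("ident", None),
    ("punct", ","), ("punct", "$"), ("number", None),
]

def matches(tokens, pattern=PATTERN):
    """True if every token lines up with the pattern, one for one."""
    if len(tokens) != len(pattern):
        return False
    return all(
        kind == pkind and (ptext is None or text == ptext)
        for (kind, text), (pkind, ptext) in zip(tokens, pattern)
    )

fused = tokenize("mov %r0, $1234")    # "$1234" becomes one number token
spaced = tokenize("mov %r0, $ 1234")  # "$" and "1234" are separate tokens
print(matches(fused), matches(spaced))
```

In this sketch the fused form yields five tokens against a six-token pattern, so it cannot match, while the spaced form does.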
For this to work as intended in every case, the instruction matcher would have to re-merge the stream of tokens and break them apart at a different spot, reinterpreting the stream of characters as it works on each instruction with more context. I think I'll leave this as an exercise for the future.
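As a rough idea of what such a re-split could look like, the sketch below walks a token list alongside a rule pattern and breaks a fused `$1234` token apart wherever the pattern expects a literal `$`. The `resplit` helper, the `(kind, text)` token shape, and the pattern encoding are hypothetical and only illustrate the idea; a real implementation would have to work on the underlying characters and handle every rule.

```python
# Hypothetical pattern for `mov %{rd: reg}, ${imm: i32}`: literal tokens carry
# their text, placeholders carry None.
PATTERN = [
    ("ident", "mov"), ("punct", "%"), ("ident", None),
    ("punct", ","), ("punct", "$"), ("number", None),
]

def resplit(tokens, pattern):
    """Re-split tokens so a literal '$' in the pattern can match.

    When the pattern expects a '$' punctuation token but the lexer produced a
    number token that starts with '$', break it into '$' plus the remainder.
    """
    out = []
    i = 0  # position in the pattern
    for kind, text in tokens:
        expected = pattern[i] if i < len(pattern) else None
        if (expected == ("punct", "$") and kind == "number"
                and text.startswith("$") and len(text) > 1):
            rest = text[1:]
            # The remainder is re-classified: all digits means a decimal
            # number, anything else (e.g. "ff") is treated as an identifier.
            rest_kind = "number" if rest.isdigit() else "ident"
            out.append(("punct", "$"))
            out.append((rest_kind, rest))
            i += 2
        else:
            out.append((kind, text))
            i += 1
    return out

fused = [("ident", "mov"), ("punct", "%"), ("ident", "r0"),
         ("punct", ","), ("number", "$1234")]
print(resplit(fused, PATTERN))
```

Note that after the split, `1234` reads as a decimal number rather than the hex value the original single token denoted, which is part of why the ambiguity is awkward to resolve automatically.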
Is it possible that you change `$` (and even `%` perhaps) to different tokens in your instruction set? It would avoid future ambiguity issues with number literals.
In trying to write a sort of macro to load a 32-bit constant on a machine where the instruction size is 32 bits, I came across this issue. I had intended to make a macro that takes a destination register from a defined list of possible registers and a 32-bit immediate (`mov %{rd: reg}, ${imm: i32}`), splitting up the wider load into a 16-bit load, a left shift, and a 16-bit immediate add.

However, this fails, stating that it is unable to match the `movh` instruction (which has the same format as the wider `mov` but takes a 16-bit immediate).

Attempting to do the same with a function definition doesn't work either and gives the same error.
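Separately from the error itself, the arithmetic behind the intended macro can be sketched in Python as follows. This only emulates the three steps (16-bit load, left shift, 16-bit immediate add); the helper names are made up, and it assumes the high half is loaded first and that the add treats its 16-bit immediate as zero-extended, neither of which is stated in the issue.

```python
MASK32 = 0xFFFF_FFFF

def split_imm32(imm):
    """Split a 32-bit constant into (high, low) 16-bit halves."""
    if not 0 <= imm <= MASK32:
        raise ValueError("imm must fit in 32 unsigned bits")
    return (imm >> 16) & 0xFFFF, imm & 0xFFFF

def emulate_load(imm):
    """Emulate the three-step sequence the macro would expand to."""
    high, low = split_imm32(imm)
    reg = high                   # 16-bit load of the high half (assumed order)
    reg = (reg << 16) & MASK32   # left shift by 16
    reg = (reg + low) & MASK32   # 16-bit immediate add, assumed zero-extended
    return reg

print(hex(emulate_load(0x12345678)))
```

If the target's add sign-extends its immediate instead, the high half would need to be adjusted whenever bit 15 of the low half is set.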
@hlorenzi reproduced the issue with a minimal example, believing that there is probably a bug in the use of `%` or `$` next to arguments: