Closed: rj45 closed this issue 4 years ago.
I can't say for sure without more source code, but it seems you've defined the `dump` label after the `jal` instruction that tries to use it.
In a case like this, the assembler falls back to a fail-safe: it always selects the last matching instruction you've defined, because it doesn't yet have a definite address for `dump` (and it currently doesn't try to estimate one or do any further analysis). This is probably a very hard problem to solve in general, but some kind of analysis might be possible in the simpler cases.
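Here's a minimal Python model of that fail-safe, to make the behaviour concrete. It isn't customasm's actual code: `select_rule`, the rule names, and the predicates are hypothetical, and the sketch assumes that once the address is known, the first rule whose condition holds is chosen.

```python
from typing import Callable, Optional, Sequence, Tuple

# A candidate rule: (name, condition on the target address).
# Names and conditions are hypothetical, for illustration only.
Rule = Tuple[str, Callable[[int], bool]]

def select_rule(rules: Sequence[Rule], target: Optional[int]) -> str:
    """Pick one of several matching rules for an operand.

    If the target address is still unknown (None), fall back to the LAST
    rule, as described above. Otherwise take the first rule whose
    condition accepts the address (an assumption of this model).
    """
    if target is None:
        return rules[-1][0]
    for name, fits in rules:
        if fits(target):
            return name
    raise ValueError(f"no rule accepts target {target:#x}")

JAL_RULES: Sequence[Rule] = [
    ("jal.short", lambda addr: addr < 0x100),   # hypothetical 8-bit address form
    ("jal.long", lambda addr: addr < 0x10000),  # hypothetical 16-bit address form
]
```

Because the fallback is the last rule, ordering the rules from narrowest to widest means an unresolved forward label gets the widest encoding: it may waste space, but it never truncates the address.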
For now, I'd recommend defining new, unambiguous instructions for the long and short cases (something like `jal.l {long}` and `jal.s {short}`) and selecting them manually.
You're right, `dump` is defined later in the program, and the reason you give makes total sense.
The issue, then, is that many instruction sets have different encodings depending on how far away an address is. For example, if the offset fits in 5 bits the assembler might pick a short encoding, if it fits in 8 bits another encoding, and if it fits in 16 bits maybe it generates a couple of instructions instead.
You might even have to convert a conditional branch into several instructions if it doesn't have enough range. For example, a conditional branch to a target too far backward might have to become a short forward branch on the inverted condition, skipping over an unconditional jump to the real target. Each of these adjustments changes the size of the code, and therefore the addresses of any labels that come after it.
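The size-dependent selection and the branch rewrite can be modelled in a few lines of Python. This is only a sketch: the byte sizes and the mnemonics (`bne`, `jmp`, `.skip`) are made up for illustration and don't correspond to any particular instruction set, and offsets are assumed to be signed two's-complement.

```python
from typing import List

def fits_signed(value: int, bits: int) -> bool:
    """True if value fits in a two's-complement field `bits` wide."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def branch_size(offset: int) -> int:
    """Bytes needed for a branch with this offset (hypothetical size table)."""
    if fits_signed(offset, 5):
        return 1  # offset packed into the opcode byte
    if fits_signed(offset, 8):
        return 2  # opcode + 8-bit offset
    if fits_signed(offset, 16):
        return 4  # a pair of 2-byte instructions
    raise ValueError(f"offset {offset} out of range")

# Hypothetical condition mnemonics and their inverses.
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt"}

def relax_conditional(cond: str, target: str) -> List[str]:
    """Rewrite an out-of-range `b<cond> target` as a short branch on the
    inverted condition that hops over an unconditional jump to `target`."""
    return [f"b{INVERSE[cond]} .skip", f"jmp {target}", ".skip:"]
```

For instance, `relax_conditional("eq", "far_label")` yields `bne .skip`, `jmp far_label`, `.skip:`. The extra instruction is exactly the kind of size change that shifts every later label.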
I think other assemblers like VASM handle this by doing multiple passes. Perhaps on the first pass unknown addresses are assumed to be zero, and then later passes keep refining the label addresses until they stop changing. I think VASM will do this up to 1000 times by default.
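A toy version of that multi-pass approach fits in a short Python function. It's a sketch under simplifying assumptions, not how VASM or customasm is actually implemented: there's a single hypothetical jump with a 2-byte short form (signed 8-bit offset) and a 3-byte long form, unknown labels read as 0 on the first pass, and `max_passes=1000` only mirrors the default recalled above.

```python
from typing import Dict, List, Tuple, Union

# A program is a list of (kind, arg) items:
#   ("label", name)  defines a label at the current address
#   ("data", n)      emits n bytes of non-jump content
#   ("jump", name)   a jump whose size depends on the distance to `name`
Item = Tuple[str, Union[str, int]]

def jump_size(offset: int) -> int:
    # Hypothetical encodings: 2 bytes with a signed 8-bit offset, else 3 bytes.
    return 2 if -128 <= offset <= 127 else 3

def assemble(program: List[Item], max_passes: int = 1000) -> Tuple[Dict[str, int], List[int]]:
    labels: Dict[str, int] = {}  # empty on pass 1, so every label reads as 0
    for _ in range(max_passes):
        pc = 0
        new_labels: Dict[str, int] = {}
        sizes: List[int] = []
        for kind, arg in program:
            if kind == "label":
                new_labels[arg] = pc
            elif kind == "data":
                pc += arg
            elif kind == "jump":
                # Size is chosen from the previous pass's label addresses.
                size = jump_size(labels.get(arg, 0) - pc)
                sizes.append(size)
                pc += size
        # If no label moved, the sizes chosen this pass are self-consistent.
        if new_labels == labels:
            return labels, sizes
        labels = new_labels
    raise RuntimeError("label addresses did not converge")
```

For a forward jump over 200 bytes, the first pass guesses a short jump (`end` lands at 202), the second switches to the long form (`end` moves to 203), and the third confirms nothing moved. The pass limit matters because, for some encoding schemes, re-picking the shortest form on every pass isn't guaranteed to settle; one common safeguard is to only ever let instructions grow, which makes termination certain.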
I ran into similar problems. In a RISC CPU I made, I have two CALL instructions: one uses a signed 8-bit relative offset and is 2 bytes long, and the other uses a 16-bit absolute address and is 4 bytes long. So I wanted to use assertions to automatically select the relative CALL instruction whenever the target is within the range of a signed 8-bit integer, because that makes the program more compact.
```
CALL {VAL}  -> {REL = (VAL - pc), assert(REL <= 127 && REL >= -128), REL[7:0] @ 0b0100 @ 14[3:0]}          ; Relative CALL, 2 bytes
CALL {ADDR} -> {REL = (ADDR - pc), assert(REL > 127 || REL < -128), 0b000000000101 @ 14[3:0] @ ADDR[15:0]}  ; Absolute CALL, 4 bytes
```
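To make the selection concrete, here's a Python model of what those two rules compute. It isn't customasm: `encode_call` is a hypothetical helper, and it assumes `pc` is the address of the CALL itself and that the concatenated bit fields are emitted most-significant byte first.

```python
def encode_call(target: int, pc: int) -> bytes:
    """Encode CALL the way the two rules above do.

    Relative form (2 bytes): REL[7:0] @ 0b0100 @ 14[3:0]
    Absolute form (4 bytes): 0b000000000101 @ 14[3:0] @ ADDR[15:0]
    """
    rel = target - pc
    if -128 <= rel <= 127:
        # 8-bit offset in the top byte, then the 4-bit 0b0100 and 4-bit 14 fields.
        word = ((rel & 0xFF) << 8) | (0b0100 << 4) | 14
        return word.to_bytes(2, "big")
    # 12-bit prefix in bits 31..20, 14 in bits 19..16, address in bits 15..0.
    word = (0b000000000101 << 20) | (14 << 16) | (target & 0xFFFF)
    return word.to_bytes(4, "big")
```

A call from `0x0100` to `0x0110` becomes the 2-byte `10 4E`, while a call to `0x0200` (offset 256) falls back to the 4-byte `00 5E 02 00`.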
So it's a shame that it doesn't work, since it's basically an automatic optimization that the programmer doesn't have to actively think about.
Actually, never mind: because of the recent overhaul, this is now possible. Using exactly the same code as above (though with `->` changed to `=>`), it works perfectly.
Yes, with the overhaul in v0.11, I've made the assembler work in multiple passes: it starts with guesses for undefined labels and iterates until everything converges, much as described in the comment above. The assembler should be able to solve your original problem now!
The following code should work in all cases:
It checks whether the current program counter is in the same page (the upper 8 bits) as the jump target. If it is, it can save a byte (and a cycle) by selecting the in-page jump. If not, it has to select a long jump and emit an extra byte.
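The page check itself is simple to state in Python. This sketch assumes 16-bit addresses and hypothetical sizes of 2 bytes for the in-page jump (opcode plus low byte) and 3 bytes for the long jump (opcode plus full address); the function names are illustrative only.

```python
def same_page(pc: int, target: int) -> bool:
    """True if both 16-bit addresses share the same upper 8 bits (page)."""
    return (pc >> 8) == (target >> 8)

def page_jump_size(pc: int, target: int) -> int:
    # Hypothetical sizes: the in-page form only carries the target's low byte,
    # so it is one byte shorter than the long form with the full address.
    return 2 if same_page(pc, target) else 3
```

Whether `pc` should be the address of the jump itself or of the byte after it depends on how the CPU forms the in-page address; the sketch takes the former, matching "the current program counter" above.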
This code only ever seems to select the second instruction; it never selects the first. If I reverse the order, it again only seems to select the second instruction, and when the assertion fails, it produces this:
The assertion should succeed for the other instruction, but for some reason the assembler won't select it. I haven't dug into the code to find out why.