Closed Francessco121 closed 6 years ago
Thank you for pointing this out.
The second option is the most logical in my opinion (anyway, I don't think there are enough spare opcodes). The behavior of the jmp instructions would be easy to change (if operand == IMMEDIATE16, then do a relative jmp instead of an absolute jmp).
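A minimal sketch of that dispatch, not the game's actual code: `JumpResolver`, `OperandKind`, and `resolve` are hypothetical names, and it assumes 16-bit addresses with the displacement measured from the address of the instruction following the jmp.

```java
// Hypothetical sketch: none of these names come from the game's code base.
final class JumpResolver {
    enum OperandKind { IMMEDIATE16, REGISTER16, MEMORY16 }

    /**
     * Computes the new instruction pointer for a jmp.
     *
     * @param kind  how the operand was encoded
     * @param ip    address of the instruction following the jmp (assumption)
     * @param value the 16-bit operand value
     */
    static int resolve(OperandKind kind, int ip, int value) {
        if (kind == OperandKind.IMMEDIATE16) {
            // Relative jump: treat the immediate as a signed 16-bit
            // displacement and wrap the result to 16 bits.
            return (ip + (short) value) & 0xFFFF;
        }
        // Register or memory operand: the value is the absolute target.
        return value & 0xFFFF;
    }
}
```

With this convention an immediate of 0xFFFE jumps two words backward, while the same value held in a register would jump to address 0xFFFE.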
That would mostly solve the problem. The next step would be to change all the values of the labels to relative values at assembly time: we just need to create a new Operand type, flag operands parsed with parseLabel() as LABEL, and handle the calculation of the offset in the encode() method for this type of operand.
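The offset calculation itself could look roughly like this. It is only a sketch: `RelativeOffsets` and `encodeOffset` are made-up names, memory is assumed to be addressed in 16-bit words, and the offset is assumed to be measured from the end of the jump instruction.

```java
// Hypothetical helper: illustrates the arithmetic only, not the real encode().
final class RelativeOffsets {
    /**
     * Returns the 16-bit word to emit for a label operand of a relative jump.
     *
     * @param labelAddress address the label resolves to
     * @param jumpAddress  address of the first word of the jump instruction
     * @param jumpLength   length of the jump instruction in words (assumption)
     */
    static char encodeOffset(int labelAddress, int jumpAddress, int jumpLength) {
        int next = jumpAddress + jumpLength;
        // The cast to char keeps the low 16 bits, so a backward jump
        // is stored in two's complement form.
        return (char) (labelAddress - next);
    }
}
```

Because only the difference between the two addresses is stored, the encoded word stays valid wherever the binary is loaded.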
Problem is that in this line:
MY_CONSTANT equ 0x0021
MY_CONSTANT is treated as a label, and we don't want the assembler to save its value as an offset.
So we need to treat those differently: instead of a HashMap<String, Character>, the labels could be saved in a HashMap<String, Label>. The parseLabel() method could then flag the operand as either a REL_LABEL or an EQU_SYMBOL based on the Label's properties (label.isEquSymbol() would do the trick).
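Something along these lines, as a rough sketch. Only `Label`, `isEquSymbol()`, `REL_LABEL`, and `EQU_SYMBOL` come from the description above; `LabelTable`, `defineLabel`, `defineEqu`, and `classify` are placeholder names, and the real parseLabel() would do more than this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the label table described above.
final class LabelTable {
    enum OperandType { REL_LABEL, EQU_SYMBOL }

    static final class Label {
        private final char value;
        private final boolean equSymbol;

        Label(char value, boolean equSymbol) {
            this.value = value;
            this.equSymbol = equSymbol;
        }

        char getValue() { return value; }
        boolean isEquSymbol() { return equSymbol; }
    }

    private final Map<String, Label> labels = new HashMap<>();

    // A code or data label: its value is an address.
    void defineLabel(String name, char address) {
        labels.put(name, new Label(address, false));
    }

    // A constant defined with "equ": its value must be kept as-is.
    void defineEqu(String name, char value) {
        labels.put(name, new Label(value, true));
    }

    /** Returns the operand type for a known name, or null if it is undefined. */
    OperandType classify(String name) {
        Label label = labels.get(name);
        if (label == null) {
            return null;
        }
        return label.isEquSymbol() ? OperandType.EQU_SYMBOL : OperandType.REL_LABEL;
    }
}
```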
If you decide to work on this don't hesitate to ask questions, I'll try to find the time to answer. If not it'll be added on my to-do list and I'll try to get it done sometime.
Thanks again!
Thanks for the quick reply!
My only concern about that solution is that it would break any existing user-code jumping to absolute positions via constants. Although that's probably not very common.
Regardless, if that's the solution you would like to roll with, I'd be happy to start working on it as soon as I can!
So, I've rethought this issue quite a bit recently and I'm starting to question whether this feature is even something the game should support.
Originally, my goal with this issue was to allow players to write position-independent code; however, I've realized that my understanding of that term was wrong. My (new) understanding of "relative" jumps and calls in x86 is that these are simply extra features that make position-independent code easier to write, but do not actually solve the issue entirely. In x86, relative jumps can only be used in a 'near' mode, meaning you can't jump across code segments, so real binaries can't only use relative jumps (this is a good reference for jmp). x64 makes this a little bit easier with something called RIP-relative addressing (although I don't really understand how this works; finding information on that topic is difficult). What I'm getting at here is that, since writing position-independent code is still not something supported out of the box with modern hardware, it (to me) makes even less sense to bake features into this game to allow for writing such things. In addition, since relative jumps in this game would have no code-size or performance advantages over absolute jumps, I don't see why it should implement both.
With the current instruction set provided by the game, two very well-known techniques for writing position-independent code can actually be used. I came across two wonderful blog posts explaining them if you're curious: "Load-time relocation of shared libraries" and "Position Independent Code (PIC) in shared libraries". These techniques are meant for shared libraries (executables can assume where they are loaded into virtual memory, so they do not need this), but since this game doesn't have virtual memory, players could just treat "executables" and "shared libraries" the same way with these techniques.
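To make the first technique concrete, here is a simulation of load-time relocation in Java rather than in the game's assembly. It is a sketch under assumptions: the binary is assembled as if loaded at address 0, it ships with a table listing the offsets of every word that holds an absolute address, and `LoadTimeRelocation` and `load` are illustrative names.

```java
// Hypothetical simulation of load-time relocation over a word-addressed memory.
final class LoadTimeRelocation {
    /**
     * Copies a binary image into memory and patches its absolute addresses.
     *
     * @param memory      the machine's memory, one char per 16-bit word
     * @param image       the binary, assembled as if loaded at address 0
     * @param relocations offsets into the image of words holding addresses
     * @param loadAddress where the image is actually being loaded
     */
    static void load(char[] memory, char[] image, int[] relocations, int loadAddress) {
        System.arraycopy(image, 0, memory, loadAddress, image.length);
        for (int offset : relocations) {
            int index = loadAddress + offset;
            // Each listed word was assembled relative to 0, so adding the
            // load address turns it into the correct absolute address.
            memory[index] = (char) (memory[index] + loadAddress);
        }
    }
}
```

In the game, a small loader written in assembly would perform the same loop after reading the code from the floppy drive.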
With that said, I also ran into a few issues implementing this. Since non-constant labels are now given the value of the offset from the beginning of the binary, rather than an absolute position, any code referencing labels that isn't a jump or a call no longer functions correctly. Take for instance:
.data
global: DW 0
.text
test [global], 1
brk
The implementation of test would get an immediate to reference memory; however, with the new code, this ends up looking at the wrong address. The current way this instruction works can't tell the difference between that and something like test [0x1000], 1. Now, this could be fixed of course to ensure that only jumps and calls get the relative offsets at assemble time, but I'm not sure it's worth complicating anything further.
As a side note, I must thank you for building this game. It's led me to learn sooooo much over the last few months!
What are your thoughts on this, do you think this is still something that should be added?
I think it would be worth adding it for the sake of consistency with the 8086 processor
> In x86, relative jumps can only be used in a 'near' mode meaning you can't jump across code segments
Since we only have a single large segment this is not really a problem.
> This could be fixed of course to ensure that only jumps and calls get the relative offsets at assemble time, but I'm not sure it's worth complicating anything further.
What I had in mind is to only change the Instruction::encode() method in the jmp/call instructions by overloading it, leaving the other instructions completely unaffected.
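A rough illustration of that idea, using an override of a simplified method rather than the real `Instruction::encode()` signature. `EncodeSketch`, `encodeLabel`, and the nested class names are hypothetical, and the offset is assumed to be measured from the address of the next instruction.

```java
// Hypothetical sketch: only jump-like instructions change how labels are encoded.
final class EncodeSketch {
    static class Instruction {
        /** Default behavior: a label operand is emitted as its absolute address. */
        char encodeLabel(int labelAddress, int nextInstructionAddress) {
            return (char) labelAddress;
        }
    }

    static class JmpInstruction extends Instruction {
        /** jmp/call: the label is emitted as an offset from the next instruction. */
        @Override
        char encodeLabel(int labelAddress, int nextInstructionAddress) {
            return (char) (labelAddress - nextInstructionAddress);
        }
    }
}
```

An instruction such as test would keep the default and still see the absolute address of its label, so memory references like [global] would be unaffected.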
> As a side note, I must thank you for building this game. It's led me to learn sooooo much over the last few months!
Thank you for your interest in the project :slightly_smiling_face:
> I think it would be worth adding it for the sake of consistency with the 8086 processor

I'm not sure consistency makes sense here though. Looking at this table, it appears that you can still use immediates as operands to perform an absolute jump, but only with the addition of the PTR keyword, which is not something currently supported by the game's assembler. Without something like that, I don't think it would make sense to remove the ability to jump/call immediates and constants in an absolute way, since that would force players to store constants in a register first and then jump.
Another thing to note if this is still a feature that should be implemented, is that conditional jumps handle this differently. As far as I can tell, none of these actually support absolute jumps, so I'm not sure how close this game should get to the real thing (unless 8086 doesn't do this...but I can't find much information on that).
> What I had in mind is to only change the Instruction::encode() method in the jmp/call instructions by overloading it, leaving the other instructions completely unaffected.

Ah! That would make a lot more sense! Silly oversight on my part.
Right, I didn't consider that it would in fact disallow JMPs to absolute immediate values. The PTR directive wouldn't be too hard to implement (in fact, the game used to have 8-bit and 16-bit operands!) but it would mean that we would have to create more opcodes, which is what we tried to avoid with our approach.
I'm not sure it would be the best idea to follow this path, since the game is meant to be more casual (well, as casual as an assembly programming game can be). I think the best option would be to leave it as it is.
It will be a good excuse to learn how to use relocation tables!
If you still wish to contribute, you can hop on the Slack channel and ping me and I'll try to guide you. There's a lot of interesting stuff going on with the upcoming feature (vaults!)
Sounds good to me, I'll close this then.
> It will be a good excuse to learn how to use relocation tables!

I can see a few guides for implementing this with this game pop up in the future :)

> If you still wish to contribute, you can hop on the Slack channel and ping me and I'll try to guide you. There's a lot of interesting stuff going on with the upcoming feature (vaults!)
I'll keep that in mind, thanks!
Currently, all jump instructions and the call instruction only support jumping to absolute memory locations. This makes tasks such as loading/running code stored on the floppy drive very difficult. Any code compiled and placed on the drive would need to know the exact memory location that it will be loaded into RAM, so that it can properly calculate the absolute memory locations of each label.
This issue proposes adding support for relative jumps and calls to make loading code at run-time more manageable.
Current potential workarounds
Currently, the only way code that uses jumps could be loaded into an arbitrary location in RAM would be if the memory addresses specified in every jump statement were determined at run-time, either through a pre-processor that patches the code after loading, or potentially with some kind of JIT-compilation. Neither of these workarounds is ideal or fast.
The x86 approach to this issue
My understanding of how this is done with real x86 machine code is that instructions such as jmp can be encoded as many different opcodes, some representing relative jumps and some representing absolute jumps. This is an example of what I mean (the table at the top). x86 jmp instructions also encode other metadata such as whether they are near, far, or short jumps; however, since the game's CPU has no concept of segments, that can be left out. Unfortunately, the current architecture that the game uses to parse compiled assembly assumes that there is a 1 to 1 relationship between opcodes and assembly instructions. Note: I couldn't really find any documentation on how 8086 assembly handles this, so it may be unrealistic to say that this project should support both absolute and relative with the same instruction.
Potential solutions
1. Add relative versions of all jump instructions and the call instruction
One solution would be to simply create relative versions of instructions such as jmp and call. These could be named the same as their absolute counterparts, but prefixed or suffixed with an r, such as rjmp and rcall.
Advantages:
Disadvantages:
2. Add support for mnemonics representing multiple opcodes
Another solution would be to just add support for different encodings of instructions such as jmp. This would require a bit more work. My thought here was, similar to modern x86, have it so that specifying a literal numeric as an operand of jmp results in a relative jump, while using a memory location or register as an operand results in an absolute jump.
Advantages:
Disadvantages:
Side-note
Apologies if my understanding of anything I assumed here is incorrect. I'm a little new to assembly programming, but I think I have a good understanding of this problem. 😅
I would also be willing to help implement anything officially decided on surrounding this.