Relative Jump/Call Instructions

Francessco121 commented 6 years ago

Currently, all jump instructions and the call instruction only support jumping to absolute memory locations. This makes tasks such as loading/running code stored on the floppy drive very difficult. Any code compiled and placed on the drive would need to know the exact address at which it will be loaded into RAM, so that it can correctly calculate the absolute address of each label.

This issue proposes adding support for relative jumps and calls to make loading code at run-time more manageable.

Current potential workarounds

Currently, the only way to load code that uses jumps at an arbitrary location in RAM would be to determine the address in every jump statement at run-time, either through a pre-processor that patches the code after loading, or potentially with some kind of JIT compilation. Neither of these workarounds is ideal or fast.
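The "patch the code after loading" workaround can be sketched as follows. This is a hypothetical Java sketch, not the game's loader: it assumes the assembler emits code as if it started at address 0, together with a list of word offsets whose contents are absolute addresses (a relocation list). Memory is modeled as a char[] of 16-bit words, and the opcode value is made up.

```java
/**
 * Hypothetical load-time patching: copy a binary assembled at base 0 into
 * memory at loadBase, then add loadBase to every word listed in fixups.
 */
class LoadTimePatch {

    /** Copies code into memory at loadBase and relocates every fixup word. */
    static void loadAndRelocate(char[] memory, char[] code, int loadBase, int[] fixups) {
        System.arraycopy(code, 0, memory, loadBase, code.length);
        for (int offset : fixups) {
            int addr = loadBase + offset;
            // Wrap to 16 bits, matching a 16-bit address space
            memory[addr] = (char) ((memory[addr] + loadBase) & 0xFFFF);
        }
    }

    public static void main(String[] args) {
        // Word 0: made-up jump opcode; word 1: absolute target 0x0003 (assembled at base 0)
        char[] code = {0x0A, 0x0003, 0x0000, 0x0001};
        int[] fixups = {1}; // only word 1 holds an address
        char[] memory = new char[0x10000];
        loadAndRelocate(memory, code, 0x2000, fixups);
        System.out.printf("jump target after relocation: 0x%04X%n", (int) memory[0x2001]);
    }
}
```

The cost the thread is worried about is visible here: every load walks the fixup list and rewrites code, and the assembler has to know which words are addresses.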

The x86 approach to this issue

My understanding of how this is done in real x86 machine code is that an instruction such as jmp can be encoded as several different opcodes, some representing relative jumps and some representing absolute jumps. This is an example of what I mean (the table at the top). x86 jmp encodings also distinguish between short, near, and far jumps; however, since the game's CPU has no concept of segments, that distinction can be left out. Unfortunately, the architecture the game currently uses to parse compiled assembly assumes a one-to-one relationship between opcodes and assembly instructions. Note: I couldn't find much documentation on how 8086 assembly handles this, so it may be unrealistic to say that this project should support both absolute and relative jumps with the same instruction.
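To illustrate the relative/absolute distinction: in x86, the operand of a relative jmp is a signed displacement measured from the address of the instruction that follows the jump. The Java sketch below (the helper names are made up for illustration) computes that displacement and shows that it stays the same when the code is moved, which is what makes relative jumps position-independent.

```java
/**
 * Sketch of x86-style relative jump encoding: the stored operand is
 * target - (address of the next instruction).
 */
class RelativeDisplacement {

    /** Displacement an assembler would store for a jump at jumpAddr of length jumpLen. */
    static int displacement(int jumpAddr, int jumpLen, int target) {
        return target - (jumpAddr + jumpLen);
    }

    /** Target the CPU computes when it executes that jump. */
    static int resolve(int jumpAddr, int jumpLen, int disp) {
        return jumpAddr + jumpLen + disp;
    }

    public static void main(String[] args) {
        int len = 2; // e.g. a 2-byte short jmp (opcode + 8-bit displacement)
        int disp = displacement(0x0100, len, 0x0110);
        System.out.println("displacement = " + disp);

        // Move the same code 0x1000 higher: the stored displacement is unchanged,
        // and the resolved target moves along with the code.
        System.out.printf("target at new base = 0x%04X%n", resolve(0x1100, len, disp));
    }
}
```

An absolute jump would instead store 0x0110 directly, and that value would have to be patched if the code were loaded 0x1000 higher.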

Potential solutions

1. Add relative versions of all jump instructions and the call instruction

One solution would be to simply create relative versions of instructions such as jmp and call. These could be named the same as their absolute counterparts, but prefixed or suffixed with an r, such as rjmp and rcall.

Advantages:

Will not break any existing code.
Does not require any kind of re-factor.
Provides a very clear distinction of intent.
The byte-size of the instructions stays small.

Disadvantages:

Uses up a lot of opcodes.
Strays away from real 8086 assembly.

2. Add support for mnemonics representing multiple opcodes

Another solution would be to add support for different encodings of instructions such as jmp. This would require a bit more work. My thought was that, similar to modern x86, specifying a literal numeric operand for jmp would result in a relative jump, while using a memory location or register as the operand would result in an absolute jump.
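A hypothetical sketch of what option 2 means for the assembler: one mnemonic maps to several opcodes, and the operand kind selects which one is emitted. The opcode numbers and type names below are invented for illustration and are not the game's real encoding.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Hypothetical option 2: "jmp" has one opcode per operand kind.
 * All opcode values here are made up.
 */
class MnemonicEncoder {

    enum OperandKind { IMMEDIATE, REGISTER, MEMORY }

    static final Map<OperandKind, Integer> JMP_OPCODES = new EnumMap<>(OperandKind.class);
    static {
        JMP_OPCODES.put(OperandKind.IMMEDIATE, 0x40); // relative jump (literal displacement)
        JMP_OPCODES.put(OperandKind.REGISTER, 0x41);  // absolute jump to the register's value
        JMP_OPCODES.put(OperandKind.MEMORY, 0x42);    // absolute jump to the address stored in memory
    }

    /** Picks the opcode for "jmp" based on its operand kind. */
    static int opcodeForJmp(OperandKind kind) {
        Integer opcode = JMP_OPCODES.get(kind);
        if (opcode == null) {
            throw new IllegalArgumentException("jmp does not accept operand kind " + kind);
        }
        return opcode;
    }

    public static void main(String[] args) {
        for (OperandKind kind : OperandKind.values()) {
            System.out.printf("jmp <%s> -> opcode 0x%02X%n", kind, opcodeForJmp(kind));
        }
    }
}
```

The decoder would then need the reverse mapping (several opcodes back to the same instruction), which is exactly the one-to-one assumption in the current parser that this option would have to break.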

Advantages:

Allows other instructions to support multiple opcodes for the same mnemonic (may be useful in the future).
Stays closer to modern x86 assembly (I'm not sure about 8086 though).

Disadvantages:

Changes to byte-code parsing.
Changes to the assembler.
May require existing code to be re-assembled.

Side-note

Apologies if my understanding of anything I assumed here is incorrect. I'm a little new to assembly programming, but I think I have a good understanding of this problem. 😅

I would also be willing to help implement anything officially decided on surrounding this.

simon987 commented 6 years ago

Thank you for pointing this out.

The second option is the most logical in my opinion (anyway, I don't think there are enough spare opcodes). The behavior of the jmp instructions would be easy to change (if operand == IMMEDIATE16, then do a relative jmp instead of an absolute jmp).
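A minimal sketch of that rule at execution time, assuming (x86-style) that IP already points past the jmp when the jump is taken. IMMEDIATE16 is the operand type named above; REGISTER16 and MEMORY_IMM16 and the method name are illustrative assumptions, not the game's actual classes.

```java
/**
 * Hypothetical single-opcode jmp whose behaviour depends on operand type:
 * IMMEDIATE16 is a signed offset added to IP, anything else is an absolute address.
 */
class JmpExecution {

    enum OperandType { IMMEDIATE16, REGISTER16, MEMORY_IMM16 }

    /** Returns the new 16-bit instruction pointer. */
    static int executeJmp(int ip, OperandType type, int operandValue) {
        if (type == OperandType.IMMEDIATE16) {
            // Relative: reinterpret the 16-bit immediate as signed and add it to IP
            return (ip + (short) operandValue) & 0xFFFF;
        }
        // Absolute: the operand's value is the destination
        return operandValue & 0xFFFF;
    }

    public static void main(String[] args) {
        int ip = 0x0204; // assumed to already point past the jmp
        System.out.printf("jmp 0x0010 (relative) -> 0x%04X%n",
                executeJmp(ip, OperandType.IMMEDIATE16, 0x0010));
        System.out.printf("jmp 0xFFFC (relative, -4) -> 0x%04X%n",
                executeJmp(ip, OperandType.IMMEDIATE16, 0xFFFC));
        System.out.printf("jmp A (absolute, A = 0x0010) -> 0x%04X%n",
                executeJmp(ip, OperandType.REGISTER16, 0x0010));
    }
}
```

The same literal 0x0010 lands in two different places depending on the operand type, which is also why an existing jmp to an equ constant would silently change meaning under this rule.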

That would mostly solve the problem. The next step would be to change all label values to relative values at assembly time: we just need to create a new Operand type, flag operands parsed with parseLabel() as LABEL, and handle the calculation of the offset in the encode() method for this type of operand.

The problem is that in this line:

MY_CONSTANT equ 0x0021

MY_CONSTANT is treated as a label, and we don't want the assembler to save its value as an offset.
So we need to treat those differently. Instead of a HashMap<String, Character> to save the labels, they could be saved in a HashMap<String, Label>; then the parseLabel() method could flag the operand as either a REL_LABEL or an EQU_SYMBOL based on the Label's properties (label.isEquSymbol() would do the trick).
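A hypothetical sketch of the HashMap<String, Label> idea: each symbol remembers whether it came from an equ directive, so an equ constant is encoded as-is while an ordinary label becomes an offset. Label, isEquSymbol(), REL_LABEL and EQU_SYMBOL come from the comment above; the rest (SymbolTable, classify(), operandValue(), and measuring the offset from the next instruction) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical symbol table distinguishing code labels from equ constants. */
class SymbolTable {

    enum OperandKind { REL_LABEL, EQU_SYMBOL }

    static final class Label {
        final char value;        // address for labels, constant value for equ symbols
        final boolean equSymbol;

        Label(char value, boolean equSymbol) {
            this.value = value;
            this.equSymbol = equSymbol;
        }

        boolean isEquSymbol() {
            return equSymbol;
        }
    }

    private final Map<String, Label> labels = new HashMap<>();

    void defineLabel(String name, char address) { labels.put(name, new Label(address, false)); }

    void defineEqu(String name, char value) { labels.put(name, new Label(value, true)); }

    private Label lookup(String name) {
        Label label = labels.get(name);
        if (label == null) {
            throw new IllegalArgumentException("Undefined symbol: " + name);
        }
        return label;
    }

    /** How a parseLabel()-like step might flag the operand. */
    OperandKind classify(String name) {
        return lookup(name).isEquSymbol() ? OperandKind.EQU_SYMBOL : OperandKind.REL_LABEL;
    }

    /** Value to encode; nextIp is the address of the following instruction. */
    char operandValue(String name, char nextIp) {
        Label label = lookup(name);
        if (label.isEquSymbol()) {
            return label.value;               // constants are never rewritten
        }
        return (char) (label.value - nextIp); // 16-bit two's-complement offset
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        table.defineEqu("MY_CONSTANT", (char) 0x0021);
        table.defineLabel("loop", (char) 0x0010);
        System.out.println(table.classify("MY_CONSTANT"));
        System.out.println(table.classify("loop"));
        System.out.printf("0x%04X%n", (int) table.operandValue("MY_CONSTANT", (char) 0x0030));
        System.out.printf("0x%04X%n", (int) table.operandValue("loop", (char) 0x0030));
    }
}
```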

If you decide to work on this, don't hesitate to ask questions; I'll try to find the time to answer. If not, it'll be added to my to-do list and I'll try to get it done sometime.

Thanks again!

Francessco121 commented 6 years ago

Thanks for the quick reply!

My only concern about that solution is that it would break any existing user-code jumping to absolute positions via constants. Although that's probably not very common.

Regardless, if that's the solution you would like to roll with, I'd be happy to start working on it as soon as I can!

Francessco121 commented 6 years ago

So, I've rethought this issue quite a bit recently and I'm starting to question whether this feature is even something the game should support.

Originally, my goal with this issue was to allow players to write position-independent code; however, I've realized that my understanding of that term was wrong. My (new) understanding of "relative" jumps and calls in x86 is that they are simply extra features that make position-independent code easier to write, but they do not actually solve the issue entirely. In x86, relative jumps can only be used in a 'near' mode meaning you can't jump across code segments, so real binaries can't rely solely on relative jumps (this is a good reference for jmp). x64 makes this a little easier with something called RIP-relative addressing (although I don't really understand how this works; finding information on that topic is difficult). What I'm getting at is that, since writing position-independent code is still not something supported out of the box by modern hardware, it (to me) makes even less sense to bake features into this game that allow for writing such things. In addition, since relative jumps in this game would have no code-size or performance advantages over absolute jumps, I don't see why it should implement both.

With the current instruction set provided by the game, two very well-known techniques for writing position-independent code can actually be used. I came across two wonderful blog posts explaining them if you're curious: "Load-time relocation of shared libraries" and "Position Independent Code (PIC) in shared libraries". These techniques are meant for shared libraries (executables can assume where they are loaded into virtual memory, so they do not need this), but since this game doesn't have virtual memory, players could just treat "executables" and "shared libraries" the same way with these techniques.

With that said, I also ran into a few issues implementing this. Since non-constant labels are now given the value of the offset from the beginning of the binary, rather than an absolute position, any code referencing labels that isn't a jump or a call no longer functions correctly. Take for instance:

.data
  global: DW 0

.text
  test [global], 1
  brk

The test instruction receives the label as an immediate memory reference; however, with the new code, that immediate is now an offset, so the instruction ends up reading the wrong address. As currently implemented, the instruction can't tell the difference between that and something like test [0x1000], 1. Now, this could be fixed of course to ensure that only jumps and calls get the relative offsets at assemble time, but I'm not sure it's worth complicating anything further.

As a side note, I must thank you for building this game. It's led me to learn sooooo much over the last few months!

What are your thoughts on this? Do you think this is still something that should be added?

simon987 commented 6 years ago

I think it would be worth adding it for the sake of consistency with the 8086 processor

In x86, relative jumps can only be used in a 'near' mode meaning you can't jump across code segments

Since we only have a single large segment this is not really a problem.

This could be fixed of course to ensure that only jumps and calls get the relative offsets at assemble time, but I'm not sure it's worth complicating anything further.

What I had in mind is to only change the Instruction::encode() method in the jmp/call instructions by overloading it, leaving the other instructions completely unaffected.
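A minimal sketch of that idea, assuming jmp/call are subclasses of a common Instruction class that override (in Java terms) the method encoding a label operand. The names Instruction, JumpInstruction and encodeLabelOperand, and the instruction lengths used, are hypothetical and not the game's actual code. The base method keeps the absolute address, so test [global], 1 keeps working; only jumps emit an offset from the next instruction.

```java
/** Hypothetical per-instruction override of label-operand encoding. */
class EncodeOverride {

    static class Instruction {
        /** Default: a label operand is encoded as the label's absolute address. */
        int encodeLabelOperand(int instructionAddress, int instructionLength, int labelAddress) {
            return labelAddress & 0xFFFF;
        }
    }

    static class JumpInstruction extends Instruction {
        /** jmp/call only: encode the offset from the next instruction instead. */
        @Override
        int encodeLabelOperand(int instructionAddress, int instructionLength, int labelAddress) {
            int nextIp = instructionAddress + instructionLength;
            return (labelAddress - nextIp) & 0xFFFF;
        }
    }

    public static void main(String[] args) {
        Instruction test = new Instruction();
        Instruction jmp = new JumpInstruction();
        int global = 0x0040; // address of the "global" label in this example
        System.out.printf("test [global], 1 -> 0x%04X%n", test.encodeLabelOperand(0x0000, 3, global));
        System.out.printf("jmp global       -> 0x%04X%n", jmp.encodeLabelOperand(0x0010, 2, global));
    }
}
```

Because the choice lives in the subclass, the parser and every non-jump instruction stay untouched, which addresses the test [global], 1 problem described earlier in the thread.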

As a side note, I must thank you for building this game. It's led me to learn sooooo much over the last few months!

Thank you for your interest in the project 🙂

Francessco121 commented 6 years ago

I think it would be worth adding it for the sake of consistency with the 8086 processor

I'm not sure consistency makes sense here, though. Looking at this table, it appears that you can still use immediates as operands to perform an absolute jump, but only with the addition of the PTR keyword, which the game's assembler does not currently support. Without something like that, I don't think it would make sense to remove the ability to jump to or call immediates and constants in an absolute way; that would force players to store constants in a register first and then jump.

Another thing to note, if this is still a feature that should be implemented, is that conditional jumps handle this differently. As far as I can tell, none of them actually support absolute jumps, so I'm not sure how close this game should get to the real thing (unless 8086 doesn't do this... but I can't find much information on that).

What I had in mind is to only change the Instruction::encode() method in the jmp/call instructions by overloading it, leaving the other instructions completely unaffected.

Ah! That would make a lot more sense! Silly oversight on my part.

simon987 commented 6 years ago

Right, I didn't consider that it would in fact disallow JMPs to absolute immediate values. The PTR directive wouldn't be too hard to implement (in fact, the game used to have 8-bit and 16-bit operands!), but it would mean that we would have to create more opcodes, which is what we tried to avoid with our approach.

I'm not sure it would be the best idea to follow this path, since the game is meant to be more casual (well, as casual as an assembly programming game can be). I think the best option would be to leave it as it is.

It will be a good excuse to learn how to use relocation tables!

If you still wish to contribute, you can hop on the Slack channel and ping me and I'll try to guide you. There's a lot of interesting stuff going on with the upcoming feature (vaults!)

Francessco121 commented 6 years ago

Sounds good to me, I'll close this then.

It will be a good excuse to learn how to use relocation tables!

I can see a few guides for implementing this with this game pop up in the future :)

If you still wish to contribute, you can hop on the Slack channel and ping me and I'll try to guide you. There's a lot of interesting stuff going on with the upcoming feature (vaults!)

I'll keep that in mind, thanks!

simon987 / Much-Assembly-Required