Proper 16-bit x86 support

natanalt commented 3 years ago

At the moment, the Zig toolchain is capable of emitting "16-bit" code through the code16 ABI. This ABI causes the output assembly to be prefixed with a .code16 directive, telling the assembler to output 16-bit machine code. The rest of the output is just standard 32-bit code, utilizing 32-bit registers, instructions, etc. This means that code generated this way can't be executed on actual 16-bit processors, and the output is also filled with a lot of size prefixes, increasing the total size of the binary.

This proposal is about adding proper support for 16-bit x86 processors, like the 8086 or 286. Code generated this way would be restricted to a smaller instruction set, and obviously to 16-bit registers only. Another thing common to 16-bit code is segmentation, which is... painful, but required if one wants to write any complex code running on those platforms. (It's also possible in 32-bit code, although it's barely used by anyone.)

Segmentation support would involve the possibility of creating far pointers. Those are pointers which apart from the 16-bit offset also include a 16-bit segment selector. Marking pointers as far could be done with a special attribute, like *far u16. Far pointers would therefore be 4 bytes long, while near/normal pointers would be 2 bytes long.
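To make the size and layout difference concrete, here is a minimal Python sketch of near and far pointers. It assumes real mode, where the linear address is segment * 16 + offset; the helper names (`far_to_linear`, `pack_far`, `pack_near`) are made up for illustration and are not part of Zig or any existing toolchain.

```python
import struct

def far_to_linear(segment: int, offset: int) -> int:
    """Real-mode address translation: linear = segment * 16 + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset

def pack_far(segment: int, offset: int) -> bytes:
    """In-memory form of a far pointer: offset word first, then segment word,
    both little-endian (the layout x86 expects for an m16:16 operand)."""
    return struct.pack("<HH", offset, segment)

def pack_near(offset: int) -> bytes:
    """A near pointer is just the 16-bit offset."""
    return struct.pack("<H", offset)

# 0xB800:0x0000 is the classic colour text-mode video buffer.
print(hex(far_to_linear(0xB800, 0x0000)))            # 0xb8000
print(len(pack_far(0xB800, 0)), len(pack_near(0)))   # 4 2
```

Note that far pointers are not unique: 0x1234:0x0010 and 0x1235:0x0000 refer to the same linear address, which is one reason comparing far pointers is awkward.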

Another segmentation-related feature is far functions. Those are functions called using a far call rather than a typical near call instruction. This causes the processor to push the current code segment selector alongside the instruction pointer. Returning from a far function involves using a different ret opcode as well, to actually pop that previously pushed code selector. Therefore marking functions as far could involve giving them a special calling convention.
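The stack behaviour described above can be sketched with a toy model. This is only an illustration in Python: the `Cpu` class and its method names are hypothetical, and the stack holds whole 16-bit words rather than bytes.

```python
class Cpu:
    """Toy model of CS:IP and the stack, tracking only call/return effects."""

    def __init__(self, cs: int, ip: int):
        self.cs, self.ip = cs, ip
        self.stack = []  # list of 16-bit words, top of stack at the end

    def call_near(self, target_ip: int, return_ip: int) -> None:
        # Near call: only the return IP is pushed; CS is unchanged.
        self.stack.append(return_ip)
        self.ip = target_ip

    def ret_near(self) -> None:
        # Near return (opcode C3): pops IP only.
        self.ip = self.stack.pop()

    def call_far(self, target_cs: int, target_ip: int, return_ip: int) -> None:
        # Far call: CS is pushed first, then the return IP.
        self.stack.append(self.cs)
        self.stack.append(return_ip)
        self.cs, self.ip = target_cs, target_ip

    def ret_far(self) -> None:
        # Far return (opcode CB): pops IP, then CS.
        self.ip = self.stack.pop()
        self.cs = self.stack.pop()

cpu = Cpu(cs=0x1000, ip=0x0100)
cpu.call_far(target_cs=0x2000, target_ip=0x0000, return_ip=0x0105)
print(len(cpu.stack))  # 2 words on the stack instead of 1
cpu.ret_far()
print(hex(cpu.cs), hex(cpu.ip))  # 0x1000 0x105
```

The mismatch case is what makes this a calling-convention concern: returning from a far-called function with a near ret would leave the saved CS on the stack.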

Segmentation involves a bunch of other pitfalls, such as behavior of local variable pointers. Since those are placed on the stack, pointers to those would likely need to be far by default (unless the same segment for both stack and data is assumed). It's all dependent on the used memory model, really.

As can be seen, implementing 16-bit x86 support in Zig would require a ton of effort and in the end would fill a really tiny niche, which is already partially filled by Open Watcom and the unofficial ia16-elf GNU toolchain maintained by tkchia (both of which could work as points of reference). I surely missed tens of possible problematic cases. I'm not an expert on Zig and even less so when it comes to its compiler (or any compiler for that matter).

If implemented, Zig could become a major platform for IA16 developers. There are many use cases such as targeting Windows 3.x, MS-DOS, OS/2, writing 16-bit kernels, 8086/286 compatible bootloaders, custom BIOSes... the possibilities are endless.

jayschwa commented 3 years ago

I have spent quite a bit of time converting 16-bit DOS code to 32-bit, and I would not wish segmented memory programming on my enemies. While this proposal is very badass, I think introducing language semantics for near versus far pointers will not have a favorable benefit-cost ratio. On the other hand, there are other accepted proposals for additional pointer metadata, so you never know!

Without language changes, an IA-16 backend might still be achievable depending on how modular stage 2 is.

Related proposal for another retro backend: https://github.com/ziglang/zig/issues/6502
Plug for my own Zig retro research project: https://github.com/jayschwa/dos.zig

prozacgod commented 2 months ago

I'm not a compiler architect by any means, but I dabble in old DOS/retro 8/16-bit machine code, and... perhaps we don't need to rethink the entire pointer syntax. I've often wondered if modern languages could "just adopt" (simple words for a complex topic) a pointer model that is 16-byte aligned. If all pointers were just 0xSSSSO, then all 16-bit memory addresses would be addressable (minus that oddball ~64k that is technically above the 1 MB barrier, because of segment:offset math).
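One way to read the 0xSSSSO idea is as a "normalized" pointer: a 16-bit segment plus an offset that is always in the range 0 to 15. Here is a rough Python sketch under that interpretation, assuming real-mode segment * 16 + offset addressing; `normalize` is a hypothetical helper, not anything from an existing compiler.

```python
def normalize(segment: int, offset: int) -> tuple[int, int]:
    """Rewrite segment:offset so that the offset is a single hex digit (0..15)."""
    linear = (segment << 4) + offset
    if linear > 0xFFFFF:
        # Addresses above 1 MB cannot be expressed with an offset of 0..15.
        raise ValueError("address is above the 1 MB barrier")
    return linear >> 4, linear & 0xF

seg, off = normalize(0x1234, 0x5678)
print(hex(seg), hex(off))  # 0x179b 0x8

# The largest normalized pointer, 0xFFFF:0xF, lands exactly on the last byte below 1 MB.
assert (0xFFFF << 4) + 0xF == 0xFFFFF

# The "oddball" region: 0xFFFF:0xFFFF reaches 0x10FFEF, i.e. 65520 bytes past 0xFFFFF.
print(((0xFFFF << 4) + 0xFFFF) - 0xFFFFF)  # 65520
```

With this form every address below 1 MB has exactly one representation, so pointer equality becomes a plain comparison, at the cost of renormalizing after pointer arithmetic.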

From my limited understanding, from a language library/implementation point of view the biggest issue is moving data around or allocating chunks of memory > 64k; in the segmented model it's harder to treat memory as a fully linear space in that case...

Some ramblings...

For the std library tooling, if you allocated A and then, thousands of other objects later, allocated B, and A is at 0x11600 and B is at 0x31000, they are too far apart to be in the same segment, so you'd have to use segment-to-segment copy routines. But if B were at 0x17040, they'd be only 23104 bytes (0x5A40) apart and could use in-segment copy routines. This check in a standard library might be somewhat taxing if ALL copies were handled this way. (But part of me suspects that Turbo C/Pascal and other languages of the era all did this anyway?)
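The check itself is cheap to express. Below is a minimal Python sketch, assuming linear real-mode addresses and a paragraph-aligned (16-byte) segment base; `common_segment` is a made-up name for illustration.

```python
SEGMENT_SIZE = 0x10000  # 64 KiB reachable from one segment base

def common_segment(a: int, b: int, length: int):
    """Return a segment value from which both `length`-byte buffers at linear
    addresses `a` and `b` are reachable, or None if they are too far apart."""
    lo, hi = min(a, b), max(a, b)
    base = lo & ~0xF  # round down to a paragraph boundary
    if hi + length - base <= SEGMENT_SIZE:
        return base >> 4
    return None

# The two cases from the comment above, copying 16 bytes:
print(common_segment(0x11600, 0x31000, 16))       # None -> needs a segment-to-segment copy
print(hex(common_segment(0x11600, 0x17040, 16)))  # 0x1160 -> an in-segment copy works
```

In this sketch the cost per copy is a couple of comparisons and a mask, so the overhead would mostly matter for very small copies.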

If I'm off the mark here, please feel free to tell me. I'm sure there are a lot of other concerns too, but this topic does interest me.

ziglang / zig

Proper 16-bit x86 support #7469