Load addresses aren't relocated on CP/M-65

It looks like there's yet another relocation issue. This line in the generic linker scripts:

```
__zp_data_load_start = LOADADDR(.zp.data);
```

...generates an absolute symbol, not a relocatable one, which means that elftocpm65 doesn't emit a fixup and all kinds of things go horribly wrong.

Reference: https://sourceware.org/binutils/docs/ld/Builtin-Functions.html#index-LOADADDR_0028section_0029

I wrote elftocpm65 and this is not the first time I've run into this; I'm now beginning to wonder whether I have, in fact, been doing it wrong the whole time, and whether there's a better approach.

The problem here is that the linker assumes that the binary is loaded at a fixed location, and then chunks of data are moved to their final location. However, what's actually happening is that the binary is loaded at a variable location, then the addresses in it are fixed up, and _then_ the chunks of data are moved to their final location. So, LMA addresses refer to symbols after fixup but before relocation, and VMA addresses refer to symbols after fixup _and_ after relocation.

The obvious but wrong solution would therefore be to emit fixups for all symbols, whether they're relocatable or absolute. Except we can't do this because absolute symbols are also used to refer to things like sizes or constants, which may look like valid addresses but aren't. Currently I'm just not producing fixups for any absolute symbols.

If there were a way to know whether an absolute symbol contained an address which I need to fix up, or a number which I don't, then I could be smarter about this. But I'm not sure there is. From readelf:

```
122: 0000165f 0 NOTYPE GLOBAL DEFAULT ABS __zp_data_load_start
123: 00000018 0 NOTYPE GLOBAL DEFAULT ABS __zp_data_size
```

The first needs fixing up and the second doesn't.

So far the least bad option I can think of is to check the symbol name: if it's absolute and ends with `_load_start`, fix it up anyway; there are only two sections which can be relocated. But that's horrifying. Surely there's a better way to do this?

(I'm aware that it's possible to turn zp-lto off, and I'm going to have to send a PR which does this for now, but I'd really like a proper fix.)
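For illustration only, the name-based check described above might look roughly like the following Python sketch. This is not elftocpm65's actual code: the `Symbol` class and `needs_fixup` function are hypothetical names, the symbol table is hand-written from the readelf output, and it assumes the decision can be made per symbol from its section index and name alone.

```python
from dataclasses import dataclass

SHN_ABS = 0xFFF1  # ELF section index that marks a symbol as absolute


@dataclass
class Symbol:
    """Hypothetical, simplified view of one ELF symbol table entry."""
    name: str
    value: int
    shndx: int


def needs_fixup(sym: Symbol) -> bool:
    """Decide whether a reference to this symbol should get a load-time fixup."""
    if sym.shndx != SHN_ABS:
        # Section-relative symbols are fixed up as before.
        return True
    # Assumption: the only absolute symbols holding real addresses are the
    # LOADADDR() ones, which the linker scripts name *_load_start.
    return sym.name.endswith("_load_start")


# The two entries from the readelf output, written out by hand.
symbols = [
    Symbol("__zp_data_load_start", 0x165F, SHN_ABS),
    Symbol("__zp_data_size", 0x0018, SHN_ABS),
]
fixups = [s.name for s in symbols if needs_fixup(s)]
```

Here `fixups` ends up containing only `__zp_data_load_start`. The weakness is the one already noted: the rule lives in a naming convention rather than in anything the ELF file itself records.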

To me, this seems like a consequence of the fundamental tension between the way ELF is intended to be used and the way CP/M-65 is using it.

What CP/M-65 is doing is termed "load-time relocation": the executable contains relocations that a dynamic loader (CP/M-65) performs as part of the loading of that executable into memory.

In typical ELF usage, load-time relocation uses the symbol and reloc information in the dynamic sections. The main ELF relocation and symbol sections are instead intended for relocatable objects used as part of a final link, eventually ending up as either a shared library (which is really a type of ELF executable) or an executable. Thus, it's expected that there are no outstanding non-dynamic relocations in an executable; any present are informative, not normative.

Typically, load-time relocation can only be done for shared libraries, although position independent executables (-fpie and friends) can also have dynamic relocations. PIE currently tells the compiler to generate position-independent code, though; this typically necessitates a GOT, PLT, PC-relative addressing, all that jazz. That isn't intrinsic, however; you could instead imagine the compiler producing position-dependent code, then the linker turning it into an executable carrying dynamic relocs, as if it were a load-time relocatable shared library. That would be what you'd ideally want to take as input in CP/M-65; it would have a set of relocations (the dynamic ones) specifically called out by the linker as being the right ones to fix up at load time.

However, I'm wildly unsure whether the linker can do anything like the above at present. I'd had a mind to look into this for my own hobby OS project, but I hadn't actually spent much time on it. It may take some linker work, but that isn't exactly an obstacle, it's just something that adds to the latency of a real solution for this.

That being said, I do want both load-time relocatable executables and real PIC with a GOT and PLT to eventually be artifacts that llvm-mos can produce, independently of CP/M-65. It's come up enough times in enough contexts that it seems like a good idea. I've just no idea when I'd get around to it. It's also very researchy; I handwaved a lot of the above, such as whether the linker really has enough information to tell what should be relocated and what shouldn't. It really feels like it should know though; I can't see a meaningful difference between shared libraries or PIC executables and the CP/M-65 case.

I had a chat with Roland McGrath (look him up, but I didn't ping him here, since hopefully he has better things to do ;) about this, and he had a ton of useful historical perspective.

Apparently there is actually a lot of prior art for doing this kind of load-time relocation in ELF executables in the UNIX world. 32-bit x86 shares the 6502's difficulty in emitting PC-relative addressing, so it was common for contemporaneous linkers to emit "TEXTREL" text relocations to point code at a PLT or GOT. These would be stored in the dynamic sections and be fixed up at load time, but they point at the .text section, rather than .got or .plt. This is precisely equivalent to what CP/M-65 is doing.

The biggest revelation from this discussion was that I was modelling how this worked incorrectly: there's absolutely nothing in the compiler that enables this. -fPIC and -fPIE purely have to do with requesting the compiler emit GOTs and PLTs; if you've already decided that you want load time relocation instead, then the compiler already emits a relocatable object: that's just a regular .o file!

So, it's the linker that needs a feature to support turning a collection of relocatable .o files into a load-time relocatable executable using TEXTREL. Fangrui (LLD maintainer) generally hates TEXTRELs, and they're generally dispreferred due to disabling sharing between executables and making more pages of the executable writable than need be, which harms security. So I doubt LLD has any special handling for this today, but I also doubt it would be particularly difficult to add.

That being said, when I think about how that feature might actually work, it seems very similar to what you're already doing with the linker today... which suggests that there are facets of our SDK that are hostile to this working. The existence of symbols like __zp_data_size is one, just as you've highlighted. Notably, UNIXen use __start and __stop symbols for this purpose, and now I think I understand why. In a relocatable binary, symbol values are taken to refer to addresses that are relative to the ELF base: the ELF base is in turn the smallest VMA of any PHDR. We could take this to be zero by fiat on the 6502, and indeed I believe our linker backend is actually set up that way.

That also means that it isn't generally possible to encode an absolute address reference into a load-time relocatable binary at all; after all, how could the linker know anything about where the program will be loaded? Instead, everything in the program image would need to be either relative to 0 (program start) or an undefined dynamic symbol reference. The former addresses can have an addend added by the loader, and the latter have their actual absolute addresses substituted in by the loader.
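As a rough sketch of the "relative to 0 plus a loader addend" case, here is what the fixup step could look like in Python. The `relocate` function, the fixup list format, and the load address are all made up for illustration; it assumes each fixup points at a plain 16-bit little-endian address, which glosses over 6502 relocations that split an address into separate low and high bytes.

```python
def relocate(image: bytes, fixup_offsets: list[int], load_base: int) -> bytes:
    """Add load_base to every 16-bit little-endian address named by a fixup."""
    out = bytearray(image)
    for off in fixup_offsets:
        addr = int.from_bytes(out[off:off + 2], "little")
        out[off:off + 2] = ((addr + load_base) & 0xFFFF).to_bytes(2, "little")
    return bytes(out)


# Image linked relative to 0: JMP $0010, followed by the 16-bit constant $0018.
image = bytes([0x4C, 0x10, 0x00, 0x18, 0x00])

# Only the JMP operand is an address; the constant has no fixup and is left alone.
loaded = relocate(image, fixup_offsets=[1], load_base=0x0800)
```

After this, the jump targets `$0810` while the trailing constant is still `$0018`, which is the distinction the fixup list has to carry.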

So, @davidgiven, I wanted to specifically ask: Would making sure that everything in the process image is "zero relative" work for CP/M-65? My intuition says yes; programs should already have undefined references to the BDOS fixed up by the loader, right?

The one snaggle I could see is separation between zero page and the regular image; but that would be relatively straightforward to encode: just treat addresses 0-100 as the zero page, by fiat. The loader would then just need to add a different base to small symbol values than to large ones.
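A minimal sketch of that two-base rule, in Python. The names `ZP_LIMIT` and `rebase` are hypothetical, and the cutoff of `0x100` is an assumption about what "0-100" means here; the example bases are arbitrary.

```python
ZP_LIMIT = 0x100  # assumption: link-time values below this are zero page addresses


def rebase(addr: int, zp_base: int, mem_base: int) -> int:
    """Pick the base to add according to which range the link-time address is in."""
    if addr < ZP_LIMIT:
        return addr + zp_base            # zero page reference
    return (addr + mem_base) & 0xFFFF    # reference into the regular image


zp_result = rebase(0x02, zp_base=0x40, mem_base=0x0700)     # zero page slot
mem_result = rebase(0x0210, zp_base=0x40, mem_base=0x0700)  # regular image address
```

With these inputs the zero page reference becomes `$42` and the image reference becomes `$0910`. A real loader would also need to check that the rebased zero page address still fits in one byte, which this sketch does not do.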

If this would work out, I could start looking at what in common actually violates this property. I do want both PIC (someday) and load-time relocation to work, so I think our common should broadly maintain this property, so long as it's actually sufficient to make operating systems like CP/M-65 work.

With that repaired, I'd expect that the existing -q based solution used by CP/M-65 would work again. But it would also open the door to a much more standard dynamic symbol version of the tool in the linker; this would remove the need for any shenanigans, and the resulting binaries would be absolutely bog-standard ELF, analyzable with any of the flotilla of usual tools. The ELF to CPM tool would then just need to consume the .dynamic and dynamic relocation sections instead of the usual relocation ones.

llvm-mos / llvm-mos-sdk

Load addresses aren't relocated on CP/M-65 #320