Open eilers opened 9 months ago
It needs to be discovered what is the reason for this.
I assume it's because LLVM still thinks that zero page variables are at $00xx. In some addressing modes, this is important - zero page variables aren't always accessed using zero page modes.
What I found surprising was that __basic_zp_start is $0002 and not $02. I don't know how this mapping to 8 bits works in the end.
Variables marked as "zero page" will utilize zero page addressing modes. Whether the value is written as $0002 or $02 is irrelevant - both are the number 2; our assembler does not count the number of digits used.
@asiekierka I tried to use $1602 as __basic_zp_start. But there was an error while compiling the code:
ld.lld: error: .//m65script.prg.lto.o:(function (anonymous namespace)::FreeChunk::insert(void*, unsigned int): .text._ZN12_GLOBAL__N_19FreeChunk6insertEPvj+0x2e): relocation R_MOS_ADDR8 out of range: 5636 is not in [0, 255]; references '__rc2'
>>> referenced by ld-temp.o
>>> defined in /Users/stefan/Developer/c65/llvm-mos/bin/../mos-platform/common/li
Yes, because it's trying to fit $1602 into eight bits :-)
This is the linker check I mentioned on Discord. On most CPUs, 5636 isn't a valid address you can access with a zero page addressing mode, but on the mega65 CPU, it is, since you can set BP to make that access legal. The linker needs to be informed about this possibility, so that it doesn't check that bits 8-15 of the 32-bit virtual address for the symbol are zero.
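To make the numbers in the error concrete: 5636 is $1604, which matches __basic_zp_start at $1602 plus an offset of 2 for __rc2 (the offset is inferred from the error message, not read from the linker script). A minimal Python sketch of the kind of range check being described could look like the following; `check_addr8` is an illustrative name, not an lld function.

```python
def check_addr8(value):
    """Model of the existing check: the symbol value must fit in eight bits."""
    if not 0 <= value <= 0xFF:
        raise ValueError(
            f"relocation R_MOS_ADDR8 out of range: {value} is not in [0, 255]"
        )
    return value


# Assumed layout for illustration: __rc2 sits 2 bytes after __basic_zp_start.
rc2 = 0x1602 + 2  # 5636, the value reported in the error above
```

With __basic_zp_start left at $0002, the same symbol would be $0004 and pass the check.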
Ok, I rejected the idea as wrong anyway. I just thought that I might be able to work around the problem temporarily.
I wouldn't say it's wrong; if an address llvm-mos accesses via the zero page addressing mode actually is $1602, then its symbol value should be $1602. There's no need to lie that way; ideally, symbol values that you'd read out of an ELF file should correspond to actual addresses on the system.
so that it doesn't check that bits 8-15 of the 32-bit virtual address for the symbol are zero.
That's insufficient - there's still the LTO zero page allocation pass, as well as variables explicitly marked as "zero page".
We can either add a compilation flag to prohibit the compiler from knowing how to convert a zero page pointer to a regular pointer, or a compilation option to configure where exactly the zero page is redirected (this option could also effectively generalize across to the HuC6280 and 65C816, by the way!).
That's true; it's necessary, but not sufficient. I don't think anything but the pointer case has any implications on the compiler though; it just tags symbols as either being in the zero page or not, and it uses different addressing modes. In both cases, the actual symbol values are assigned by the linker.
But, for zero page pointers, there isn't actually enough information available at runtime to extend the pointer. So it makes sense to me to make that decision by fiat; probably as a target attribute. Then, you could set it on a per-function basis using C attributes; that's about as general as we could reasonably support. I think your idea for an option to prevent 8->16 conversions makes sense; it would play nice with per-function overrides as a way to catch mistakes.
Yes, but if the BP register is set to $16, you would automatically be accessing $1602 when you use the address $02. Thus, starting the zero page at $02 is still correct; where it is actually located in memory depends on the BP register.
But this really depends on how __basic_zp_start is meant to be interpreted in llvm-mos.
I'll trust you guys on what the right way to set it would be.
But, for zero page pointers, there isn't actually enough information available at runtime to extend the pointer.
Actually, now that I think about it again, there is! You can use TBA on 45GS02 and TDC on 65C816 to extend the pointer at runtime.
The real problem, in my eyes, is the fact the "zero page" also includes the virtual registers - this is why I don't think you can really support this on a per-function basis well, at least not for a "version 1" implementation; it breaks the calling convention, for example.
The zero page isn't really a region of memory; it's an addressing mode. For a given instruction, it says "treat the number I hand you this way, and use it to find the real address". BP just changes the way that mapping occurs. But the underlying addresses are just addresses; those are the ones that go into linker scripts and get assigned to symbols. The exception is the two high bits of the 32-bit symbol addresses; we use those as generic "bank numbers", since addresses are 16 bits for the most part.
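As a small illustration of "addressing mode, not region": the operand byte in the instruction stays the same, and BP only changes which real address it maps to. This is a sketch in Python, with `effective_address` as an illustrative name.

```python
def effective_address(bp, operand):
    """Real 16-bit address reached by a base-page access: BP supplies the
    high byte, the 8-bit instruction operand supplies the low byte."""
    assert 0 <= bp <= 0xFF and 0 <= operand <= 0xFF
    return (bp << 8) | operand
```

With BP at $00 the operand $02 reaches $0002; with BP at $16 the same operand reaches $1602, and $1602 is the value a symbol for that location would carry.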
But, for zero page pointers, there isn't actually enough information available at runtime to extend the pointer.
Actually, now that I think about it again, there is! You can use TBA on 45GS02 and TDC on 65C816 to extend the pointer at runtime.
That's great, and we should definitely do that.
The real problem, in my eyes, is the fact the "zero page" also includes the virtual registers - this is why I don't think you can really support this on a per-function basis well, at least not for a "version 1" implementation; it breaks the calling convention, for example.
That's a good point, I'm going to have to think more on this. Imaginary registers may be worth special casing somehow.
So here's my thoughts. (Apologies if this retreads some ground already discussed; I'd only been loosely following the discussion.)
Imaginary registers pull a lot of weight. Like modern registers, they provide a compact way to refer to arguments, parameters, and temporary values of functions. The callee/caller-saved convention allows the registers to be shared between callers and callees without conflict, and relatively efficiently.
There are alternative schemes like SPARC's register windows that also accomplish the above, but without the cost of saving and restoring registers. We could probably do this on the 65816, but not on the mega65, where the BPs don't overlap. But even on the 65816, it adds a one cycle penalty to all zero page accesses, and they're supposed to be as fast as the hardware can go. So it looks like they're here to stay.
That being said, as @asiekierka pointed out, they're not just memory locations; they're a contract between callers and callees. Callers and callees need to agree on the memory locations used, but otherwise, they can be arbitrary, so long as they're accessible with the zero page addressing mode. This would allow an interrupt chain or thread to use a completely different set of imaginary registers, which is highly desirable. This means that e.g. on Mega65 BP would need to be consistent across calls. Calls to e.g. the KERNAL would require a custom calling convention; mediating this via inline assembly would be a good idea, since switching BP means losing access to the imaginary registers, making the dance tricky.
Presently, the compiler leaves the numeric value of the register up to the linker; it just issues references to __rcN symbols, and the linker fills them in like any other address. This means that any addressing mode can be used to refer to those symbols; they're real 32-bit addresses. But, given the model above, we may actually want a runtime-decidable set of independent imaginary registers, each with different addresses. Accordingly, the current way of doing things doesn't scale to this model.
Still, the compiler has to emit something when it wants to refer to an imaginary register, whether in absolute or zero paged addressing modes. But, the same function could be called at runtime with many different sets of imaginary registers, so it can't statically know the actual addresses of these. The linker can't know either, so it can't be as simple as a symbol reference to a real address.
Here's a proposal. Instead of linker symbol values, we could consider imaginary registers to be on the other end of 8-bit pointers, where the values of these pointers are set by symbol references. The linker scripts would be able to place the registers as per usual, but for targets with variable BP, the symbols would always range from 0x0000 to 0x00ff, and the compiler would need to itself issue the logic to extend the pointer using BP to a 16-bit address if needed. This would allow a single function to be responsive to runtime calls by callers with varying BP.
Zero page variables and their zero page pointers, on the other hand, have actual fixed BPs, and those don't vary at runtime. The linker will assign them actual real memory locations, and it's up to the program to make sure that BP is set appropriately to access them. This does raise a wrinkle in extending zero page pointers using BP; it would have to assume that the current value of BP is sufficient to access the value. This essentially amounts to treating zero page pointer extensions as accesses, which seems reasonable.
Also, I can't actually recall anywhere where we use an imaginary register symbol with absolute addressing in the compiler proper; the only place I can recall offhand is __call_indir; but we can provide a Mega65 variant that uses BP to fill in the high address there. We'd need to do the same for any SDK assembly that extends imaginary registers.
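The proposal above can be sketched as a toy model: the imaginary register is identified by an 8-bit offset, and the same function body touches different real memory depending on the caller's BP. This is only an illustration in Python; `Machine`, `increment_reg` and the offset value are hypothetical and do not reflect how the compiler or SDK is structured.

```python
class Machine:
    """Toy machine: 64 KiB of memory plus a base page register."""

    def __init__(self):
        self.memory = bytearray(0x10000)
        self.bp = 0x00


def increment_reg(machine, reg_offset):
    """One function body, compiled once. reg_offset is the 8-bit symbol
    value; the real address is formed from the current BP at runtime."""
    addr = (machine.bp << 8) | reg_offset
    machine.memory[addr] = (machine.memory[addr] + 1) & 0xFF


RC2_OFFSET = 0x04  # hypothetical 8-bit offset of an imaginary register
```

Calling `increment_reg` once with BP at $16 and twice with BP at $17 leaves 1 at $1604 and 2 at $1704, so two contexts (say, main code and an interrupt chain) get independent register sets from the same code.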
It's a decent proposal, though I would make sure to limit the changes to the compiler only. If we do it by adding a new address space (__basepage) which corresponds to the "extend the pointer using BP to a 16-bit address" variant, we can even avoid breaking compatibility with existing code - as __zeropage already de facto provides the "actual fixed BPs" approach.
What would extending __zeropage on the Mega65 do then?
You're right, but that means I'm missing something in my mental model and I'm not sure what it is.
I just want to interject that I consider this to be an important discussion, and I also feel that some of the proposed solutions have far-reaching long-term implications. I suggest that we continue to talk about this problem for a bit longer before we make a decision here.
I was considering the idea of introducing a relocation type for this particular use case... Namely, there might be a new relocation that emits a TAB instruction inline if it cannot be demonstrated that the BP is correct for the current instruction. This may have far-reaching implications too, and I am not at all sure it is the way to go. But it may be possible.
Another possibility would be a new relocation type that simply truncates the top 8 bits of the 16 bit address, and slams the lower 8 bits into the instruction. This feels wrong to me somehow, like we are throwing away an important clue, but it may be possible.
I am very open to different approaches here.
I'm looking at approaches other compilers have taken. ca65 seems to deal with it by not dealing with it, although Greg King started a fork to provide support for base pages. vasm seems to introduce a new assembler directive, .setbp, which tells following code that it can assume that the B register is set to a specified value. If that is correct, then .setbp is a bad name for the directive, as it doesn't actually set the B register.
I was considering the idea of introducing a relocation type for this particular use case... Namely, there might be a new relocation that emits a TAB instruction inline if it cannot be demonstrated that the BP is correct for the current instruction. This may have far-reaching implications too, and I am not at all sure it is the way to go. But it may be possible.
This is how e.g. "range extension" is done on most architectures; if you have a branch that is too far, or a PC-relative address that is too far away, then it will insert a "thunk" or materialize a constant nearby and redirect to that thunk or constant.
In this case though, the linker wouldn't be able to do even trivial reasoning about the value of the B register, since it has no notion of control flow, functions, etc; it barely has a model of machine instructions. It would then need to conservatively set BP any time such a relocation appears; in that case, one may as well use absolute addressing instead. Contrast the thunks above; these are based on distances, so the linker can reason what ranges a thunk can be reused within.
Another possibility would be a new relocation type that simply truncates the top 8 bits of the 16 bit address, and slams the lower 8 bits into the instruction. This feels wrong to me somehow, like we are throwing away an important clue, but it may be possible.
This is the semantics I'd argue the zero page relocation should have on the 65CE02 and 65816. In the absence of BP inserting relocs as per the above, a zero page reloc would amount essentially to a promise that BP is already correctly set to the high byte of the symbol address. There would be no way for the linker to verify this, especially since the same function could (incorrectly) be called from several contexts with other BPs.
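A sketch of the truncating semantics argued for above, in Python: the relocation keeps only the low byte, and the high byte becomes an unchecked promise about BP. The function names are illustrative and do not correspond to anything in lld.

```python
def apply_truncating_zp_reloc(symbol_address):
    """Return (operand_byte, required_bp) for a truncating zero page reloc.
    Only the low 16 bits are considered; bank bits are ignored here."""
    addr16 = symbol_address & 0xFFFF
    return addr16 & 0xFF, addr16 >> 8


def access_is_correct(symbol_address, runtime_bp):
    """True only if the runtime BP keeps the promise the reloc relies on."""
    operand, _required_bp = apply_truncating_zp_reloc(symbol_address)
    return ((runtime_bp << 8) | operand) == (symbol_address & 0xFFFF)
```

For a symbol at $1604 the operand is $04 and the promise is BP = $16. A caller running with BP = $00 would silently hit $0004 instead, which is exactly the case the linker has no way to detect.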
Hello, the sod responsible for the 45GS02 here :) Some key things that come to mind are:
Now some thoughts of processor tweaks that I could provide at relatively low cost:
- Do you want this to support C= C65 prototypes with their 4510, or only the MEGA65 and its 45GS02? This has important consequences, because ...
I think if there's interest in supporting the prototype, and it doesn't do horrible things to the compiler, and someone wants to do it, it's a good idea. Those three ifs multiply together though to make a probability that seems pretty low to me. I definitely have no special interest in the C65; and only a cursory one for FPGA 6502 derivatives. But I wouldn't want to stand in the way of the enthusiasm of others, excepting the cases those various enthusiasms come into conflict.
I can say I don't remember anyone expressing an interest in supporting the C65 so far.
- In the above discussion, there are a lot of use-cases that introduce various challenges/limitations/interactions. They are described diffusely above. It would be great to maintain a definitive list of the use-cases that the compiler has to deal with, so that we can objectively and succinctly assess whether a given proposed solution addresses them all, and how efficiently.
This is rough; as long as I've been a compiler engineer, such lists are really hard to come by, and they can usually only be created by domain experts with an absolutely ridiculous amount of experience; everyone else doesn't know how much they don't know. That's hard to come by in this space; I have a ton of compiler/linker experience (relatively speaking), but much less with the various target systems, save a couple. Others are extremely familiar with target systems, but usually lack compiler/linker experience. Lacking this, usually the best you can do is just put stuff out there "in the soup", and let it be broken. When it breaks, it breaks in a specific, tangible way; and that fosters discussion. The right people will eventually complain about it, and then you can go in and fix it.
Honestly the mega65 target is already pretty much in that state; this whole thread is the right people complaining about it. So I'd expect a simple BP management approach to be similar.
- In terms of (2) above, we could, for example, add a PHB / PLB instruction pair by using an instruction prefix on one, say, PHX and PLX, to create the ability to efficiently save and restore the BP, without poisoning A, which is the main problem I see with doing, say, TBA / PHA , PLA / TAB to wrap changes to BP, e.g., for calling KERNAL functions.
I think most of this is Quality of Life stuff; being able to work around stuff like that in a compiler is table stakes. Were those present, we wouldn't use them right now, just like we don't use any of the other 45GS02 extensions, just for lack of dev time. Accordingly, I don't think slosh in managing BP would compare to not using e.g. stack-relative addressing.
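For readers unfamiliar with the "poisoning A" concern in the quoted point, here is a toy Python model of the TBA / PHA ... PLA / TAB wrapper. It models only A, B and a stack, ignores flags and timing, and uses hypothetical names (`Cpu`, `call_rom_with_bp_zero`).

```python
class Cpu:
    """Toy model: only A, B (base page) and the stack; flags are ignored."""

    def __init__(self, a=0, b=0):
        self.a, self.b, self.stack = a, b, []

    def tba(self):  # TBA: copy B into A
        self.a = self.b

    def tab(self):  # TAB: copy A into B
        self.b = self.a

    def pha(self):  # PHA: push A
        self.stack.append(self.a)

    def pla(self):  # PLA: pull A
        self.a = self.stack.pop()

    def lda_imm(self, value):  # LDA #value
        self.a = value & 0xFF


def call_rom_with_bp_zero(cpu, rom_routine):
    cpu.tba()          # A := B, so the caller's A is lost here
    cpu.pha()          # save the old BP on the stack
    cpu.lda_imm(0x00)
    cpu.tab()          # B := $00 for the ROM call
    rom_routine(cpu)
    cpu.pla()          # A := saved BP, so the routine's A result is lost here
    cpu.tab()          # B := saved BP
```

BP is restored correctly, but A is overwritten both on the way in and on the way out, so any argument or result passed in A has to be parked somewhere else first. That is the extra shuffling a compiler or inline assembly wrapper would have to do.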
- For far jumps and far returns there is already a half-finished mechanism for 32-bit flat address calls. It does introduce some limitations on the memory map, because it works by mapping a 16KB slab at $8000-$BFFF that contains the address, and executing the code there. Combined with the 32-bit ZP pointer instructions, this in principle gives you a means to create a flat 32-bit memory model for the MEGA65. If there was interest in having LLVM-mos support these to allow programs >64KB, I could be fairly easily convinced to finish their implementation.
Similarly, getting to the point where the backend would be able to use something like this is a long ways off. I do think we want to support the equivalent of the various x86's near, far, and huge memory models for code and data, but while we've talked off and on about it at length, it's not really at the top of anyone's TODO list. (Except maybe if @asiekierka gets a new computer ;) )
Do you want this to support C= C65 prototypes with their 4510, or only the MEGA65 and its 45GS02? This has important consequences, because ...
The LLVM backend does distinguish between the 4510 and 45GS02. However, that mostly means we provide correct assembler support for all 6502 subtypes. The question of what's supported for compilation by the LLVM backend is separate; for example, on 65C02 and its derivatives, we currently compile PHX/PHY/PLX/PLY, INC/DEC A, BRA, indexed indirect JMP, the non-indexed indirect addressing mode, and STZ. We only assemble, for example, TRB and TSB - while there are scenarios where these opcodes could lead to more optimized C code, the compiler is not yet "taught" how to apply them in practice.
Likewise, we could implement something for the 45GS02 only, but not the 4510. The question is mostly that of the effort required and whether someone will put it in.
It would be great to maintain a definitive list of the use-cases that the compiler has to deal with
Your best option is to get involved with LLVM-MOS development, I'm afraid; the best way to deeply understand how the compiler thinks is to study and work with it at a low level. Note also that other compilers handle things differently (for example, SDCC has its own intermediate representation which predates the popularity of SSA), so things which work well for us might not work so well for others.
@gardners @mysterymath @asiekierka I've taken the liberty of creating a new issue, #317, specifically for continuing the important discussion of possible architectural changes to the 45GS02. Hopefully this will allow us to continue the discussion re "correct" handling of direct page in the presence of a B register here.
Hello! MEGA65 ROM developer here. I'm excited to see the brainstorming in this thread, and can see how it could result in useful improvements to MEGA65 support specifically and llvm-mos in general. The opportunity to use target CPU features like the relocatable base page could be compelling for a bunch of reasons.
I do want to amend the problem statement in this ticket, though. The motivating premise was that the MEGA65 KERNAL reserves the entirety of the zero page. This is only true in the same sense as other Commodores: the KERNAL uses a limited range of upper ZP, BASIC uses the rest, and if machine code generated by a compiler doesn't need to return to BASIC, it can safely use BASIC's ZP region and still keep the KERNAL active (screen terminal, IRQ handler, I/O calls). If such a program does want to return cleanly to BASIC (as in, an RTS to the SYS command, not just a warm boot), it has to preserve/restore BASIC's ZP region. (And of course, a program that installs new hardware IRQ handlers and never calls the KERNAL can do whatever it wants.)
We've been telling early MEGA65 developers to avoid touching ZP entirely while we confirm what the KERNAL API contract should say about its use of the ZP. This is only meant to be temporary, and we're close to being able to document $02-$8F as available, similar to other Commodores. $03-$0B are currently part of the JMPFAR/JSRFAR KERNAL calls, but if llvm-mos doesn't call that, it can use the full range, or just use $0C-$8F for safety. From the KERNAL's perspective, it's really just a testing and documentation project at this point. (We need to confirm that we didn't accidentally add anything to the KERNAL that uses lower ZP addresses, for example.) I want to finish and document this as soon as next month.
This issue has come up a few times on our side because people start out writing llvm-mos programs that try to return normally at the end of main(), only to see BASIC misbehave afterwards. I wonder if there is canonical behavior we can add to the MEGA65 target's main() exit to improve this. I don't know what Commodore C programmers normally expect (warm boot? "press a key" followed by warm boot? infinite loop?), so I yield to your judgement. We could even have llvm-mos's MEGA65 target document $1600-$16FF as reserved, and do the BASIC ZP copy/restore in main() entry/exit by default. I'm sure we can get a MEGA65 dev to implement the details.
If there's anything we can do on the KERNAL side to simplify things, please let me know.
This issue has come up a few times on our side because people start out writing llvm-mos programs that try to return normally at the end of main(), only to see BASIC misbehave afterwards. I wonder if there is canonical behavior we can add to the MEGA65 target's main() exit to improve this. I don't know what Commodore C programmers normally expect (warm boot? "press a key" followed by warm boot? infinite loop?), so I yield to your judgement. We could even have llvm-mos's MEGA65 target document $1600-$16FF as reserved, and do the BASIC ZP copy/restore in main() entry/exit by default. I'm sure we can get a MEGA65 dev to implement the details.
On the outside, this feels like a mistake. I think there's three aspects of the commodore family targets that come into conflict:
If any of the above weren't true, there'd be a way to use the resulting PRG files such that they cleanly returned, but as is it feels like one of the second two points should change. We may want to either split the commodore targets into BASIC and no-BASIC variants, or if it's possible, provide a link-time configuration symbol for this. Whatever the default ends up being, it should infinite loop if it isn't safe to return to BASIC; that at least may let one inspect the output.
We could even have llvm-mos's MEGA65 target document $1600-$16FF as reserved, and do the BASIC ZP copy/restore in main() entry/exit by default. I'm sure we can get a MEGA65 dev to implement the details.
This is also a reasonable option, and it's come up before. I'll admit to not being sure what the best way to proceed here is; it probably bears some thought and discussion.
The easiest change we could make to make this consistent would be to change the commodore targets to exit-loop instead of exit-return; that's one line of CMake. It looks like cc65 doesn't use the BASIC ZP at all and returns. That's one possible configuration. Another is to save and restore BASIC ZP on entry/exit and use it, and another is to clobber it and never return. Ideally there would be a simple way to spell each of these for each commodore target in the SDK, and hopefully, without too much monkeying around in linker scripts. That's the part that needs designing; we've built out a huge degree of this kind of configuration in the NES targets, so it may be possible, but it can be tricky.
EDIT: Looking at the git blame, it looks like this was my fault, from day one of my original SDK rework. Still, we don't have a strong backcompat guarantee beyond using Semantic Versioning, so we should definitely fix this situation.
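The "save and restore BASIC ZP on entry/exit" configuration mentioned above can be sketched as a memory model in Python. The range $02-$8F and the $1600 page are taken from the comments in this thread and are not final; `save_basic_zp`, `restore_basic_zp` and `run_program` are hypothetical names, not SDK functions.

```python
ZP_START, ZP_END = 0x02, 0x8F  # range quoted in this thread; not yet confirmed
SAVE_PAGE = 0x1600             # page suggested above as reserved for llvm-mos


def save_basic_zp(memory):
    """Copy BASIC's zero page range into the save page."""
    memory[SAVE_PAGE + ZP_START:SAVE_PAGE + ZP_END + 1] = \
        memory[ZP_START:ZP_END + 1]


def restore_basic_zp(memory):
    """Copy the saved bytes back so BASIC sees its original state."""
    memory[ZP_START:ZP_END + 1] = \
        memory[SAVE_PAGE + ZP_START:SAVE_PAGE + ZP_END + 1]


def run_program(memory, main):
    """Model of entry/exit: save, run main, restore, then return to BASIC."""
    save_basic_zp(memory)
    try:
        main(memory)
    finally:
        restore_basic_zp(memory)
```

In this model a program may freely overwrite $02-$8F while it runs; only writes outside that range survive the return to BASIC.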
Please follow the discussion here: https://discord.com/channels/1058149494107148399/1058149494107148402/1210065267716259880
Problem:
The memory between $0000 and $15FF is currently documented as reserved on the MEGA65. llvm-mos currently uses the addresses $0002 - $0090 for imaginary registers (and more) and therefore conflicts with various ROM routines, which expect to use the so-called zero page for their own data as well.
Proposed solution
The CPU on the MEGA65 provides a BP (Base Page) register that is used as the high byte for every load or store operation with an 8-bit address (the so-called zero page addressing mode). Thus, the zero page (now called the base page) can be relocated freely within the first bank.
This solution proposes to use the BP register to relocate all accesses using the 8-bit addressing mode (base-page addressing) to $1600 - $16FF by setting BP to $16. llvm-mos can therefore still put the imaginary registers in fast memory by using a BP other than $00. Inline assembly that needs to call any ROM routine needs to set the BP register back to $00 before entering the kernel and switch back to $16 afterwards. The performance impact can be kept quite low with this concept.
Current State:
I've modified unmap-basic.S so that it sets BP to $16. I thought about changing __basic_zp_start from $0002 to $1602, but I decided against it. imag-regs.ld uses this address as the base for defining the addresses of the imaginary registers. As these registers should be in the base page and we still want to use base-page addressing for them, it would make no sense to use 16-bit addresses, IMHO.
Unfortunately, I've discovered that printf() and malloc() do not work after this change, so there are side effects. The reason for this still needs to be found.