Open jroweboy opened 1 year ago
It seems LLVM already has some machinery for making this possible, which is visible with zero-page pointers (where the base and the index have the same bit width). With the latest batch of __zeropage patches, a slightly modified version of the above code:
constexpr int kNumEntities = 20;
char screen_x;
char x_pos[kNumEntities] __zeropage;
int main() {
  for (int i = 0; i < kNumEntities; ++i) {
    x_pos[i] += screen_x;
  }
}
compiles down to:
0000a01f <main>:
    a01f: a2 ec        ldx #$ec
    a021: ac 00 02     ldy $200               ; 0x200 <screen_x>
    a024: 98           tya
    a025: 18           clc
    a026: 75 34        adc $34,x              ; 0x34 <x_pos+0x14>
    a028: 95 34        sta $34,x              ; 0x34 <x_pos+0x14>
    a02a: e8           inx
    a02b: d0 f7        bne $a024 <main+0x5>
    a02d: a2 00        ldx #$0
    a02f: 8a           txa
    a030: 60           rts
which is a perfect match for the suggested optimization: x starts at $ec (i.e. -kNumEntities), the zero-page accesses are biased by +kNumEntities ($34 = x_pos + 0x14), and the inx/bne pair ends the loop when x wraps to zero, so no compare is needed.
A common 6502 loop optimization is to avoid the compare on the loop iterator in loops that don't use the iterator for indirect addressing. Two ways to implement this optimization are to change the iterator to count down, or to offset all of the absolute address accesses by the final loop counter; in either case the branch can test the flags set by inx/dex directly, so the explicit cpx disappears.
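For the count-down form, a minimal hand-written sketch (assuming hypothetical kNumEntities-byte arrays named src and dst; not output from any compiler) could look like:

; count-down form: the flags set by dex drive the branch, so no cpx is needed
        ldx #19          ; kNumEntities - 1, the last valid index
loop:   lda src,x
        sta dst,x
        dex
        bpl loop         ; exits once x wraps from 0 to $ff

Note that this variant walks the elements in reverse order, which is part of why the offset form discussed below is the safer transformation in general.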
Here's the standard copy loop as generated by llvm-mos today:
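A rough hand-written illustration of that shape (not actual llvm-mos output; src and dst are the same hypothetical arrays, kNumEntities = 20) would be:

        ldx #0
loop:   lda src,x
        sta dst,x
        inx
        cpx #20          ; compare against kNumEntities on every iteration
        bne loop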
But with this optimization, the loop could become the following. The memory accesses end up happening in the same order as before, which makes this a pretty safe optimization in general.
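Sketched by hand under the same assumptions (hypothetical src and dst, kNumEntities = 20), the offset form might look like:

; start x at -kNumEntities and bias each base address by kNumEntities - 256,
; so inx wrapping x back to 0 ends the loop without any cpx
        ldx #$ec             ; 256 - 20 = $ec, i.e. -kNumEntities
loop:   lda src+20-256,x     ; first pass: (src + 20 - 256) + $ec = src
        sta dst+20-256,x
        inx
        bne loop             ; falls through once x wraps to 0

For zero-page operands the same bias is simply taken mod 256, which is how the adc $34,x above reaches x_pos at $20 with x starting at $ec.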