lscharen / iigs-game-engine

A tile-based game engine for the Apple IIgs written in 65816 assembly language

Threaded JMP rendering #6

Open lscharen opened 2 years ago

lscharen commented 2 years ago

If a data word in the PEA field requires masking, then the 3-byte code sequence is a JMP instruction into the Snippets code array so that the expanded LDA/AND/ORA/PHA code can be executed. Currently, control is returned via a JMP back into the PEA field. If there are multiple masked words in a sequence, then we waste 3 cycles by doing a JMP to the PEA field just to do another JMP to the next Snippet.

If there were a way to move directly to the next Snippet, this could save 3 cycles per word in the rendering pipeline. Snippet handling is on the slow path, so this optimization would not increase the maximum frame rate, but it would reduce the performance impact of creating complex scenes.
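As a rough way to size the saving, here is a small Python model (a sketch only, not engine code). It assumes one line of the PEA field can be described as a list of flags, `True` where the word needs a masked Snippet, and that one 3-cycle JMP is saved for each pair of adjacent masked words. The function name and the input representation are made up for illustration.

```python
JMP_ABS_CYCLES = 3  # cost of a 65816 JMP absolute

def threaded_jmp_savings(masked):
    """Estimate cycles saved on one line if adjacent Snippets chain directly.

    `masked` is a hypothetical per-word flag list: True means the word
    is rendered through a Snippet rather than a plain PEA.
    """
    # Each adjacent (masked, masked) pair avoids one JMP back to the PEA field.
    pairs = sum(1 for a, b in zip(masked, masked[1:]) if a and b)
    return pairs * JMP_ABS_CYCLES

line = [False, True, True, True, False, True, False]
print(threaded_jmp_savings(line))  # 6
```

In this model a run of n consecutive masked words saves 3 * (n - 1) cycles, and isolated masked words save nothing.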

lscharen commented 2 years ago

Since the exit code from the PEA field is always patched into the "next" instruction and the 2-byte value is saved, the code could check to see if it is patching a JMP instruction. We don't need to know what the "previous" instruction is, just that it's possible there may be 2 consecutive JMPs.

In that case, the code would patch out the first two bytes of the snippet code rather than the two bytes in the PEA field.

The additional testing would add a bit of overhead, but the snippet code address for a given column is fixed, so the impact should be minimal. Still, since the optimization is small, it would probably require at least 3 or 4 threaded jumps per line to be a net win.
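The break-even point can be sketched the same way. The per-line overhead of the extra test has not been measured, so `overhead_cycles` below is a purely hypothetical input, and the function name is made up for illustration. The only fixed number is the 3 cycles saved per threaded jump.

```python
JMP_ABS_CYCLES = 3  # cycles saved per threaded jump

def min_threaded_jumps_for_win(overhead_cycles):
    """Smallest number of threaded jumps per line whose savings
    strictly exceed an assumed per-line overhead (in cycles)."""
    return overhead_cycles // JMP_ABS_CYCLES + 1

# Assumed overheads, for illustration only.
for overhead in (8, 10):
    print(overhead, min_threaded_jumps_for_win(overhead))
```

Under this model, an assumed overhead of 8 cycles needs 3 threaded jumps and 10 cycles needs 4, which is in line with the estimate above.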

Realistically, most snippets occur on the edges of tiles and, since tiles are mostly solid, the odds of consecutive JMP/JMP opcodes are fairly low. Also, it's not clear how much work it would be to detect adjacent JMP instructions when updating the PEA field.

Defer implementation until Version 2.0. Plan to put behind a feature flag.