tyalie opened 8 months ago
As we have quite a few opcodes to cover (about 600 of them), it is important to structure them into the right classes so that we minimize repeating ourselves (and possibly introducing bugs along the way). This matters all the more because the STM8 ISA manual does not give us a usable categorization.
I wrote a small analysis script that uses the information available from the stm8-binutils-gdb package (more precisely `stm8_opcodes` in `opcodes/stm8-opc.c`) to determine how the opcodes could be grouped, what patterns they contain, and other things that might be useful for us. You can find it here.
I'll give a summary below. Details regarding the opcodes can be found in ST's PM0044, STM8 CPU Programming Manual, Rev 3.
The microcontroller is a CISC architecture (I know, in an 8-bit microcontroller…) where each supported operand combination of a mnemonic has its own opcode. A single instruction is at most 5 bytes long, laid out as follows (each segment is one byte):
| pre-code | opcode | arg1 | arg2 | arg3 |
The pre-code is optional; if present, it effectively turns the opcode into a two-byte opcode. It can also be seen as a modifier, since it can only be one of `0x90`, `0x92`, `0x91` or `0x72` (PDY, PIX, PIY and PWSP respectively). See PM0044, page 62.
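To make the layout concrete, here is a minimal Python sketch that splits one instruction into pre-code, opcode and argument bytes. It assumes the instruction boundaries are already known (finding them requires the opcode table), and it treats only the four prefix bytes listed above as pre-codes. `split_insn` and `PREFIXES` are hypothetical names, not part of any existing tool.

```python
from typing import Optional, Tuple

# Pre-code bytes and their PM0044 names, as listed above.
PREFIXES = {0x90: "PDY", 0x92: "PIX", 0x91: "PIY", 0x72: "PWSP"}
MAX_INSN_LEN = 5  # | pre-code | opcode | arg1 | arg2 | arg3 |


def split_insn(raw: bytes) -> Tuple[Optional[str], int, bytes]:
    """Split ONE already-delimited instruction into (prefix name, opcode, args).

    Hypothetical helper: it does not know how many argument bytes an opcode
    takes, so `raw` must contain exactly one instruction.
    """
    if not 1 <= len(raw) <= MAX_INSN_LEN:
        raise ValueError(f"instruction must be 1..{MAX_INSN_LEN} bytes, got {len(raw)}")
    prefix = PREFIXES.get(raw[0])
    if prefix is None:
        # No pre-code: the first byte is the opcode itself.
        return None, raw[0], raw[1:]
    if len(raw) < 2:
        raise ValueError("pre-code byte without a following opcode")
    # Pre-code present: the second byte is the opcode it modifies.
    return prefix, raw[1], raw[2:]
```

For example, `split_insn(bytes([0x90, 0x5C]))` yields `("PDY", 0x5C, b"")`, while a byte string without a pre-code yields `None` as the prefix.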
Let's start with the fun facts: a lot of mnemonics have only a single opcode (e.g. `nop`, `ret`, …). There are two mnemonics that I already hate, `ldw` and `ld`, with 44 and 37 unique opcodes respectively.
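A minimal sketch of how such per-mnemonic counts can be computed, assuming the opcode table has been flattened into `(mnemonic, operand tuple, opcode)` rows. The row format, `EXAMPLE_TABLE` and `opcodes_per_mnemonic` are hypothetical, and the opcode values are placeholders rather than real STM8 encodings.

```python
from collections import Counter
from typing import List, Tuple

# One row per opcode: (mnemonic, operand tuple, opcode). Assumption: a
# pre-code, if any, is folded into the high byte of the opcode (e.g. 0x90xx).
Entry = Tuple[str, Tuple[str, ...], int]

# Hypothetical rows; the mnemonics match the text, the opcode values do not.
EXAMPLE_TABLE: List[Entry] = [
    ("nop", (), 0x01),
    ("ret", (), 0x02),
    ("ld", ("A", "BYTE"), 0x10),
    ("ld", ("A", "SHORTMEM"), 0x11),
    ("ld", ("A", "LONGMEM"), 0x12),
]


def opcodes_per_mnemonic(table: List[Entry]) -> Counter:
    """Count the distinct opcodes of every mnemonic."""
    distinct = {(mnemonic, opcode) for mnemonic, _, opcode in table}
    return Counter(mnemonic for mnemonic, _ in distinct)


counts = opcodes_per_mnemonic(EXAMPLE_TABLE)
single = sorted(m for m, n in counts.items() if n == 1)  # ['nop', 'ret']
heaviest = counts.most_common(1)  # [('ld', 3)]
```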
Luckily for us, there are only 26 different operand tuples present in the ISA, using the operand types given in PM0044. As many of these are near-duplicates of each other (e.g. `SHORTOFF_X` and `SHORTOFF_Y`), it is worth considering whether our categorization should simply deduplicate this list.
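One possible way to deduplicate such near-duplicates is to fold the index-register suffix of the operand type names before counting distinct tuples. This sketch assumes operand types are named with a trailing `_X`/`_Y`, as in `SHORTOFF_X`; `normalize_operand`, `distinct_tuples` and the `_IDX` suffix are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

# Matches a trailing index-register suffix such as the "_X" in "SHORTOFF_X".
# Bare register operands ("X", "Y") have no underscore and are left untouched.
_INDEX_SUFFIX = re.compile(r"_(X|Y)$")


def normalize_operand(kind: str) -> str:
    """Fold X/Y variants into one name, e.g. SHORTOFF_Y -> SHORTOFF_IDX."""
    return _INDEX_SUFFIX.sub("_IDX", kind)


def distinct_tuples(
    tuples: Iterable[Tuple[str, ...]], dedupe: bool = False
) -> Set[Tuple[str, ...]]:
    """Return the set of operand tuples, optionally with X/Y variants folded."""
    if dedupe:
        return {tuple(normalize_operand(k) for k in t) for t in tuples}
    return set(tuples)
```

Folding loses the X/Y distinction, so a multi-class built on the folded tuples would presumably still need the index register as a parameter.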
More interesting for us, however, are the possible multi-classes we can build. I define a possible mnemonic class as a group of mnemonics that share the same set of operand tuples and for which there is a free but static number X such that, for every matching operand tuple, the opcodes of two mnemonics differ by exactly X.
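The definition can be checked directly for a pair of mnemonics. A minimal sketch, assuming each mnemonic's opcodes are stored as a dict from operand tuple to opcode; `class_offset` and this dict layout are hypothetical.

```python
from typing import Dict, Optional, Tuple

OperandTuple = Tuple[str, ...]
OpcodeMap = Dict[OperandTuple, int]  # operand tuple -> opcode, for one mnemonic


def class_offset(a: OpcodeMap, b: OpcodeMap) -> Optional[int]:
    """Return X if mnemonics a and b share the same operand-tuple set and
    opcode_b(t) - opcode_a(t) == X for every tuple t; otherwise None."""
    if not a or a.keys() != b.keys():
        return None  # different operand-tuple sets: not the same class
    offsets = {b[t] - a[t] for t in a}
    return offsets.pop() if len(offsets) == 1 else None
```

For two mnemonics whose opcodes are `0x10`/`0x20` and `0x11`/`0x21` on the same two operand tuples, `class_offset` returns `1`; swapping the arguments gives `-1`.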
Using this, I discovered for example that the `add` and `or` mnemonics could be modeled using a single TableGen multi-class. What I haven't done yet is check whether some multi-classes represent a subset of another one, which would reduce our workload even further.
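Applied to the whole table, the same idea groups mnemonics into multi-class candidates, and a relaxed version of the check gives a first look at the subset question. The sketch below assumes the same dict-per-mnemonic layout; `signature`, `group_classes`, `is_subclass` and `TABLE` are hypothetical, the opcode values are placeholders rather than real encodings, and `is_subclass` is only one possible reading of "subset".

```python
from collections import defaultdict
from typing import Dict, List, Tuple

OperandTuple = Tuple[str, ...]
OpcodeMap = Dict[OperandTuple, int]


def signature(ops: OpcodeMap) -> Tuple[Tuple[OperandTuple, int], ...]:
    """Operand tuples paired with their opcode relative to the first sorted tuple.

    Two mnemonics get equal signatures exactly when they share the operand-tuple
    set and their opcodes differ by one constant X.
    """
    tuples = sorted(ops)
    base = ops[tuples[0]]
    return tuple((t, ops[t] - base) for t in tuples)


def group_classes(table: Dict[str, OpcodeMap]) -> List[List[str]]:
    """Group mnemonics that could share one multi-class (singletons included)."""
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for mnemonic, ops in table.items():
        if ops:
            groups[signature(ops)].append(mnemonic)
    return sorted(sorted(g) for g in groups.values())


def is_subclass(small: OpcodeMap, big: OpcodeMap) -> bool:
    """True if small's operand tuples are a subset of big's and, on those
    tuples, the two opcode maps differ by one constant."""
    if not small or not small.keys() <= big.keys():
        return False
    return len({big[t] - small[t] for t in small}) == 1


# Hypothetical table: placeholder opcodes, not the real STM8 encodings.
TABLE: Dict[str, OpcodeMap] = {
    "add": {("A", "BYTE"): 0x1B, ("A", "SHORTMEM"): 0x2B},
    "or": {("A", "BYTE"): 0x1A, ("A", "SHORTMEM"): 0x2A},
    "cp": {("A", "BYTE"): 0x11},
}

classes = group_classes(TABLE)  # [['add', 'or'], ['cp']]
```

Normalizing each mnemonic's opcodes against its own first tuple turns the pairwise offset check into a dictionary lookup, which avoids comparing every pair of mnemonics. Note that a one-tuple class trivially passes `is_subclass` against any superset containing that tuple, so the subset check would need a stricter criterion before it is used to merge classes.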
This PR tracks the progress of implementing all STM8 instructions in our LLVM backend. As of right now, it is also responsible for implementing the associated classes, but this might change in the future.
The general gist is that there are more than 600 opcodes on our architecture which need to be described and mapped accordingly to LLVM IR. As this is quite a feat, it can't be done in an evening, especially as this is the first LLVM backend project for the contributors.
TODOs
This is one of the PRs that rely heavily on #5.