Design considerations for revamping the handling of features and modes

Preamble: This issue contains my current thinking; it will be amended as I get wiser. Please append comments to issue 183 rather than here.

Lots of Hercules non-design is long past the end of its tether. The failure of the I/O design is a prime example; instruction decode is another area.

Having just three architecture modes and compiling instructions differently for each architecture because operands are fetched from storage in different ways from 360 to z is not the way forward.

First of all, having just the three modes (which I shall later call configurations) is an arbitrary restriction. Someone who wishes to implement a true 360 should be able to do so within the framework, and whoever wishes this 360 machine to implement decimal floating point should also be able to do so without having to write code to modify Hercules; and he (male/female) should be able to do so with the same ease (or not) with which the 370, ESA, and z architectures are specified.

So let us step back. A particular model contains a certain amount of hardware, storage, and expanded storage. Adders and whatever is needed to perform computation are assumed as part of implementing the instructions for such facilities.

So, at the base of it all, we have instructions (e.g., ALC) and hardware features (e.g., general registers, additional floating point registers, vector registers, CPU timer, what not).

A facility implements hardware features, instructions, or both.

A number of predefined facilities exist, perhaps closely related to the bits stored by STFLE, but anyone can create a new facility for his (m/f) own use.

Likewise, anyone can create a configuration in addition to the predefined ones. I envisage distinct predefined configurations for all z models.

How to implement?

Instructions need decoding, so we start with the operation code. The configuration will determine which instruction is issued. For example, ALC (opcode e398) will be implemented by alc-rxe in any configuration that does not include long displacement (this includes SIE ESA under z); and as alc-rxy in a configuration that includes long displacement.
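As a minimal sketch of how a configuration might select the handler for an opcode, assume a configuration is just a set of facility bits and each opcode has a list of variants, each naming the facilities it requires. All type names, the `FAC_*` bits and the handler identifiers below are hypothetical and do not correspond to existing Hercules code or to real STFLE bit numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical facility bits, NOT real STFLE bit numbers. */
enum { FAC_LONG_DISPLACEMENT = 1, FAC_DFP = 2 };

/* Hypothetical handler identifiers standing in for function pointers. */
typedef enum { H_NONE, H_ALC_RXE, H_ALC_RXY } handler_id;

typedef struct {
    const char *name;
    uint32_t    facilities;   /* bitwise OR of FAC_* values */
} config_t;

typedef struct {
    uint16_t   opcode;        /* e.g. 0xE398 for ALC */
    uint32_t   required;      /* facilities that must all be present */
    handler_id handler;
} opcode_variant_t;

/* Ordered most specific first; the first matching variant wins. */
static const opcode_variant_t variants[] = {
    { 0xE398, FAC_LONG_DISPLACEMENT, H_ALC_RXY },
    { 0xE398, 0,                     H_ALC_RXE },
};

/* Return the handler this configuration uses for the opcode, or H_NONE
   if the configuration does not provide the instruction at all. */
static handler_id resolve_handler(const config_t *cfg, uint16_t opcode)
{
    assert(cfg != NULL);
    for (size_t i = 0; i < sizeof variants / sizeof variants[0]; i++) {
        const opcode_variant_t *v = &variants[i];
        if (v->opcode == opcode &&
            (cfg->facilities & v->required) == v->required)
            return v->handler;
    }
    return H_NONE;
}
```

With this shape, a configuration without long displacement resolves 0xE398 to the RXE variant and one with it resolves to the RXY variant, without either handler testing the architecture mode itself.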

Decoding the displacement is part of instruction decode. The code that is specific to add with carry will be common to ALC and ALCR; ALCG and ALCGR use a 64-bit ALU but are otherwise identical to the other two.
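To illustrate what "common code" could look like once decode is separated out, here is a hedged sketch of the add-with-carry core in 32-bit and 64-bit widths. The function names are made up for illustration. The carry-in is taken from the leftmost bit of the condition code, and the returned condition code follows the usual encoding for add logical with carry (0 zero/no carry, 1 not zero/no carry, 2 zero/carry, 3 not zero/carry).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit core, usable by both ALC and ALCR once their operands have
   been fetched by the decode step. Returns the new condition code. */
static int add_with_carry32(uint32_t *result, uint32_t a, uint32_t b, int cc_in)
{
    assert(result != NULL);
    uint64_t wide = (uint64_t)a + b + (uint64_t)((cc_in >> 1) & 1);
    *result = (uint32_t)wide;
    int carry = (int)(wide >> 32);
    return (carry << 1) | (*result != 0);
}

/* 64-bit core for ALCG and ALCGR: same logic, but the carry-out has to
   be detected by comparison because no wider integer type is assumed. */
static int add_with_carry64(uint64_t *result, uint64_t a, uint64_t b, int cc_in)
{
    assert(result != NULL);
    uint64_t partial = a + b;
    int carry = partial < a;                 /* carry out of a + b */
    uint64_t sum = partial + (uint64_t)((cc_in >> 1) & 1);
    carry |= sum < partial;                  /* carry out of adding carry-in */
    *result = sum;
    return (carry << 1) | (sum != 0);
}
```

Nothing in either function knows about instruction formats, base registers or displacements; that is the point of moving decode elsewhere.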

How to get there?

The decode that is currently done by macros in each instruction, e.g., RR(...), will clearly need to change; masses of laboriously copied code will be sent where it should have been in the first place; and operands will be fetched into the CPU (REGS).

Doing all of this at once is a truly herculean task, so the opcode table will need to be built in either new or old format as instructions are migrated. This can be managed.

The crunch is no doubt with privileged operations. If separate execution is required, it can be managed with a suffix on the opcode as for ALC. This will take execution to the appropriate function, and this will be obvious to all who look.

Observations:

I tried to map the feature codes in Appendix B (APXB) to the bits stored by STFLE. Some map 1:1; others are not in the facilities list: MSA-1 and MSA-2 (could be that they are in the ESA list; haven't looked there yet).

**** Please read the top and DO NOT comment here ***

Some ideas about getting there.

There are really two topics mentioned: construction of a configuration that includes instructions, and changes to the way the Hercules CPU operates. Without getting into the details of the REGS structure itself, it contains global state, such as the registers, PSW and other global controls. The actual instruction execution is not part of the REGS structure. Instruction execution and its state are totally contained within the executing function's stack frame. The decoding of the instruction is done in the stack frame. Operand accesses are placed in the stack frame. Detection of privileged vs non-privileged instructions occurs in the function.

Let's hypothesize that the instruction decode and operand access are now moved to some other function that is identified in the instruction decode table and a different function actually performs the work of the instruction. How will the operands be communicated to this new function? I propose that instruction execution state (such as operands and results) should move to the REGS structure. This would put into the instruction decode table some new functions (instruction decode/operand fetching, instruction execution and result write back). For operand write back, some information from the operand decode would also need to be preserved in the REGS structure so that the last step knows where the result goes.
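A rough sketch of that split, assuming a much simplified register structure: the `regs_t` layout, the `ex` sub-structure and all function names here are hypothetical and are not the real Hercules REGS. The example uses an RR-format add logical (ALR, opcode 0x1E) and omits condition code handling to stay short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t gr[16];              /* general registers */
    struct {                      /* per-instruction state, moved off the stack */
        uint32_t op1, op2, result;
        int      r1;              /* remembered so write back knows the target */
    } ex;
} regs_t;

typedef void (*phase_fn)(regs_t *regs, const uint8_t *inst);

/* One decode table entry: three phases instead of one monolithic function. */
typedef struct {
    phase_fn decode, execute, writeback;
} op_entry_t;

/* Decode an RR instruction and fetch both register operands. */
static void rr_decode(regs_t *regs, const uint8_t *inst)
{
    int r1 = inst[1] >> 4, r2 = inst[1] & 0x0F;
    regs->ex.r1  = r1;
    regs->ex.op1 = regs->gr[r1];
    regs->ex.op2 = regs->gr[r2];
}

/* The work of the instruction: knows nothing about formats. */
static void alr_execute(regs_t *regs, const uint8_t *inst)
{
    (void)inst;
    regs->ex.result = regs->ex.op1 + regs->ex.op2;
}

/* Write the result to the register recorded by the decode phase. */
static void gr_writeback(regs_t *regs, const uint8_t *inst)
{
    (void)inst;
    regs->gr[regs->ex.r1] = regs->ex.result;
}

static const op_entry_t alr_entry = { rr_decode, alr_execute, gr_writeback };

static void run_entry(const op_entry_t *e, regs_t *regs, const uint8_t *inst)
{
    assert(e != NULL && regs != NULL && inst != NULL);
    e->decode(regs, inst);
    e->execute(regs, inst);
    e->writeback(regs, inst);
}
```

In this arrangement `rr_decode` and `gr_writeback` would be shared by every RR instruction that targets a general register, and only the execute phase differs per instruction.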

During transition, the instruction decode table would need to be enhanced to add the new functions. Whether the legacy function or the new functions are in use can be decided by whether the table entry still has a legacy function. If it is present, the legacy function is called. If it is absent, the CPU emulation code knows to call the sequence of new functions in the table.
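The transition rule could be sketched as follows. The entry layout and the stub functions are invented for illustration; the counters in `cpu_t` exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int legacy_calls;   /* counters only make the sketch observable */
    int phase_calls;
} cpu_t;

typedef void (*insn_fn)(cpu_t *cpu, const uint8_t *inst);

/* Hypothetical enhanced table entry: the legacy slot plus the new phases. */
typedef struct {
    insn_fn legacy;                      /* NULL once the instruction is migrated */
    insn_fn decode, execute, writeback;  /* used only when legacy is NULL */
} table_entry_t;

static void dispatch(const table_entry_t *e, cpu_t *cpu, const uint8_t *inst)
{
    if (e->legacy != NULL) {             /* not yet migrated: old path */
        e->legacy(cpu, inst);
        return;
    }
    assert(e->decode && e->execute && e->writeback);
    e->decode(cpu, inst);
    e->execute(cpu, inst);
    e->writeback(cpu, inst);
}

/* Stubs standing in for real instruction code. */
static void legacy_stub(cpu_t *cpu, const uint8_t *inst)
{
    (void)inst;
    cpu->legacy_calls++;
}

static void phase_stub(cpu_t *cpu, const uint8_t *inst)
{
    (void)inst;
    cpu->phase_calls++;
}
```

Migrating an instruction then amounts to filling in the three phase slots and setting `legacy` to NULL in its entry; the dispatcher itself does not change.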

Rather than using functions, one could use a computed goto to a label within a large function. Many emulators take this approach. A file constructed completely outside of the engine could be loaded that includes the decode table and indexes for selecting the label. I've experimented with this kind of design and how to keep the file content and label usage in sync. Of course, my tool for this is written in Python. This concept eliminates function calls entirely.
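For readers who have not seen the technique, a toy example of computed goto follows. It relies on the "labels as values" extension (`&&label` and `goto *ptr`) provided by GCC and Clang, so it is not standard C. The three operations and their indexes are hypothetical stand-ins for what an externally generated file would supply.

```c
#include <assert.h>
#include <stddef.h>

/* Label indexes; in the scheme described above these would come from
   the externally built file and must stay in sync with the labels. */
enum { OP_INC, OP_DBL, OP_HALT };

/* Run a tiny program of label indexes against an accumulator.
   Every index is trusted to be valid; a real engine would validate
   the loaded table before dispatching through it. */
static long run(const unsigned char *program, long acc)
{
    /* Order must match the enum above. */
    static void *const labels[] = { &&op_inc, &&op_dbl, &&op_halt };
    size_t pc = 0;

    assert(program != NULL);

#define NEXT() goto *labels[program[pc++]]
    NEXT();
op_inc:
    acc += 1;
    NEXT();
op_dbl:
    acc *= 2;
    NEXT();
op_halt:
    return acc;
#undef NEXT
}
```

Each operation jumps directly to the next one, so there is no call and return per instruction; the cost is one large function and a dependency on a compiler extension.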

There are other schemes that could be developed, but for transition there is dependence upon the instruction decode table to participate until such time as the transition is complete. The new scheme could sit completely outside of the existing instruction decode table but you still need to know which instructions are being done the old way and which the new. The existing decode table is the easiest place to put that piece of information. The decode table could be left as is and the value is removed (set to null) for those instructions using the new approach.

What I have described is sort of a bottom up approach. One could decide to totally replace the way the decode tables are constructed but continue to use the legacy functions for instructions that have not been converted to the new approach. This would be more of a top down approach.

Once the transition is complete, something else can be done with the CPU. Today the registers in the REGS structure are stored in big-endian order. There is no need for that to be the case. CPU registers may be stored in host byte-order. That is how they are stored in the legacy function's stack frame, so they can be stored that way in the CPU. Only when values are written to memory do they need to be in big-endian byte order.
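A small sketch of confining byte order to the storage boundary: register values stay ordinary host integers, and only the helpers that touch guest storage produce or consume big-endian bytes. The helper names are illustrative, not existing Hercules macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a host-order 32-bit value into guest storage as big-endian.
   Shifts make this independent of the host's own byte order. */
static void store_be32(uint8_t *mem, uint32_t value)
{
    assert(mem != NULL);
    mem[0] = (uint8_t)(value >> 24);
    mem[1] = (uint8_t)(value >> 16);
    mem[2] = (uint8_t)(value >> 8);
    mem[3] = (uint8_t)value;
}

/* Fetch a big-endian 32-bit value from guest storage into host order. */
static uint32_t fetch_be32(const uint8_t *mem)
{
    assert(mem != NULL);
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
}
```

Arithmetic on registers then needs no swapping at all; the conversion cost is paid once per storage access.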

The new functions placed in the instruction decode table could be more than what I suggest and even variable depending upon the instruction. It really comes down to the tool that builds the table and the new CPU code that accesses the new table functions.

At the other end of the spectrum is the building of the instruction decode tables with a new method. Today that means the creation of C code that gets executed. Given the discussions we have had around instruction selection, this might have a quicker reward for the community at large. All of the tables get built at run-time. Every time Hercules is started, the equivalent of s370x is run to build the decode tables. Once there, we can figure out how to start changing the table to redesign the CPU.

After writing this and thinking through these things, this last approach might be the best way to get started. Go through the drudge work of creating the database, putting the structure and creation of the decode tables outside of static code. Once this is in place, we can start thinking about how to change the tables that change the CPU.
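As a very small illustration of building a decode table from data at start-up rather than from static code, the sketch below parses text records into a 256-entry table. The record layout ("hex opcode, mnemonic, format") is an assumption made up for this example, and only one-byte opcodes are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char mnemonic[8];
    char format[4];
    int  defined;       /* 0 means no instruction at this opcode */
} decode_slot_t;

/* One slot per one-byte opcode; zero-initialized, so all undefined. */
static decode_slot_t decode_table[256];

/* Parse one record such as "1E ALR RR" into the table.
   Returns 1 on success, 0 if the line is malformed. */
static int load_record(decode_slot_t *tbl, const char *line)
{
    unsigned opcode;
    char mnemonic[8];
    char format[4];

    assert(tbl != NULL && line != NULL);
    /* %2x reads at most two hex digits, so opcode never exceeds 0xFF;
       %7s and %3s bound the strings to fit the buffers above. */
    if (sscanf(line, "%2x %7s %3s", &opcode, mnemonic, format) != 3)
        return 0;
    strcpy(tbl[opcode].mnemonic, mnemonic);
    strcpy(tbl[opcode].format, format);
    tbl[opcode].defined = 1;
    return 1;
}
```

A start-up routine would call `load_record` for each line of the database file; a real table would also carry facility requirements and handler references per record.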
