howerj / forth-cpu

A Forth CPU and System on a Chip, based on the J1, written in VHDL

Forth computing system

Project Forth SoC written in VHDL
Author Richard James Howe
Copyright 2013-2019 Richard Howe
License MIT/LGPL
Email howe.r.j.89@gmail.com

Introduction

This project implements a small stack computer, based on the J1 CPU, tailored to executing Forth. The processor has been rewritten in VHDL from Verilog, and extended slightly.

The goals of the project are as follows:

All three of which have been completed.

The H2 processor, like the J1, is a stack based processor that executes an instruction set especially suited for FORTH.

The current target is the Nexys3 board, with a Xilinx Spartan-6 XC6SLX16-CSG324 FPGA. New boards will be targeted in the future as this board is reaching its end of life. The VHDL is written in a generic way, with hardware components being inferred instead of explicitly instantiated. This should make the code fairly portable, although the interfaces to the Nexys3 board components are specific to the peripherals on that board.

A video of the project in action, on the hardware, can be viewed here:

https://user-images.githubusercontent.com/1807662/159563952-729cdfa2-9eef-4969-ba0f-ac8bbb03ce9c.mp4

The SoC can also be simulated with a simulator written in C, as shown below:

[Image: GUI Simulator]

The System Architecture is as follows:

[Image: System Architecture]

License

The licenses used by the project are mixed and are on a per file basis. For my code I use the MIT license, so feel free to use it as you wish. The other licenses used are the LGPL and the Apache 2.0 license; they are confined to single modules, so they could be removed if you have some aversion to LGPL code.

Target Board

The only target board available at the moment is the Nexys3. This should change in the future as the board is currently at its End Of Life. The next boards I am looking to support are its successor, the Nexys 4, and the myStorm BlackIce (https://mystorm.uk/). The myStorm board uses a completely open source toolchain for synthesis, place and route and bit file generation.

Build and Running requirements

The build has been tested under Debian Linux, version 8.

You will require:

Hardware:

Xilinx ISE can (or could) be downloaded for free, but requires registration. ISE needs to be on your path:

PATH=$PATH:/opt/Xilinx/14.7/ISE_DS/ISE/bin/lin64;
PATH=$PATH:/opt/Xilinx/14.7/ISE_DS/ISE/lib/lin64;

Building and Running

To make the C based toolchain:

make embed.hex

To make a bit file that can be flashed to the target board:

make simulation synthesis implementation bitfile

To upload the bitfile to the target board:

make upload

To view the waveform generated by "make simulation":

make viewer

The C based CLI simulator can be invoked with:

make run

This will assemble the H2 Forth source file embed.fth, and run the assembled object file under the H2 simulator with the debugger activated. A graphical simulator can be run with:

make gui-run

This requires freeglut as well as a C compiler.

Related Projects

The original J1 project is available at:

This project targets the original J1 core and provides an eForth implementation (written using Gforth for meta-compilation/cross-compilation to the J1 core). It also provides a simulator for the system written in C.

The eForth interpreter which the meta-compiler is built on can be found at:

Manual

The H2 processor and associated peripherals are now quite stable; however, the source is always the definitive guide as to how instructions and peripherals behave, as well as to the register map.

There are a few modifications to the J1 CPU which include:

H2 CPU

The H2 CPU behaves very similarly to the J1 CPU, and the J1 PDF can be read in order to better understand this processor. The processor is 16-bit with instructions taking a single clock cycle. Most of the primitive Forth words can also be executed in a single cycle; one notable exception is store ("!"), which is split into two instructions.

The CPU has the following state within it:

Loads and stores into the block RAM that holds the H2 program discard the lowest bit of the address. Every other memory operation (such as jumps, and loads and stores to Input/Output peripherals) uses the lowest bit. This is so applications can use the lowest bit for character operations when accessing the program RAM.
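As a rough illustration of the addressing rule above, the following C sketch models the program RAM as an array of 16-bit cells. The function names are hypothetical, and the byte order (even addresses selecting the low byte) is an assumption that is not stated here; the VHDL source remains the definitive guide.

```c
#include <stdint.h>

/* A load from program RAM discards bit 0 of the address, so two
 * consecutive addresses refer to the same 16-bit cell. The caller is
 * responsible for keeping the address within the array. */
static uint16_t ram_load(const uint16_t *ram, uint16_t addr) {
    return ram[addr >> 1];
}

/* A character fetch can be built on top of the word load, using the
 * discarded bit to pick a byte within the cell.
 * ASSUMPTION: even addresses select the low byte, odd the high byte. */
static uint8_t ram_char(const uint16_t *ram, uint16_t addr) {
    uint16_t cell = ram_load(ram, addr);
    return (addr & 1u) ? (uint8_t)(cell >> 8) : (uint8_t)(cell & 0xffu);
}
```

With a cell holding 0x4241, `ram_char` would return 0x41 for the even address and 0x42 for the odd one under the stated assumption.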

The instruction set is decoded in the following manner:

+---------------------------------------------------------------+
| F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+---------------------------------------------------------------+
| 1 |                    LITERAL VALUE                          |
+---------------------------------------------------------------+
| 0 | 0 | 0 |            BRANCH TARGET ADDRESS                  |
+---------------------------------------------------------------+
| 0 | 0 | 1 |            CONDITIONAL BRANCH TARGET ADDRESS      |
+---------------------------------------------------------------+
| 0 | 1 | 0 |            CALL TARGET ADDRESS                    |
+---------------------------------------------------------------+
| 0 | 1 | 1 |   ALU OPERATION   |T2N|T2R|N2A|R2P| RSTACK| DSTACK|
+---------------------------------------------------------------+
| F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+---------------------------------------------------------------+

T   : Top of data stack
N   : Next on data stack
PC  : Program Counter

LITERAL VALUES       : push a value onto the data stack
CONDITIONAL BRANCHES : pop and test T
CALLS                : push PC+1 onto the return stack

T2N : Move T to N
T2R : Move T to top of return stack
N2A : STORE T to memory location addressed by N
R2P : Move top of return stack to PC

RSTACK and DSTACK are signed values (two's complement) that are
the stack deltas (the amount to increment or decrement the stack
by for their respective stacks: return and data).
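The encoding above can be sketched as a small C decoder. This is only an illustration derived from the bit layout shown; the type and function names are hypothetical and are not taken from h2.c.

```c
#include <stdint.h>

typedef enum { H2_LITERAL, H2_BRANCH, H2_BRANCH0, H2_CALL, H2_ALU } h2_kind;

typedef struct {
    h2_kind  kind;
    uint16_t operand; /* literal value, or 13-bit target address */
    unsigned alu;     /* ALU operation, bits C..8 (ALU instructions only) */
    int      rdelta;  /* return stack delta, bits 3..2 */
    int      ddelta;  /* data stack delta, bits 1..0 */
} h2_decoded;

/* Sign extend a 2-bit two's complement field to the range -2..1. */
static int delta2(unsigned field) {
    field &= 3u;
    return (field & 2u) ? (int)field - 4 : (int)field;
}

static h2_decoded h2_decode(uint16_t inst) {
    h2_decoded d = { H2_ALU, 0, 0, 0, 0 };
    if (inst & 0x8000u) {           /* bit F set: 15-bit literal */
        d.kind = H2_LITERAL;
        d.operand = inst & 0x7fffu;
        return d;
    }
    switch (inst >> 13) {           /* bits E..D select the class */
    case 0: d.kind = H2_BRANCH;  d.operand = inst & 0x1fffu; break;
    case 1: d.kind = H2_BRANCH0; d.operand = inst & 0x1fffu; break;
    case 2: d.kind = H2_CALL;    d.operand = inst & 0x1fffu; break;
    default:                        /* 0 1 1: ALU instruction */
        d.alu    = (inst >> 8) & 0x1fu;
        d.rdelta = delta2((unsigned)inst >> 2);
        d.ddelta = delta2(inst);
        break;
    }
    return d;
}
```

For example, the instruction word 4889 that appears in the debugger trace later in this document decodes as a call to address 889, and 807a decodes as the literal 7a.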

ALU operations

All ALU operations replace T:

Value  Operation      Description
0      T              Top of Stack
1      N              Copy N to T
2      T + N          Addition
3      T & N          Bitwise AND
4      T or N         Bitwise OR
5      T ^ N          Bitwise XOR
6      ~T             Bitwise Inversion
7      T = N          Equality test
8      N < T          Signed comparison
9      N >> T         Logical Right Shift
10     T - 1          Decrement
11     R              Top of return stack
12     [T]            Load from address
13     N << T         Logical Left Shift
14     depth          Depth of stack
15     N u< T         Unsigned comparison
16     Set CPU State  Enable interrupts
17     Get CPU State  Are interrupts on?
18     rdepth         Depth of return stack
19     0=             T == 0?
20     CPU ID         CPU Identifier
21     LITERAL        Internal Instruction
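A subset of these operations can be modelled in C as a function that returns the new value of T. This is a sketch under assumptions that are not stated in the table: comparisons are assumed to yield the usual Forth flags (all bits set for true, zero for false), and shifts by more than 15 places are assumed to yield zero. Operations that need the return stack, memory or CPU state are left out. The name `h2_alu` is hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: true is all bits set, as is conventional in Forth. */
#define H2_FLAG(cond) ((cond) ? (uint16_t)0xffffu : (uint16_t)0u)

/* Compute the new T from the ALU operation number, T and N. */
static uint16_t h2_alu(unsigned op, uint16_t t, uint16_t n) {
    switch (op) {
    case 0:  return t;                                  /* T      */
    case 1:  return n;                                  /* N      */
    case 2:  return (uint16_t)(t + n);                  /* T + N  */
    case 3:  return t & n;                              /* T & N  */
    case 4:  return t | n;                              /* T or N */
    case 5:  return t ^ n;                              /* T ^ N  */
    case 6:  return (uint16_t)~t;                       /* ~T     */
    case 7:  return H2_FLAG(t == n);                    /* T = N  */
    case 8:  return H2_FLAG((int16_t)n < (int16_t)t);   /* N < T  */
    case 9:  return t > 15 ? 0 : (uint16_t)(n >> t);    /* N >> T */
    case 10: return (uint16_t)(t - 1);                  /* T - 1  */
    case 13: return t > 15 ? 0 : (uint16_t)((uint32_t)n << t); /* N << T */
    case 15: return H2_FLAG(n < t);                     /* N u< T */
    case 19: return H2_FLAG(t == 0);                    /* 0=     */
    default: return t; /* not modelled in this sketch */
    }
}
```

The difference between operations 8 and 15 shows up when N is 0xffff and T is 1: as a signed value N is -1 and so less than T, but as an unsigned value it is larger.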

Peripherals and registers

Registers prefixed with an 'o' are output registers, those with an 'i' prefix are input registers. Registers are divided into an input and an output section, and the addresses of the input and output registers do not correspond to each other in all cases.

The following peripherals have been implemented in the VHDL SoC to interface with devices on the Nexys3 board:

The SoC also features a limited set of interrupts that can be enabled or disabled.

The output register map:

Register     Address  Description
oUart        0x4000   UART register
oVT100       0x4002   VT100 Terminal Write
oLeds        0x4004   LED outputs
oTimerCtrl   0x4006   Timer control
oMemDout     0x4008   Memory Data Output
oMemControl  0x400A   Memory Control / Hi Address
oMemAddrLow  0x400C   Memory Lo Address
o7SegLED     0x400E   4 x LED 7 Segment display
oIrcMask     0x4010   CPU Interrupt Mask
oUartBaudTx  0x4012   UART Tx Baud Clock Setting
oUartBaudRx  0x4014   UART Rx Baud Clock Setting

The input registers:

Register   Address  Description
iUart      0x4000   UART register
iVT100     0x4002   Terminal status & PS/2 Keyboard
iSwitches  0x4004   Buttons and switches
iTimerDin  0x4006   Current Timer Value
iMemDin    0x4008   Memory Data Input

The following descriptions of the registers should be read in order; they also describe how the peripherals work.

oUart

A UART with a fixed baud rate and format (115200, 8 bits, 1 stop bit) is present on the SoC. The UART has a FIFO of depth 8 on both the RX and TX channels. The control of the UART is split across oUart and iUart.

To write a value to the UART assert TXWE along with putting the data in TXDO. The FIFO state can be analyzed by looking at the iUart register.

To read a value from the UART, check iUart to see if data is present in the FIFO. If it is, assert RXRE in the oUart register; on the next clock cycle the data will be present in the iUart register.

The baud rate of the UART can be changed by rebuilding the VHDL project. The bit length, parity bits and stop bits can only be changed with modifications to uart.vhd.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|  X |  X |TXWE|  X |  X |RXRE|  X |  X |               TXDO                    |
+-------------------------------------------------------------------------------+

TXWE: UART TX Write Enable
RXRE: UART RX Read Enable
TXDO: UART TX Data Output
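The write and read requests described above amount to setting single bits in the word written to oUart. A minimal C sketch follows, with hypothetical macro and function names; it only builds the words and does not model the FIFOs or the register access itself.

```c
#include <stdint.h>

#define UART_TXWE (1u << 13) /* TX Write Enable, bit 13 */
#define UART_RXRE (1u << 10) /* RX Read Enable, bit 10  */

/* Word to write to oUart to queue one byte for transmission:
 * TXWE asserted with the data placed in TXDO (bits 7..0). */
static uint16_t uart_tx_word(uint8_t byte) {
    return (uint16_t)(UART_TXWE | byte);
}

/* Word to write to oUart to request the next received byte, which
 * then becomes available in iUart. */
static uint16_t uart_rx_request(void) {
    return (uint16_t)UART_RXRE;
}
```

Sending the character 'A' (0x41) would therefore mean writing 0x2041 to the oUart register.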

oVT100

The VGA Text device emulates a terminal which the user can talk to by writing to the oVT100 register. It supports a subset of the VT100 terminal functionality. The interface behaves much like writing to a UART with the same busy and control signals. The input is taken from a PS/2 keyboard available on the board, this behaves like the RX mechanism of the UART.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|  X |  X |TXWE|  X |  X |RXRE|  X |  X |               TXDO                    |
+-------------------------------------------------------------------------------+

TXWE: VT100 TX Write Enable
RXRE: PS/2 Keyboard RX Read Enable
TXDO: VT100 TX Data Output

oLeds

On the Nexys3 board there is a bank of LEDs situated next to the switches. These LEDs can be turned on (1) or off (0) by writing to LEDO. Each LED corresponds to the switch it is next to.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|  X |  X |  X |  X |  X |  X |  X |  X |              LEDO                     |
+-------------------------------------------------------------------------------+

LEDO: LED Output

oTimerCtrl

The timer is controllable by the oTimerCtrl register. It is a 13-bit timer running at 100MHz; it can optionally generate interrupts, and the timer's current internal count can be read back in with the iTimerDin register.

The timer counts once the TE bit is asserted. Once the timer reaches the TCMP value it wraps around and can optionally generate an interrupt (enabled by asserting INTE). This also toggles the Q and NQ lines that come out of the timer and are routed to pins on the board (see the constraints file top.ucf for the pins).

The timer can be reset by writing to RST.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
| TE | RST|INTE|                      TCMP                                      |
+-------------------------------------------------------------------------------+

TE:   Timer Enable
RST:  Timer Reset
INTE: Interrupt Enable
TCMP: Timer Compare Value
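A control word for the timer can be assembled from the fields above as in the following C sketch. The names are hypothetical; only the bit positions come from the register layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIMER_TE        (1u << 15) /* Timer Enable     */
#define TIMER_RST       (1u << 14) /* Timer Reset      */
#define TIMER_INTE      (1u << 13) /* Interrupt Enable */
#define TIMER_TCMP_MASK 0x1fffu    /* 13-bit compare value, bits 12..0 */

/* Build the word to write to oTimerCtrl. */
static uint16_t timer_ctrl(bool enable, bool reset, bool irq, uint16_t compare) {
    unsigned word = compare & TIMER_TCMP_MASK;
    if (enable) word |= TIMER_TE;
    if (reset)  word |= TIMER_RST;
    if (irq)    word |= TIMER_INTE;
    return (uint16_t)word;
}
```

The value 8032 reported for "Timer Control" in the debugger's 'p' output later in this document corresponds to the timer being enabled, without interrupts, with a compare value of 0x32.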

oIrcMask

The H2 core has a mechanism for interrupts; interrupts as a whole have to be enabled or disabled with an instruction. Each interrupt can additionally be masked individually with a bit in IMSK. A '1' in a bit of IMSK enables that specific interrupt, which will be delivered to the CPU if interrupts are enabled within it.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|  X |  X |  X |  X |  X |  X |  X |  X |                 IMSK                  |
+-------------------------------------------------------------------------------+

IMSK: Interrupt Mask

oUartBaudTx

This register is used to set the baud and sample clock frequency for transmission only.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|                                    BTXC                                       |
+-------------------------------------------------------------------------------+

BTXC: Baud Clock Settings

oUartBaudRx

This register is used to set the baud and sample clock frequency for reception only.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|                                    BRXC                                       |
+-------------------------------------------------------------------------------+

BRXC: Baud Clock Settings

oMemDout

Data to be written to the selected address when write enable (WE) is issued in oMemControl.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|                           Data Output                                         |
+-------------------------------------------------------------------------------+

oMemControl

This register contains the control bits for the onboard memory on the Nexys3 board. The board contains three memory devices: two non-volatile memory devices and a volatile RAM based device. The two devices with a simple SRAM interface (one volatile, an M45W8MW16, and one non-volatile, an NP8P128A13T1760E) are both accessible. The third, an SPI based memory device (NP5Q128A13ESFC0E), is currently not accessible.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
| OE | WE | RST|WAIT| RCS| FCS|                 Address Hi                      |
+-------------------------------------------------------------------------------+

OE:  Output Enable - enable reading from current address into iMemDin
WE:  Write Enable  - enable writing oMemDout into ram at current address
RST: Reset the Flash memory controller
RCS: RAM Chip Select, Enable Volatile Memory
FCS: Flash Chip Select, Enable Non-Volatile Memory
Address Hi: High Bits of RAM address

OE and WE are mutually exclusive; if both are set then there is no effect.
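Since the address is split across oMemControl and oMemAddrLow, a memory access needs two register values. The C sketch below shows one way to compute them. It assumes that the ten "Address Hi" bits sit directly above the sixteen bits of oMemAddrLow, giving a 26-bit address; that combination, and all the names used, are illustrative assumptions.

```c
#include <stdint.h>

#define MEM_OE      (1u << 15) /* Output Enable     */
#define MEM_WE      (1u << 14) /* Write Enable      */
#define MEM_RST     (1u << 13) /* Flash reset       */
#define MEM_RCS     (1u << 11) /* RAM Chip Select   */
#define MEM_FCS     (1u << 10) /* Flash Chip Select */
#define MEM_HI_MASK 0x03ffu    /* Address Hi, bits 9..0 */

typedef struct {
    uint16_t control;  /* value for oMemControl */
    uint16_t addr_low; /* value for oMemAddrLow */
} mem_request;

/* Split an address and merge the high part with the control flags,
 * for example MEM_OE | MEM_RCS for a read from the volatile RAM. */
static mem_request mem_make_request(uint32_t address, uint16_t flags) {
    mem_request r;
    r.addr_low = (uint16_t)(address & 0xffffu);
    r.control  = (uint16_t)((flags & ~MEM_HI_MASK) |
                            ((address >> 16) & MEM_HI_MASK));
    return r;
}
```

A read would write `addr_low` to oMemAddrLow, `control` to oMemControl, and then fetch the result from iMemDin.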

The memory controller is in active development, and the interface to it might change.

oMemAddrLow

This is the lower address bits of the RAM.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|                           Address Lo                                          |
+-------------------------------------------------------------------------------+

o7SegLED

On the Nexys3 board there is a bank of 7 segment displays, each with a decimal point (8-segment really), which can be used for numeric output. The LED segments cannot be directly addressed. Instead the value stored in each L7SD field is mapped to a hexadecimal display value (or a BCD value, but this requires regeneration of the SoC and modification of a generic in the VHDL).

The value '0' corresponds to a zero displayed on the LED segment, '15' to an 'F', etcetera.

There are 4 displays in a row.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|      L7SD0        |       L7SD1       |       L7SD2       |       L7SD3       |
+-------------------------------------------------------------------------------+

L7SD0: LED 7 Segment Display (leftmost display)
L7SD1: LED 7 Segment Display
L7SD2: LED 7 Segment Display
L7SD3: LED 7 Segment Display (rightmost display)
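Each display takes one 4-bit field of the register, with the leftmost display in the most significant nibble. A small C sketch of packing four digits, using a hypothetical function name:

```c
#include <stdint.h>

/* Pack four hexadecimal digits (0..15) into the o7SegLED layout:
 * d0 is the leftmost display (bits 15..12), d3 the rightmost (bits 3..0). */
static uint16_t seg7_pack(unsigned d0, unsigned d1, unsigned d2, unsigned d3) {
    return (uint16_t)(((d0 & 0xfu) << 12) | ((d1 & 0xfu) << 8) |
                      ((d2 & 0xfu) << 4)  |  (d3 & 0xfu));
}
```

In the default hexadecimal mode, writing `seg7_pack(0xA, 0xB, 0xC, 0xD)` would be expected to show "AbCd" or similar, depending on how the hardware renders each hexadecimal digit.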

iUart

The iUart register works in conjunction with the oUart register. The status of the FIFO that buffers both transmission and reception of bytes is available in the iUart register, as well as any received bytes.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|  X |  X |  X |TFFL|TFEM|  X |RFFL|RFEM|                RXDI                   |
+-------------------------------------------------------------------------------+

TFFL: UART TX FIFO Full
TFEM: UART TX FIFO Empty
RFFL: UART RX FIFO Full
RFEM: UART RX FIFO Empty
RXDI: UART RX Data Input
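The status bits can be pulled out of a value read from iUart as in this C sketch. The struct and function names are hypothetical; the bit positions follow the layout above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    tx_full;  /* TFFL, bit 12 */
    bool    tx_empty; /* TFEM, bit 11 */
    bool    rx_full;  /* RFFL, bit 9  */
    bool    rx_empty; /* RFEM, bit 8  */
    uint8_t rx_data;  /* RXDI, bits 7..0 */
} uart_status;

/* Decode a word read from the iUart register. */
static uart_status uart_decode(uint16_t iuart) {
    uart_status s;
    s.tx_full  = (iuart & (1u << 12)) != 0;
    s.tx_empty = (iuart & (1u << 11)) != 0;
    s.rx_full  = (iuart & (1u << 9))  != 0;
    s.rx_empty = (iuart & (1u << 8))  != 0;
    s.rx_data  = (uint8_t)(iuart & 0xffu);
    return s;
}
```

A polling transmit routine would typically wait until `tx_full` is false before writing to oUart, and a receive routine would check that `rx_empty` is false before asserting RXRE.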

iVT100

The iVT100 register works in conjunction with the oVT100 register. The status of the FIFO that buffers both transmission and reception of bytes is available in the iVT100 register, as well as any received bytes. It works the same as the iUart/oUart registers.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|  X |  X |  X |TFFL|TFEM|  X |RFFL|RFEM|  0 |           ACHR                   |
+-------------------------------------------------------------------------------+

TFFL: VGA VT100 TX FIFO Full
TFEM: VGA VT100 TX FIFO Empty
RFFL: PS2 VT100 RX FIFO Full
RFEM: PS2 VT100 RX FIFO Empty
ACHR: New character available on PS2 Keyboard

iTimerDin

This register contains the current value of the timer's counter.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|  X |  X |  X |                       TCNT                                     |
+-------------------------------------------------------------------------------+

TCNT: Timer Counter Value

iSwitches

iSwitches contains input lines from multiple sources. The buttons (BUP, BDWN, BLFT, BRGH, and BCNT) correspond to a D-Pad on the Nexys3 board. The switches (TSWI) are the ones mentioned in oLeds; each has an LED next to it.

The switches and the buttons are already debounced in hardware so they do not have to be further processed once read in from these registers.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|  X |  X |  X | BUP|BDWN|BLFT|BRGH|BCNT|               TSWI                    |
+-------------------------------------------------------------------------------+

BUP:  Button Up
BDWN: Button Down
BLFT: Button Left
BRGH: Button Right
BCNT: Button Center
TSWI: Two Position Switches
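Because the inputs are already debounced, reading them reduces to masking bits. A C sketch with hypothetical names follows; it assumes that a set bit means "pressed" or "on", which is not stated explicitly above.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTN_UP     (1u << 12) /* BUP  */
#define BTN_DOWN   (1u << 11) /* BDWN */
#define BTN_LEFT   (1u << 10) /* BLFT */
#define BTN_RIGHT  (1u << 9)  /* BRGH */
#define BTN_CENTER (1u << 8)  /* BCNT */

/* ASSUMPTION: a '1' bit means the button is currently pressed. */
static bool button_pressed(uint16_t iswitches, unsigned button) {
    return (iswitches & button) != 0;
}

/* The eight two position switches occupy bits 7..0 (TSWI). */
static uint8_t switch_bits(uint16_t iswitches) {
    return (uint8_t)(iswitches & 0xffu);
}
```

Since each LED sits next to a switch, the result of `switch_bits` could be written to oLeds to mirror the switch positions on the LEDs.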

iMemDin

Memory input, either from the SRAM or Flash, indexed by oMemControl and oMemAddrLow. When reading from flash this might actually be status information or information from the query table.

+-------------------------------------------------------------------------------+
| 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |
+-------------------------------------------------------------------------------+
|                           Data Input                                          |
+-------------------------------------------------------------------------------+

Interrupt Service Routines

The following interrupt service routines are defined:

Name               Number  Description
isrNone            0       Not used
isrRxFifoNotEmpty  1       UART RX FIFO Is Not Empty
isrRxFifoFull      2       UART RX FIFO Is Full
isrTxFifoNotEmpty  3       UART TX FIFO Is Not Empty
isrTxFifoFull      4       UART TX FIFO Is Full
isrKbdNew          5       New PS/2 Keyboard Character
isrTimer           6       Timer Counter
isrDPadButton      7       Any D-Pad Button Change State

When an interrupt occurs, and interrupts are enabled within the processor, then a call to the location in memory is performed - the location is the same as the ISR number. An ISR with a number of '4' will perform a call (not a jump) to the location '4' within memory, for example.

Interrupts have a latency of at least 4-5 cycles before they are acted on: there is a two to three cycle delay in the interrupt request handler, then the call to the ISR location in memory has to be done, then the call to the word that implements the ISR itself.

If two interrupts occur at the same time they are processed from the lowest interrupt number to the highest.

Interrupts are lost when an interrupt with the same number occurs that has not been processed.
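The ordering and loss rules above can be modelled with one pending bit per interrupt. The C sketch below is a software model only, not a description of the VHDL implementation. It assumes that bit n of the pending set and of the mask corresponds to ISR number n, and all names are hypothetical.

```c
#include <stdint.h>

/* Record a request for ISR 'number' (0..7). A second request for the
 * same number before it is serviced changes nothing, so it is lost. */
static uint8_t irq_request(uint8_t pending, unsigned number) {
    return (uint8_t)(pending | (1u << (number & 7u)));
}

/* Pick the interrupt to service next: the lowest numbered interrupt
 * that is both pending and enabled in the mask. Returns -1 if none.
 * The CPU would then perform a call to the address equal to the number. */
static int irq_next(uint8_t pending, uint8_t mask) {
    uint8_t active = pending & mask;
    for (int n = 0; n < 8; n++)
        if (active & (1u << n))
            return n;
    return -1;
}
```

With both the keyboard (5) and timer (6) interrupts pending and enabled, `irq_next` returns 5, so the keyboard interrupt is handled first.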

The Toolchain

The disassembler and C based simulator for the H2 are in a single program (see h2.c). This simulator complements the VHDL test bench tb.vhd and is not a replacement for it. The meta-compiler runs on top of an eForth interpreter and is contained within the files embed.c and embed.blk. The meta-compiler (Forth parlance for a cross-compiler) is a Forth program which is used to create the eForth image that runs on the target.

The toolchain is currently in flux. Going forward there is liable to be more integration between h2.c and embed.c, along with changing the Embed Virtual Machine into one that more closely resembles the H2 CPU, with the long term goal of creating a self hosting system.

To build both, a C compiler is needed. The build target "h2" will build the executable, h2, and "embed" will build the meta-compiler:

make h2 embed

And it can be run on the source file embed.fth with the make target:

make run

The make file is not needed:

Linux:

cc -std=c99 h2.c -o h2        # To build the h2 executable
cc -std=c99 embed.c -o embed  # To build the embed VM executable
./embed embed.blk embed.hex embed.fth # Create the target eForth image
./h2 -h                     # For a list of options
./h2 -r embed.hex           # Run the assembled file

Windows:

gcc -std=c99 h2.c -o h2.exe       # Builds the h2.exe executable
gcc -std=c99 embed.c -o embed.exe # Builds the embed.exe executable
embed.exe embed.blk embed.hex embed.fth # Create the target eForth image
h2.exe -h                   # For a list of options
h2.exe -r embed.hex         # Run the assembled file

A list of command line options available:

    -       stop processing options, following arguments are files
    -h      print a help message and exit
    -v      increase logging level
    -d      disassemble input files (default)
    -D      full disassembly of input files
    -T      Enter debug mode when running simulation
    -r      run hex file
    -L #    load symbol file
    -s #    number of steps to run simulation (0 = forever)
    -n #    specify NVRAM block file (default is nvram.blk)
    file*   file to process

This program is released under the MIT license, feel free to use it and modify it as you please. With minimal modification it should be able to assemble programs for the original J1 core.

Meta-Compiler

The meta-compiler runs on top of the embed virtual machine, a 16-bit virtual machine that originally descended from the H2 CPU. The project includes a meta-compilation scheme that allows an eForth image to generate a new eForth image with modifications. That system has been adapted for use with the H2, replacing the cross compiler written in C that was used to create the first image for the H2.

The meta-compiler is an ordinary Forth program contained within embed.fth. It is used to build up an eForth image capable of running on the H2 target.

For more information about meta-compilation in Forth, see:

Disassembler

The disassembler takes a text file containing the assembled program, which consists of 16-bit hexadecimal numbers, and attempts to disassemble the instructions. It can also be fed a symbols file, which can be generated by the assembler, and will then attempt to find the locations that jumps and calls point to.

The disassembler is used by a Tcl script called by GTKWave; it turns the instruction trace of the H2 from a series of numbers into the instructions and branch destinations that they represent. This makes debugging the VHDL much easier.

[Image: H2 Disassembly Results]

The purple trace shows the disassembled instructions.

Simulator

The simulator in C implements the H2 core and most of the SoC. The IO for the simulator is not cycle accurate, but can be used for running and debugging programs with results that are very similar to how the hardware behaves. This is much faster than rebuilding the bit file used to flash the FPGA.

Debugger

The simulator also includes a debugger, which is designed to be similar to the DEBUG.COM program available in DOS. The debugger can be used to disassemble sections of memory, inspect the status of the peripherals and dump sections of memory to the screen. It can also be used to set breakpoints, single step and run through the code until a breakpoint is hit.

To run the debugger either a hex file or a source file must be given:

# -T turns debugging mode on
./h2 -T -r file.hex  # Run simulator

Both modes of operation can be augmented with a symbols file, which lists where variables, labels and functions are located within the assembled core.

When the "-T" option is given debug mode will be entered before the simulation is executed. A prompt should appear and the command line should look like this:

$ ./h2 -T -R h2.fth
Debugger running, type 'h' for a list of command
debug>

Breakpoints can be set either symbolically or by program location. The 'b' command is used to set breakpoints.

Numbers can be entered in octal (prefix the number with '0'), hexadecimal (prefix with '0x') or in decimal. As an example, the following three debug commands all set a breakpoint at the same location:

debug> b 16
debug> b 0x10
debug> b 020
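The prefix rules described here are the same ones the C standard library applies when `strtol` is called with a base of 0. The sketch below shows that behaviour; whether h2.c actually parses its debugger input this way is an assumption, and `parse_number` is a hypothetical helper.

```c
#include <stdlib.h>

/* Parse a number with an optional prefix: "0x" selects hexadecimal,
 * a leading "0" selects octal, anything else is read as decimal. */
static long parse_number(const char *text) {
    return strtol(text, NULL, 0);
}
```

Under these rules "16", "0x10" and "020" all evaluate to the same value, which is why the three commands above refer to a single location.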

'k' can be used to list the current break points that are set:

debug> k
    0x0010

This sets a breakpoint when the function "key?" is called:

debug> b key?

Functions and labels can both be halted on. This requires either a symbols file to be specified on the command line, or assemble and run to be used on a source file rather than a hex file. Symbol files can be used on source or on hex files.

To single step the 's' command can be given, although not much will happen if tracing is turned off (tracing is off by default). Tracing can be toggled on or off with the 't' command:

debug> s
debug> s
debug> t
trace on
debug> s
0001: pc(089a) inst(4889) sp(0) rp(0) tos(0000) r(0000) call 889 init
debug> s
0002: pc(0889) inst(807a) sp(0) rp(1) tos(0000) r(089b) 7a
debug> s
0003: pc(088a) inst(e004) sp(1) rp(1) tos(007a) r(089b) 6004

It is advisable to turn tracing off when issuing the 'c', or continue, command.

The '.' command can be used to display the H2 core's internal state:

debug> .
Return Stack:
0000: 0000 08aa 0883 017b 0000 031b 0000 ffb0 0000 02eb ffb5 0210 0167 0167
0167 0167
0010: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000

Variable Stack:
tos:  0000
0001: 0000 0000 0000 0001 0004 0005 0000 ffb0 0000 0000 0000 0000 0000 0000
0000 0000
0011: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000

pc:   0538
rp:   0001
dp:   0000
ie:   false

And the 'p' command can be used to display the state of the simulated peripherals:

debug> p
LEDS:          00
VGA Cursor:    0005
VGA Control:   007a
Timer Control: 8032
Timer:         001b
IRC Mask:      0000
UART Input:    6c
LED 7seg:      0005
Switches:      00
LFSR:          40ba
Waiting:       false

For a complete list of commands, use the 'h' command.

Other ways to enter debug mode include putting the ".break" assembler directive into the source code (this only works if the assemble and run command is used on source files, not on hex files), and hitting the escape character when the simulator is trying to read data via the simulated UART or PS/2 keyboard (the escape will still be passed onto the simulator, but it also activates debug mode).

Graphical simulator

A separate program, tested under Linux and Windows, can be compiled. It simulates the Nexys3 board peripherals that the SoC interfaces with, but provides a graphical environment, unlike the command line utility. It is easier to interact with the device and see what it is doing, but the debugging sessions are less controlled. It requires freeglut.

Below is an image of a running session in the GUI simulator:

[Image: H2 GUI Simulator]

Building can be done with

make gui

And running:

make gui-run

Or:

./gui   h2.hex (on Linux)
gui.exe h2.hex (on Windows)

The Linux build should work when the development package for freeglut is installed on your system; the Windows build may require changes to the build system and/or manual installation of the compiler, libraries and headers.

The current key map is:

Up         Activate Up D-Pad Button, Release turns off
Down       Activate Down D-Pad Button, Release turns off
Left       Activate Left D-Pad Button, Release turns off
Right      Activate Right D-Pad Button, Release turns off
F1 - F8    Toggle Switch On/Off, F1 is left most, F8 Right Most
F11        Toggle UART/PS2 Keyboard Input
F12        Toggle Debugging Information
Escape     Quit simulator

All other keyboard keys are redirected to the UART or PS/2 Keyboard input.

The switches and D-Pad buttons can be clicked on to turn them on. The switches turn on with left clicks and off with right clicks. The D-Pad buttons turn on with a click on top of them and turn off with a key release anywhere on the screen.

VHDL Components

The VHDL components used in this system are designed to be reusable and portable across different toolchains and vendors. Hardware components, like block RAM, are inferred and not explicitly instantiated. The components are also made to be as generic as possible, with most having selectable widths. This would be taken to the extreme, but unfortunately many vendors still do not support the VHDL-2008 standard.

File      License   Author                           Description
util.vhd  MIT       Richard J Howe                   A collection of generic components
h2.vhd    MIT       Richard J Howe                   H2 Forth CPU Core
uart.vhd  MIT       Richard J Howe                   UART TX/RX (Run time customizable)
vga.vhd   LGPL 3.0  Javier V García, Richard J Howe  Text Mode VGA 80x40 Display (and VT100 terminal emulator)
kbd.vhd   ???       Scott Larson                     PS/2 Keyboard

eForth on the H2

The pseudo Forth like language used as an assembler is described above; the application that actually runs on the Forth core is itself a Forth interpreter. This section describes the Forth interpreter that runs on the H2 core, which is contained within embed.fth.

TODO:

Coding standards

There are several languages used throughout this project, all of which are radically different from each other and require their own set of coding standards and style guides.

VHDL

Common signal names:

clk       - The system clock
rst       - A reset signal for the module
we        - Write Enable
re        - Read  Enable
di        - Data  In
din       - Data  In
do        - Data  Out
dout      - Data  Out
control   - Generally an input to a register, the documentation
            for the module will need to be consulted to find out
            what each bit means
signal_we - The write enable for 'signal'
signal_i  - This is an input signal
signal_o  - This is an output signal

Generally the "_i" and "_o" suffixes are not used; modules are kept short and names are chosen so their meaning is obvious. This rule might be revisited once the project grows.

Components should:

constant N: positive := 4;
signal a: std_logic_vector(N - 1 downto 0) := (others => '1');

Instead of:

signal a: std_logic_vector(3 downto 0) := x"F";

The style rules are as follows:

An example of the formatting guidelines, describing a simple arbitrary width register:

-- Lots of comments about what the unit does should go
-- here. Describe the waveforms, states and use ASCII
-- art where possible.
library ieee, work;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;    -- numeric_std not std_logic_arith

entity reg is -- generic and port indented one tab, their parameters two
    generic (
        N: positive); -- Generic parameters make for a generic component
    port (
        clk: in  std_logic; -- standard signal names
        rst: in  std_logic; --
        we:  in  std_logic;
        di:  in  std_logic_vector(N - 1 downto 0);
        do:  out std_logic_vector(N - 1 downto 0)); -- note the position of ");
end entity; -- "end entity", not "end reg"

architecture rtl of reg is
    signal r_c, r_n: std_logic_vector(N - 1 downto 0) := (others => '0');
begin
    do <= r_c;

    process(rst, clk)
    begin
        if rst = '1' then -- asynchronous reset
            r_c <= (others => '0');
        elsif rising_edge(clk) then -- rising edge, not "clk'event and clk = '1'"
            r_c <= r_n;
        end if;
    end process;

    process(r_c, di, we)
    begin
        r_n <= r_c;
        if we = '1' then
            r_n <= di;
        end if;
    end process;
end; -- "end" or "end architecture"

C

There is quite a lot of C code used within this project, used to make a tool chain for the H2 core and to simulate the system.

There is nothing too surprising about the C code within here, so some of the exceptions should be dealt with.

static const char *alu_op_to_string(uint16_t instruction) {
    /* notice also that the 'case' clauses are inline with the
     * switch selector */
    switch (ALU_OP(instruction)) {
    case ALU_OP_T:                  return "T";
    case ALU_OP_N:                  return "N";
    case ALU_OP_T_PLUS_N:           return "T+N";
    case ALU_OP_T_AND_N:            return "T&N";
    case ALU_OP_T_OR_N:             return "T|N";
    case ALU_OP_T_XOR_N:            return "T^N";
    case ALU_OP_T_INVERT:           return "~T";
    case ALU_OP_T_EQUAL_N:          return "N=T";
    case ALU_OP_N_LESS_T:           return "T>N";
    case ALU_OP_N_RSHIFT_T:         return "N>>T";
    case ALU_OP_T_DECREMENT:        return "T-1";
    case ALU_OP_R:                  return "R";
    case ALU_OP_T_LOAD:             return "[T]";
    case ALU_OP_N_LSHIFT_T:         return "N<<T";
    case ALU_OP_DEPTH:              return "depth";
    case ALU_OP_N_ULESS_T:          return "Tu>N";
    case ALU_OP_ENABLE_INTERRUPTS:  return "seti";
    case ALU_OP_INTERRUPTS_ENABLED: return "iset?";
    case ALU_OP_RDEPTH:             return "rdepth";
    case ALU_OP_T_EQUAL_0:          return "0=";
    case ALU_OP_CPU_ID:             return "cpu-id";
    default:                        return "unknown";
    }
}

if (foo)
    bar();
else
    baz();

To Do

picocom --omap delbs -b 115200 -e b /dev/ttyUSB1

Resources