Your Documentation is Gorgeous

PythonLinks commented 1 year ago

The best I have seen in the J1 community. It is very helpful to newbies like me.

I assume that your hardware is done with the same meticulous attention to detail.

A couple of points. You have 5 bits for instruction selection. The J1 has 4. Wait a minute. Where did that extra 5th bit come from? I can read the J1 docs and source and figure it out, but it would be good to know up front what the difference is from the J1.

And since you have 5 bits, you could have 32 instructions. Why did you stop at 21?

Secondly, people are flooding into Verilog, but you are using VHDL. I suspect that there are good reasons for it. It would be great to read a “Defense of VHDL.” Why do you prefer it to Verilog?

The J1 has the stacks in registers and single clock cycle access to memory. I think that you take two clock cycles to access memory. The reasons for that difference with the J1 would also be most interesting to read.

And it would be very helpful if you would state the clock speed and the resources consumed. Lots of data points paint a picture. The MeCrisp Ice, the HX8K version, was very helpful in that they stated that they had a multiplier, but it brought the speed down to 36 MHz. Such data points are hugely useful to people like me trying to figure out which way is up. It helped me understand that the J1 is so fast because it has almost no math functions. Remove the plus function, and it would be even faster!

I see that you have made a lot of progress since I last looked at this many years ago. Good work.

The C graphical simulator is also unique and most important.

And I love how your cpu tightly integrates into the board.

Projects like this are so important. There is not just one J1 processor, there is a family of processors, validating the concept.

howerj commented 1 year ago

Thanks for the kind words,

There was an extra unused bit in the J1, probably left unused for something (such as a stack effect), that I decided I'd use for extra ALU instructions. It did mean I had to shift a few things around in the instruction set. The reason I did not use all 32 slots for instructions is threefold:

1) I simply did not need to use the entire range for what I wanted.
2) To allow others to expand the system if they needed to with special instructions suited to their purpose.
3) It would have slowed down the processor and taken up more FPGA floor space for no gain (no gain for me, and what I wanted to achieve).

The few instructions I did add were added because they are needed for the new functionality (such as interrupts, which the original lacked) or because they were easy (in terms of speed and size costs), mainly because the CPU already had to do the functionality (such as comparing to zero, needed for jumping, which could then be easily added as an instruction).

I am not the best person to give a guide on VHDL vs Verilog (or SystemVerilog). I could offer some reasons, but no strong convictions lie behind them. Here are some:

1) Most importantly I prefer it, I like the stronger type checking.
2) It allows you to make more generic components (I believe this is a valid criticism against Verilog, and not SystemVerilog).
3) I learned VHDL at University, and not Verilog. I believe VHDL is more popular in Europe than in the USA, but my information might be both incorrect and out of date (I started this project in 2013). As I knew VHDL first, the question for me would become, why Verilog? I think even then Verilog was taking over from VHDL though.

Memory can be accessed (read) in a single cycle on both this system and the J1; there is no difference here (I believe). I would be interested to know why you think this is not the case. Storing is split into two instructions; as far as I am aware the original behaved in the same way.

The clock speed is stated in the commit message for each commit. For example, the latest commit, e3e8a69fa1bae49e40a48456596191b06b4af94d, has the prefix "110.358MHz/1A0C:", which is the maximum clock frequency that the system will run at, along with the size (in hexadecimal) of the eForth image that is running on the system. Unfortunately I did not keep track of the resources consumed by the project (on other VHDL projects, such as my bit-serial CPU project, I also put the number of LUTs used in the commit message; I did not do that on this project). I currently do not have the tool-chain set up, nor do I have a board to run the results on anymore, so the report is not easily available to me.

When adding new instructions and components to the system I had a speed target of 100MHz: the board ran at that speed, so I could not go below it without risking the system's stability.
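As a rough illustration of how that commit prefix can be read against the 100MHz target, here is a minimal Python sketch. The function names and the trailing commit text are made up for the example, and the unit of the image size is not stated in this thread, so the size is only converted to a plain integer.

```python
import re

TARGET_MHZ = 100.0  # board clock / speed target mentioned above


def parse_commit_prefix(message):
    """Split a prefix like '110.358MHz/1A0C:' into (fmax in MHz, image size as int).

    Hypothetical helper: the prefix format is taken from the example in this
    thread; nothing here comes from the project's own tooling.
    """
    match = re.match(r"(\d+(?:\.\d+)?)MHz/([0-9A-Fa-f]+):", message)
    if match is None:
        raise ValueError("no '<freq>MHz/<hex size>:' prefix found")
    return float(match.group(1)), int(match.group(2), 16)


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


# The text after the colon is a placeholder, not the real commit message.
fmax_mhz, image_size = parse_commit_prefix("110.358MHz/1A0C: placeholder text")

# Positive slack means the reported maximum frequency exceeds the target.
slack_ns = period_ns(TARGET_MHZ) - period_ns(fmax_mhz)

print(f"fmax = {fmax_mhz} MHz, image size = {image_size} (0x{image_size:X})")
print(f"timing slack against {TARGET_MHZ} MHz: {slack_ns:.3f} ns")
```

Here 0x1A0C is 6668 in decimal, and the slack is simply the 10 ns target period minus the period at the reported maximum frequency.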

From memory (so this might be incorrect):

The SoC uses at least 3 BRAMs (one for the CPU, one for font storage, one for the VGA text buffer).
The CPU itself does not use any special resources (such as a multiplier), apart from some of the distributed memory to store the stacks in. It does not have a multiplier.

Thanks again for the questions!

The VGA subsystem does use a few multipliers.
I am unsure how many LUTs the entire system used.

If I get time I will update the repo with that information, but I will be unable to do it for some time.

I'm glad you managed to get the GUI working! (I assume you did). It is a bit basic and would have been better done in something like SDL or another graphics framework, but it does the trick.

I used the system to explore the capabilities of the board as much as possible. I wish I had managed to get some of the networking working; doing that would have been interesting, but alas I lack the time (and the board!). Nearly the entire system is written in portable VHDL, without reference to any vendor-specific functionality (such as directly instantiating Block RAM, it is all inferred).

Thanks for the questions, I'll keep this issue open until you don't have any more. If you manage to do anything with it let me know.

PythonLinks commented 1 year ago

Thank you for the answers.

> Most importantly I prefer it, I like the stronger type checking.

Totally agreed. "Programming languages should be dynamically bound, hardware languages should be statically bound." C++ and Verilog have it backwards in my mind. Python and VHDL make more sense to me.

> It allows you to make more generic components (I believe this a valid criticism against Verilog, and not SystemVerilog).

Critical to know. The other day I was doing a 2 digit BCD counter (4 bits per digit) in Verilog, declared as: input [3:0] count [1:0]. Turns out I cannot pass an array into a module. WTF. So more flexibility would be awesome.
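One common workaround in plain Verilog is to flatten the array into a single wide vector for the port and slice it back into digits inside the module. This is a general technique, not something proposed in this thread. The Python sketch below only models the idea for a two-digit BCD counter; all names in it are hypothetical.

```python
DIGITS = 2          # two BCD digits, as in the counter described above
BITS_PER_DIGIT = 4  # each BCD digit needs 4 bits


def pack_digits(digits):
    """Flatten a list of BCD digits (least significant first) into one integer bus."""
    bus = 0
    for index, digit in enumerate(digits):
        if not 0 <= digit <= 9:
            raise ValueError("not a BCD digit: %r" % (digit,))
        bus |= digit << (index * BITS_PER_DIGIT)
    return bus


def unpack_digits(bus, count=DIGITS):
    """Slice the flat bus back into its 4-bit digits (least significant first)."""
    mask = (1 << BITS_PER_DIGIT) - 1
    return [(bus >> (i * BITS_PER_DIGIT)) & mask for i in range(count)]


def bcd_increment(bus, count=DIGITS):
    """Add one to the packed BCD value, wrapping from 99 back to 00."""
    digits = unpack_digits(bus, count)
    for i in range(count):
        if digits[i] < 9:
            digits[i] += 1
            break
        digits[i] = 0  # this digit rolls over, carry into the next one
    return pack_digits(digits)


value = pack_digits([9, 0])   # the counter holds "09"
value = bcd_increment(value)  # now "10", i.e. 0x10 on the flat 8-bit bus
print(hex(value), unpack_digits(value))
```

The flat bus here plays the role of a single 8-bit port, with `unpack_digits` standing in for the per-digit slicing done inside the module.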

What really interests me most is large numbers of Forth CPUs working together. Do you have any ideas for an application requiring large numbers of Forth processors?

I am doing a talk at the Stockholm FPGA conference, "A review of Forth Processors", and will do a few slides about your processor. It is a very important part of the J1 community. It is the diversity in the community which gives it credibility.

howerj commented 1 year ago

Unfortunately I don't have any ideas for applications requiring large numbers of Forth cores. It is possible to put quite a few of these processors on a single FPGA (most likely you would be limited by the number of Block RAMs on it before you ran out of other resources), but it is more of a solution in search of a problem. Problems which can easily be parallelized can usually be done best on a GPU, or directly in an FPGA.

An area in which they could be used (not a large number, maybe a few) is to control sections of an FPGA, or as a kind of I/O processor: having a Forth core dedicated to a single UART, for example, or acting as an SPI controller. I did want to replace the VT100 terminal logic with another one of my projects (a bit-serial CPU design for minimal FPGA floor space), but I could have used another H2 core instead (the VT100/VGA uses some multipliers, and quite a lot of LUTs; ideally the program would live in the Font Block RAM).

Thanks for that, I hope that someone can find the project useful.

PythonLinks commented 1 year ago

> having a Forth core dedicated to a single UART for example, or as an SPI.

The XMOS chips do all of their bit banging in software. I think that the Forth on the Parallax Propeller does some of this also. Of course, to start off, it is easier to find an IP core that does it. Although I still do not understand all of this I/O stuff.

> I hope that someone can find the project useful

If I go with VHDL, rather than Verilog, I will lean heavily on your work.
Anything which can reduce errors, such as static binding, is most useful for FPGA development. Also, if VHDL is more flexible, that is a great benefit. And finally, more verbose is more readable for those who follow us.

What interests me are application specific computers. Lots of Forth CPUs doing the high level control, some low level dedicated processors doing the heavy lifting. I will find a good application.

howerj commented 1 year ago

I'll close this issue then. Thanks for the questions; if you have any others, either open up another issue or send me an email. Cheers!

howerj / forth-cpu

Your Documentation is Gorgeous #8