gsmecher / minimax

Minimax: a Compressed-First, Microcoded RISC-V CPU
BSD 3-Clause "New" or "Revised" License

Consider taping out your CPU for free using Google's open MPW program #2

Open mithro opened 2 years ago

mithro commented 2 years ago

It would be awesome to see how this CPU works in ASIC form, and Google is offering free tapeouts to open-source silicon projects on SkyWater's 130nm and GlobalFoundries' 180nm MCU process technologies. You can even use an entirely open-source toolchain!

I feel like this CPU could be an ideal fit for the GF180MCU process technology, which people currently use for low-cost 8-bit MCU designs.

Find out more at https://developers.google.com/silicon, https://efabless.com, and https://opensource.googleblog.com/2022/08/GlobalFoundries-joins-Googles-open-source-silicon-initiative.html

You can also join the https://open-source-silicon.dev slack workspace to find other people doing cool things in the open source ASIC space.

dumblob commented 2 years ago

@bunnie, this whole take on a compressed-instructions-first RISC-V CPU might be of interest to you: https://betrusted.io/, Xous, and maybe even Precursor.

gsmecher commented 2 years ago

Unfortunately (and not unpredictably), it looks like the use of VHDL-2008 specific constructs is a hassle here. GHDL can't "synthesize" (really, translate to Verilog) without modifications.

dumblob commented 2 years ago

Would it be too much work to switch to VHDL-2019 to allow for special-casing these problematic constructs (i.e. macro-like preprocessing)?

There even seems to be a standalone Python app that adds this preprocessing functionality to older VHDL toolchains: https://github.com/nobodywasishere/VHDLproc

gsmecher commented 2 years ago

VHDL-2019 would be a step in the wrong direction - honestly, I think the easiest solution is a hand-translated Verilog implementation. (It's not that much code, after all.)

dumblob commented 2 years ago

Sounds encouraging when the author - you - says that :wink:. Do you plan to rewrite it at some point?

gsmecher commented 2 years ago

@mithro, I don't have the cycles right now to meet the upcoming GF180MCU MPW deadline. I am very interested in getting Minimax fab'd when the time is right. I'll leave this issue open for now - free wafer space for open-source hardware is extraordinarily generous and I hate to let it go.

mithro commented 2 years ago

@gsmecher - Totally understand.

I would still encourage you to try to submit something as soon as possible, as increasing demand for the open silicon program is the primary driver for continuing it (i.e., it's a use-it-or-lose-it situation).

dumblob commented 2 years ago

I might know some people who could have the cycles to rewrite this in Verilog or do the necessary patching for a first tapeout (or at least a first submission, as @mithro suggested, to try the process and see what feedback the professionals who inspect designs before tapeout give).

@gsmecher do you think someone new to the code base would pick it up quickly (more or less) without your guidance (deep involvement) and could push it further?

gsmecher commented 2 years ago

@dumblob - if you (or your colleagues) have spare cycles, I'd love some help with the ASIC flow. I haven't found any pre-built dual-ported RAM primitives. Looks like I can generate my own using OpenRAM, but I don't know where this lands on the normal/experimental spectrum.

I have been using the following primitives in FPGA-land:

It's easy to move these goalposts slightly: for example, if mixed-aspect-ratio RAMs are hard to generate, we can move to 32-bit ports and add multiplexers to select the high or low half for port B. Similarly, the read latencies are easy to unify.
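A rough behavioural sketch of the 32-bit-port fallback follows. Port B always reads a full 32-bit word, and address bit 1 drives a 2:1 mux that picks the high or low halfword. The helper name `read_halfword`, little-endian storage, and halfword-aligned byte addresses are all assumptions for illustration. None of them come from the Minimax RTL.

```python
def read_halfword(mem_words: list[int], byte_addr: int) -> int:
    """Model a 16-bit port-B read built from a 32-bit RAM port plus a 2:1 mux.

    Hypothetical helper: assumes little-endian storage and halfword-aligned
    byte addresses.
    """
    if byte_addr & 1:
        raise ValueError("halfword reads must be 2-byte aligned")
    word = mem_words[byte_addr >> 2]   # the RAM always returns a whole 32-bit word
    if byte_addr & 2:                  # address bit 1 drives the mux select
        return (word >> 16) & 0xFFFF   # high half
    return word & 0xFFFF               # low half


mem = [0x4188C188, 0x12345678]
print(hex(read_halfword(mem, 0)), hex(read_halfword(mem, 2)))  # 0xc188 0x4188
```

The cost of this approach is one 16-bit 2:1 multiplexer plus a registered copy of address bit 1 if the RAM read is synchronous.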

mithro commented 1 year ago

@xobs - Maybe you can help here?

xobs commented 1 year ago

I've started porting it to Verilog: https://github.com/xobs/minimax/tree/verilog-rewrite

I'm not very familiar with VHDL, so it's all new to me. But perhaps we can try to run the testbench on the Verilog code once it's finished.

As for memory primitives, you've got a few options. For SKY130, there are some 2 kB 1rw1r memories you can use: https://github.com/efabless/sky130_sram_macros. For GF180, you've got 512-byte memories that can be strung together: https://github.com/google/globalfoundries-pdk-ip-gf180mcu_fd_ip_sram/tree/9c411928870ce15226228fa52ddb6ecc0ea4ffbe/cells
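Stringing small macros together means splitting each byte address into three parts:

- a byte lane, which selects one of several byte-wide macros placed side by side;
- a bank, which provides depth expansion;
- a row address, which is presented to the selected macros.

The sketch below assumes each GF180 macro is organised as 512 entries of 8 bits and that four of them form one 32-bit word. Both the organisation and the names `locate` and `macros_needed` are illustrative assumptions.

```python
BANK_DEPTH = 512  # entries per macro; assumes a 512 x 8-bit macro (512 bytes)
LANES = 4         # four byte-wide macros side by side form one 32-bit word


def locate(byte_addr: int) -> tuple[int, int, int]:
    """Map a byte address to (bank, row, lane) in a grid of 512 x 8 macros."""
    word_index = byte_addr >> 2          # which 32-bit word
    lane = byte_addr & (LANES - 1)       # which byte-wide macro holds this byte
    bank = word_index // BANK_DEPTH      # which group of four macros (depth expansion)
    row = word_index % BANK_DEPTH        # address presented to the selected macros
    return bank, row, lane


def macros_needed(total_bytes: int) -> int:
    """Each macro stores 512 bytes, so round the byte count up to whole macros."""
    return -(-total_bytes // BANK_DEPTH)


print(locate(0x0807), macros_needed(2048))  # (1, 1, 3) 4
```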

gsmecher commented 1 year ago

@xobs, @mithro - it's not the Verilog rewrite that's the stumbling block (though I am happy to see it - the closer the two codebases are, the happier I am to maintain them in parallel.)

Here's where I look a gift horse in the mouth. I am just not sure that GF180MCU is the right target without dual-port RAM macros. The current code requires the following memories:

Two stumbling blocks:

BTW, expect a minor hassle translating the or-combining idiom used in VHDL'08 to Verilog. For example:

aluB <= regS
        or ((std_logic_vector'(31 downto 5 => inst(12)) & inst(6 downto 2)) and (op16_ADDI or op16_ANDI or op16_LI))
        or ((std_logic_vector'(31 downto 17 => inst(12)) & inst(6 downto 2) & 12x"0") and op16_LUI)
        or ((std_logic_vector'(31 downto 9 => inst(12)) & inst(4 downto 3) & inst(5) & inst(2) & inst(6) & x"0") and op16_ADDI16SP);

In VHDL, the "and" operators here will happily gate each data vector with the result of the opcode pattern match. In Verilog, I think you need to ensure every signal involved is signed in order to get the same behaviour; otherwise the 1-bit match is zero-extended and only gates bit 0.
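The Python model below shows why the idiom does not carry over literally. Values are treated as 32-bit unsigned integers, and the model compares three cases:

- VHDL-2008's array-scalar `and`, which applies the 1-bit match to every bit;
- a literal Verilog `vec & match` with an unsigned 1-bit `match`, which only gates bit 0;
- replicating the bit (`{32{match}}` in Verilog) or making every operand signed, which restores the VHDL result.

The immediate decoders follow the C.ADDI and C.LUI terms of the snippet above. The helper names and the sample instruction word are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def sext(value: int, bits: int) -> int:
    """Sign-extend a 'bits'-wide field to 32 bits."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK32


def addi_imm(inst: int) -> int:
    """C.ADDI/C.LI immediate: inst[12] is the sign bit, inst[6:2] the low bits."""
    return sext((((inst >> 12) & 1) << 5) | ((inst >> 2) & 0x1F), 6)


def lui_imm(inst: int) -> int:
    """C.LUI immediate: the same 6 bits, sign-extended and shifted up by 12."""
    return (addi_imm(inst) << 12) & MASK32


def vhdl_gate(vec: int, match: int) -> int:
    """VHDL-2008 array-scalar 'and': the 1-bit match gates every bit."""
    return vec & (MASK32 if match else 0)


def verilog_unsigned_gate(vec: int, match: int) -> int:
    """Verilog 'vec & match', unsigned 1-bit match: zero-extended, so only bit 0 survives."""
    return vec & (match & 1)


def verilog_replicated_gate(vec: int, match: int) -> int:
    """Verilog 'vec & {32{match}}' (or all-signed operands): same result as VHDL."""
    return vec & (MASK32 if match & 1 else 0)


inst = (1 << 12) | (0x1F << 2)  # hypothetical C.ADDI word whose immediate is -1
op16_addi, op16_lui = 1, 0      # one-hot opcode matches

for gate in (vhdl_gate, verilog_unsigned_gate, verilog_replicated_gate):
    alu_b = gate(addi_imm(inst), op16_addi) | gate(lui_imm(inst), op16_lui)
    print(gate.__name__, hex(alu_b))
# vhdl_gate 0xffffffff
# verilog_unsigned_gate 0x1
# verilog_replicated_gate 0xffffffff
```

Explicit replication is probably the least fragile translation. It does not depend on every operand in the expression being declared signed, and a single unsigned operand would silently turn the whole expression unsigned.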

Edit: hm... I see I've been assuming GF180MCU was the only possible target here, and I was simply wrong.

mithro commented 1 year ago

Some important dates:

The plan is to have multiple more runs in 2023. The number of runs depends on how much demand there is this year. The best way to make sure that these programs continue is to use them!

The world of SRAMs in ASIC process technologies is very different from FPGAs where they are pretty plentiful and free!

The SKY130 process technology does have a good 1.5 years head start on GF180MCU :-)

For what it's worth, the SKY130 PDK launched with no SRAM available at all. The OpenRAM project has been working on SRAMs for SKY130 and provided the above versions, but there is still a lot of work left to validate how well they work and what their actual performance is. On SKY130, the recommendation is for designs to use a tool called DFFRAM (see https://github.com/AUCOHL/DFFRAM), which creates a highly compact array of latches that can be used as memory.

The OpenRAM project is working on creating new SRAMs for GF180MCU but have not finished that yet.

Given how small your design is, you might just be able to get away with using flip flops rather than needing to use memory. I believe that is what @olofk currently does with his SERV core?

dumblob commented 1 year ago

@dodotronix someone from your circles, colleagues etc. could chime in with their experience and thoughts.

xobs commented 1 year ago

I admit that I didn't realize GHDL had come such a long way. I had it generate Verilog, then reworked it into something readable, using Yosys equivalence checking to make sure the underlying logic didn't change.

The result is minimax.v.

In doing the rewrite I like to think that I understand the core a bit better, though I'll reserve that statement until I get it working under simulation.

Note that I'm currently trying to understand blink.S and get it running under simulation. I'm seeing an issue where the program counter keeps advancing even while the core is bubbling, which causes it to trap immediately: bubble1 and bubble2 are both set on reset, but the program counter only pauses in the middle of a lwsp/lw, due to rreq (https://github.com/gsmecher/minimax/blob/main/rtl/minimax.vhd#L201-L207).

xobs commented 1 year ago

I managed to get all of the test benches passing in Verilog, and I've wired it up to CI: https://github.com/xobs/minimax/tree/verilog-port

I'm going to see what it would take to harden this design.

gsmecher commented 1 year ago

@xobs - that's amazing. Are you working towards a PR here?

Either way - you have cleared the hurdle and merging your work is a clear net benefit. I am eager to get this merged in whenever it makes the most sense.

gsmecher commented 1 year ago

Also: two questions about the equivalence checking you've been using.

It's clear that Verilog has longer legs than VHDL for this kind of work, and I'm wondering whether to shift over wholesale or to maintain both.

xobs commented 1 year ago

I'm still working on hooking up memory, but here's a preview of what I've got so far. This is synthesized using SKY130.

[Images: preview of the design synthesized for SKY130]

This particular design has 512 bytes of inferred flip-flop RAM, along with 256 bytes of inferred register-file RAM. I'm sure that's where most of the area is going, so using hard macros will help here.

xobs commented 1 year ago

I was using Yosys for equivalence checking; however, that stopped working as soon as I replaced the always @* blocks generated by GHDL with assign statements.

My process was:

  1. Run ghdl.exe --synth --out=verilog --std=08 .\minimax.vhd -e minimax > base-minimax.v
  2. Copy base-minimax.v to minimax.v
  3. Untangle the resulting logic, running an equivalence check between minimax.v and base-minimax.v until it was readable
  4. Try to run it in iverilog and find that it never even starts
  5. Rewrite always @* statements to use assign
  6. Stare at it a lot
  7. Port the testbench by hand
  8. Implement tracing
  9. Compare the output of the two tracing testbenches
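Step 9 amounts to diffing two execution traces and reporting where they first disagree. A minimal sketch follows. It assumes each testbench writes one line per retired instruction in the same text format, and it ignores surrounding whitespace. The trace format and the function names are hypothetical.

```python
from itertools import zip_longest


def first_divergence(trace_a: list[str], trace_b: list[str]):
    """Return (line_number, line_a, line_b) at the first mismatch, or None if identical.

    A missing line (one trace is shorter) shows up as None on that side.
    """
    for n, (a, b) in enumerate(zip_longest(trace_a, trace_b), start=1):
        if a is None or b is None or a.strip() != b.strip():
            return n, a, b
    return None


def compare_trace_files(path_a: str, path_b: str):
    """Read two trace files (one retired instruction per line) and compare them."""
    with open(path_a) as fa, open(path_b) as fb:
        return first_divergence(fa.read().splitlines(), fb.read().splitlines())
```

Reporting only the first divergence keeps the output short. After the first mismatch, later lines usually differ as a consequence rather than independently.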

Equivalence checking stopped working after step 5, and I still don't have it working. I strongly suspect it's because I manually implemented the equivalent of (PC_BITS-1 downto 1), whereas the converted VHDL code effectively does it as [PC_BITS-2:0]. They're equivalent in the end, but strictly speaking they're different, and that throws everything off. Still, I'm not sure that's necessarily a bad thing, since the test cases actually pass.
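The two slicings hold the same stored value but number the bits differently. VHDL bit i of a vector declared `(PC_BITS-1 downto 1)` is bit i-1 of a `[PC_BITS-2:0]` vector. A tool that matches signals bit by bit can therefore see a mismatch where behaviour is identical. The sketch below checks this with integers. `PC_BITS = 12` and the helper names are assumptions, as is the premise that bit 0 of the PC is always zero for halfword-aligned fetches.

```python
PC_BITS = 12  # hypothetical width; the real value is a generic/parameter in the RTL


def field(value: int, hi: int, lo: int) -> int:
    """Bits hi..lo inclusive, like VHDL '(hi downto lo)' or Verilog '[hi:lo]'."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def vhdl_view(pc_byte: int) -> int:
    # VHDL: pc : std_logic_vector(PC_BITS-1 downto 1); bit 0 is always zero, so it is dropped
    return field(pc_byte, PC_BITS - 1, 1)


def verilog_view(pc_byte: int) -> int:
    # GHDL-style Verilog: reg [PC_BITS-2:0] pc; the same bits renumbered to start at 0
    stored = pc_byte >> 1
    return field(stored, PC_BITS - 2, 0)


# Every halfword-aligned PC produces the same stored value in both views...
assert all(vhdl_view(a) == verilog_view(a) for a in range(0, 1 << PC_BITS, 2))
# ...but the same physical bit has a different index: VHDL bit i is Verilog bit i-1.
pc = 0x246
assert all(field(pc, i, i) == field(verilog_view(pc), i - 1, i - 1) for i in range(1, PC_BITS))
```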

gsmecher commented 1 year ago

Fantastic - I am looking forward to seeing the macro-based design crashland. (I am assuming it's most useful to stay out of your way here.)

For the time being, VHDL and Verilog are equivalent and I'll leave both in place. When co-maintaining them becomes a hassle, I expect it'll make sense to drop the VHDL version.

xobs commented 1 year ago

I thought you'd like to see this -- I'm working on a GF180MCU tapeout of this part. It works in simulation, so now I'm trying to harden the design.

I've given it 6 kilobytes of RAM. We don't have 2r1w memories in GF180, so I hardened the register files separately.

Here's what minimax looks like when hardened:

[Image: the hardened Minimax layout]

RAM is on the left, the two register files (microcode and user code) are on the right. The itty bitty blob in the middle is minimax.
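When a process offers no 2r1w memories, one common workaround is replication. You keep two copies of the register file, write both on every store, and give each read port its own copy. The behavioural sketch below illustrates that technique in Python. It is not necessarily how the GF180 register files above were hardened, and the class names are hypothetical.

```python
class OneReadOneWrite:
    """Behavioural model of a 1R1W memory: one read port and one write port."""

    def __init__(self, depth: int):
        self.cells = [0] * depth

    def read(self, addr: int) -> int:
        return self.cells[addr]

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & 0xFFFF_FFFF  # 32-bit storage


class TwoReadOneWrite:
    """2R1W register file from two replicated 1R1W copies (costs 2x the storage)."""

    def __init__(self, depth: int = 32):
        self.copy_a = OneReadOneWrite(depth)
        self.copy_b = OneReadOneWrite(depth)

    def write(self, addr: int, value: int) -> None:
        # Every write lands in both copies, so their contents never diverge.
        self.copy_a.write(addr, value)
        self.copy_b.write(addr, value)

    def read(self, addr_1: int, addr_2: int) -> tuple[int, int]:
        # Each read port owns a copy, so both operands are available in the same cycle.
        return self.copy_a.read(addr_1), self.copy_b.read(addr_2)


rf = TwoReadOneWrite()
rf.write(5, 0xDEAD)
rf.write(6, 7)
print(rf.read(5, 6))  # (57005, 7)
```

The trade-off is area: storage doubles. That matters here, since the register files are already a large fraction of the hardened layout.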

gsmecher commented 1 year ago

@xobs, this makes me deliriously happy. Two drive-by questions:

I wanted to point out issue #5 - along with instruction support, I see a huge opportunity for Minimax from the Zc* efforts.

olofk commented 1 year ago

Very cool! @gsmecher, what I did for Subservient (the SERV-based mini SoC for ASICs) is add a Wishbone port for loading data into memory. That costs an insane number of pins, though, which made SERV a lot larger, so I've been looking at a serial alternative instead: either JTAG or something even simpler.

xobs commented 1 year ago

For (1), I was considering a Wishbone port like the one @olofk suggested. Alternatively, Wishbone could be wired up so that it jams "load" and then "store" instructions into the CPU. This is similar to how debug works on VexRiscv. It may be possible to do this in the top-level file, particularly since I pulled out the regfiles in order to harden them as external macros.
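Jamming a load and then a store into the core means producing instruction encodings on the fly. The sketch below is an encoder for the compressed C.LW and C.SW forms, following the RVC CL/CS field layout: funct3 in bits 15:13, uimm[5:3] in 12:10, rs1' in 9:7, uimm[2] in bit 6, uimm[6] in bit 5, rd'/rs2' in 4:2, and opcode 00. Using compressed forms rather than 32-bit LW/SW is an assumption. The motivation is that the project title describes 32-bit instructions as microcoded. The function names are hypothetical.

```python
def _creg(reg: int) -> int:
    """Map x8..x15 to the 3-bit register field used by the CL/CS formats."""
    if not 8 <= reg <= 15:
        raise ValueError("C.LW/C.SW can only name x8-x15")
    return reg - 8


def _cl_cs(funct3: int, reg_low: int, rs1: int, offset: int) -> int:
    """Assemble a CL/CS-format word; reg_low is rd' (C.LW) or rs2' (C.SW)."""
    if offset % 4 or not 0 <= offset <= 124:
        raise ValueError("offset must be a multiple of 4 in 0..124")
    return ((funct3 << 13)
            | (((offset >> 3) & 0x7) << 10)   # uimm[5:3]
            | (_creg(rs1) << 7)               # rs1'
            | (((offset >> 2) & 0x1) << 6)    # uimm[2]
            | (((offset >> 6) & 0x1) << 5)    # uimm[6]
            | (_creg(reg_low) << 2)           # rd' or rs2'
            | 0b00)                           # quadrant 0 opcode


def c_lw(rd: int, rs1: int, offset: int = 0) -> int:
    return _cl_cs(0b010, rd, rs1, offset)


def c_sw(rs2: int, rs1: int, offset: int = 0) -> int:
    return _cl_cs(0b110, rs2, rs1, offset)


# lw a0, 0(a1) followed by sw a0, 4(a1)  (a0 = x10, a1 = x11)
print(hex(c_lw(10, 11, 0)), hex(c_sw(10, 11, 4)))  # 0x4188 0xc1c8
```

Compressed loads and stores only reach x8-x15 and offsets up to 124 bytes. A debug-transfer scheme built this way would need to stage its address and data in those registers.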

For (2), sure! Note that the image is slightly a lie: I'm still shuffling everything around and tweaking the routing density values to actually get it to fit, and OpenROAD keeps segfaulting on me during detailed placement. So it probably won't be placed as nicely. But it does give a fantastic picture of just how tiny it is. Well done!

dumblob commented 1 year ago

Just saw the GSoC ideas regarding chips and thought there might (or might not) be an opportunity to get some students to get their hands dirty with Minimax or something related: https://github.com/chipsalliance/ideas