ucb-art / rocket-dsp-utils

5 stars 0 forks source link

Identify specific memories to be implemented as SRAM cells #30

Open chick opened 2 years ago

chick commented 2 years ago

We would like to have some automatic way to annotate which Chisel SyncReadMems out of some number of them in our design should be implemented as SRAM cells. Probably this could be solved fairly quickly and I believe the key here is ToMemIR

Namely, we are regularly using a locally modified version of this transform to replace only memories above certain size (say, >256x32) with blackbox SRAM macros (highly valuable in our FFT accelerator). I was thinking if DefMem would have an additional parameter which would say "I'm a candidate for repl-seq-mem transform", then besides the conditions which check the port correctness it would also check that parameter and based on it place the memory in a list of annotated memories.

This would mean that SyncReadMem could potentially also need that same parameter which would be passed to DefMem which is what FIRRTL sees. It would require some changes in Chisel and FIRRTL, but it seems to me it could be possible. In my opinion doing such changes would take you guys a couple of days, and it would take us more than a month.

ekiwi commented 2 years ago

This should be solvable through an annotation. The question is: Should the annotation be positive or negative? I.e. include or exclude?

chick commented 2 years ago

@milovanovic Possibly relevant firrtl 2202, Rewrite vlsi_mem_gen into a Firrtl Transform

chick commented 2 years ago

@milovanovic Do you have shareable example code that illustrates what you are trying to do here. chisel group thinks the problems seems tractable but would like to make sure we get API and specifics correct.

milovanovic commented 2 years ago

As an illustrative example you can find a SyncReadMemAnnotationDemo which instantiates two ShiftRegisterMems each of which can be implemented using SRAM macros, but doesn't have to be. Our intention is to, say, implement one ShiftRegisterMem using SRAM, but leave the other ShiftRegisterMem to be implemented just through D flip-flops.

ekiwi commented 2 years ago

@milovanovic What program/script are you using to compile the memories after they have been blackboxed? Could you modify that script to generate a D flip-flop based memory instead of an SRAM for certain sizes?

milovanovic commented 2 years ago

If the memories are blackboxed in case of ASICs we usually replace them with macros obtained from an SRAM compiler, e.g., OpenRAM. We have a locally modified ToMemIR transform that blackboxes every memory if it is above certain size threshold and this is what we regularly do in our designs. However if we have memories of equal sizes some of which we would like to blackbox, the threshold trick doesn't work anymore. If we have many memories there has to be an explicit way to tell the tool which memories to blackbox and which ones not to touch. We don't have a script that makes a D flip-flop-based RAM like the DFFRAM (even though it would be awesome), but just if we don't blackbox the ShiftRegisterMem, it is implemented ordinary with plain registers. Whether it will be a positive or a negative annotation is of minor importance to us, as long as we have some automated way to do this. Thanks a lot on your support.

ekiwi commented 2 years ago

So the one thing I am a bit worried about is that when you do not replace SyncMems, then we firrtl will emit Verilog that simulates the behavior of a SyncMem. However, we do not want to make any guarantees that the we won't sometimes tweak the way that simulation memories are emitted. So the problem would be that this might at some point break your tools inference of LUT RAM. However, if you make your own "macro compiler" for LUT RAMs, then you wouldn't have to worry about that.

To solve your problem with deciding which memories to implement in which way, we could add a "technology hint" annotation which takes a string that is handed to your memory compiler which can use it to decide how to physically implement that memory.

milovanovic commented 2 years ago

Just a few thoughts. I guess you're pointing out that the best solution to this issue is to use a firrtl transformation to somehow generate the mem.conf file which, beside the standard ones, will contain an additional "technology hint" flag, that is:

name simple_shift_register2_ext depth 256 width 16 ports write,read + (mem sram or lutram)

A memory compiler should then parse those commands and appropriately replace the memories. Currently if we have two identical memories only one line will be generated inside the mem.conf file but by adding the mem parameter we assume that we are going to have:

name simple_shift_register1_ext depth 256 width 16 ports write,read mem sram name simple_shift_register2_ext depth 256 width 16 ports write,read mem lutram

It would also be worth thinking what is going to happen if we would have something weird in our design like:

name simple_shift_register2_ext depth 36 width 17 ports write,read

As you can assume, those are non-standard memory parameters and normally they are not supported by the memory compiler. We need to replace those kind of blackboxes as well. I reckon that the answer would be to use Mem or a vector of registers inside Chisel instead, or make it parameterizable so that for larger memories it can still be a SyncReadMem.

ekiwi commented 2 years ago

As you can assume, those are non-standard memory parameters and normally they are not supported by the memory compiler. We need to replace those kind of blackboxes as well. I reckon that the answer would be to use Mem or a vector of registers inside Chisel instead, or make it parameterizable so that for larger memories it can still be a SyncReadMem.

I feel like it shouldn't be too hard for us to provide a default script to generate Verilog that describes a LUT RAM.

milovanovic commented 2 years ago

I feel like it shouldn't be too hard for us to provide a default script to generate Verilog that describes a LUT RAM.

It would be ideal.

jwright6323 commented 2 years ago

I think one could simply implement this: https://github.com/ucb-bar/barstools/blob/cdd8af4153926f6bde583a4ebfbc87014cfee9ac/macros/src/main/scala/barstools/macros/MacroCompiler.scala#L715

and you'd have what you want (SRAM mapping with flop-based fallback based on some cost function)

ekiwi commented 2 years ago

I think one could simply implement this:

Would it be possible to actually integrate that into firrtl? There already is a new VLSIMemGen implementation written in Scala and I assume (without knowing anything about bar tools) that this is similar, no?

https://github.com/chipsalliance/firrtl/pull/2202

jwright6323 commented 2 years ago

MacroCompiler has been around a while, but it was originally written to have more functionality and automation than vlsi_mem_gen. If the new FIRRTL pass just replicates vlsi_mem_gen behavior there's still a case for MacroCompiler to exist, but I'm not sure it belongs in firrtl. I tend to agree with @colinschmidt in that thread, but @jackkoenig has a good point that ReplSeqMem without an FF-based SRAM implementation is unfriendly to the end user.

jwright6323 commented 2 years ago

On the other hand, it's nice to be able to use chisel/firrtl in a real ASIC flow without cloning a bunch of extra libraries (like barstools), and this is not handled elegantly without either barstools or vlsi_mem_gen, IMO, which may be off-putting to potential chisel adopters (e.g. my slack post).

ekiwi commented 2 years ago

On the other hand, it's nice to be able to use chisel/firrtl in a real ASIC flow without cloning a bunch of extra libraries (like barstools), and this is not handled elegantly without either barstools or vlsi_mem_gen, IMO, which may be off-putting to potential chisel adopters (e.g. my slack post).

The next chisel/firrtl release is going to include the Scala version of vlsi_mem_gen, do you think that would be enough to address the issue?

jwright6323 commented 2 years ago

The next chisel/firrtl release is going to include the Scala version of vlsi_mem_gen, do you think that would be enough to address the issue?

Probably. It's better than running into problems where the memory array is split into multiple reg definitions to handle write masking. From what I can tell, though, the user still has to write their own RAM wrappers.

There are also a few other issues that barstools/MacroCompiler don't handle that come up in real ASIC designs:

ekiwi commented 2 years ago
* allowing users to use different memory compilers (Vts, mux factors, etc.) for two otherwise equivalent SRAMs (e.g. if I have two 64x4096 1RW SRAMs and I want one to be fast and one to be low power, that's not easy to do because they'll get replaced with the same wrapper module)

We can solve that with the "technology hint" annotation that I suggested above.

connecting power control pins (powerdown, light sleep, etc.) to logic signals in the chisel design

I am not sure how we would do that in a technology independent way besides adding the superset of power control options and optimizing away the once that the technology does not need.

jwright6323 commented 2 years ago

I am not sure how we would do that in a technology independent way besides adding the superset of power control options and optimizing away the once that the technology does not need.

I don't think you need to explicitly define the functionality of pins and create a superset- just allowing arbitrary pin addition would be fine. For example, I think you could have an annotation that targets the Mem and a net in the design and includes a pin name, which a transform could connect using BoringUtils or equivalent.

jackkoenig commented 2 years ago

@jwright6323 what about integrating MacroCompiler into FIRRTL? Could improve the out-of-box experience quite a bit

jwright6323 commented 2 years ago

It's worth looking into, but there might be some cleanup/feature additions required. !ping @colinschmidt