stanford-ppl / spatial-lang

Spatial: "Specify Parameterized Accelerators Through Inordinately Abstract Language"
MIT License

Use URAMs on F1 #230

Open shadjis opened 6 years ago

shadjis commented 6 years ago

The F1 has UltraRAMs (URAMs), which can be used for larger SRAMs. However, SRAMs need to be explicitly assigned to URAMs using the following Verilog attribute:

(* ram_style = "ultra" *) reg [DWIDTH-1:0] mem [0:WORDS-1];

One way to do this is to have an analysis pass which:

  1. gets a list of all SRAMs,
  2. sorts the list by size, and
  3. keeps track of the largest 800 SRAMs (there are 800 URAMs on the F1)

For example, the pass could store the size of the 800th-largest SRAM, and code generation could then use a different template for SRAMs of that size or larger.
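A minimal sketch of the pass described above, written in Python purely for illustration (it is not Spatial compiler code). The function names, the use of total bits as the "size", and the list-of-sizes input are all assumptions; only the 800-URAM budget and the `ram_style` attribute come from this issue.

```python
from typing import List

NUM_URAMS = 800  # URAM count on the F1, as stated in this issue


def uram_size_threshold(sram_bits: List[int], budget: int = NUM_URAMS) -> int:
    """Return the size of the budget-th largest SRAM.

    Returns 0 when there are fewer SRAMs than the budget, so every SRAM qualifies.
    """
    if len(sram_bits) < budget:
        return 0
    return sorted(sram_bits, reverse=True)[budget - 1]


def mem_declaration(size_bits: int, threshold: int) -> str:
    """Pick the declaration template: add the URAM attribute at or above the threshold."""
    decl = "reg [DWIDTH-1:0] mem [0:WORDS-1];"
    if size_bits >= threshold:
        return '(* ram_style = "ultra" *) ' + decl
    return decl
```

Note that if several SRAMs share the threshold size, this comparison can mark more than `budget` memories, so a real pass would need a tie-breaking rule.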

dkoeplin commented 6 years ago

A single SRAM may take more than 1 URAM, but other than that yep this should work.

Are there any downsides to using a URAM over an SRAM (e.g. not dual ported, higher latency, etc.)?

raghup17 commented 6 years ago

Latency should be the same (1 cycle), and URAMs are dual-ported. However, URAM ports are 72 bits wide, twice the width of BRAM ports, and cannot be configured for a narrower width. In other words, unlike BRAMs, a URAM does not gain depth when used at a narrower width. This means that without fixing #231, URAM usage will be quite inefficient for narrower data types.
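To put a rough number on that inefficiency, here is a small Python sketch that assumes one element is stored per 72-bit URAM word (i.e. no packing). The function name is hypothetical; the 72-bit width is the only figure taken from the comment above.

```python
URAM_WIDTH_BITS = 72  # fixed URAM port width, per the comment above


def uram_width_utilization(data_width_bits: int) -> float:
    """Fraction of each 72-bit URAM word used when one element is stored per word."""
    if not 0 < data_width_bits <= URAM_WIDTH_BITS:
        raise ValueError("expected a width between 1 and 72 bits")
    return data_width_bits / URAM_WIDTH_BITS
```

Under this assumption a 32-bit type uses 32/72 (under half) of each word, and an 8-bit type uses 8/72 (one ninth).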

shadjis commented 6 years ago

Also David, the case of 1 SRAM being >1 URAM may be a bit complicated since URAMs can be cascaded to implement bigger URAMs. However, enabling cascading reduces the number of URAMs available: https://github.com/aws/aws-fpga/blob/master/hdk/cl/examples/cl_uram_example/README.md#implementation-options

If cascading is not enabled, and something > 4096 words is given a URAM directive, I'm not sure if this will:

  1. still use multiple URAMs but non-dedicated routing (e.g. may impact timing and routability),
  2. use block rams instead, or
  3. fail

If it is case 1 or 2, we should be OK, but if it is case 3 we might want to omit SRAMs > 4096 words from the URAM list. But I think for now we can just assume 1 SRAM per URAM and handle this more complicated case later? Also, as Raghu said, 4096 is the depth without packing into the 72-bit word width (#231), so the effective depth might actually be 8k or 16k words. This might be larger than anything we ever need, so cascading may not be necessary.
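The 8k and 16k figures follow from simple arithmetic, sketched here in Python. This assumes that packing (#231) places as many whole elements as fit into each 72-bit word; the function name and that packing scheme are assumptions, not something #231 is confirmed to implement.

```python
URAM_WIDTH_BITS = 72     # URAM word width
URAM_DEPTH_WORDS = 4096  # URAM depth in 72-bit words


def packed_depth(data_width_bits: int) -> int:
    """Elements one URAM could hold if whole elements are packed into each 72-bit word."""
    if not 0 < data_width_bits <= URAM_WIDTH_BITS:
        raise ValueError("expected a width between 1 and 72 bits")
    elems_per_word = URAM_WIDTH_BITS // data_width_bits
    return URAM_DEPTH_WORDS * elems_per_word
```

With these assumptions, 32-bit elements pack two per word (8192 elements) and 16-bit elements pack four per word (16384 elements).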

dkoeplin commented 6 years ago

Ah interesting, thanks for pointing this out. In that case, if I see a bank larger than 4096 words I won't include it in the URAM candidate list for now. This doesn't happen extremely often in practice, so this simple solution should work ok for now.
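As a rough illustration of that filter (Python, not the actual Spatial pass), the sketch below drops banks deeper than 4096 words and then keeps the largest remaining ones up to the URAM budget. The bank names, the dict input, and ranking by word count are assumptions made for the example.

```python
from typing import Dict, List

MAX_URAM_WORDS = 4096  # depth of a single, non-cascaded URAM


def uram_candidates(bank_words: Dict[str, int], budget: int = 800) -> List[str]:
    """Names of the largest banks that fit in one URAM, at most `budget` of them."""
    eligible = [
        (name, words)
        for name, words in bank_words.items()
        if words <= MAX_URAM_WORDS
    ]
    # Largest banks first, since they benefit most from a URAM.
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in eligible[:budget]]
```

For example, `uram_candidates({"a": 8192, "b": 4096, "c": 1024, "d": 16}, budget=2)` skips `a` for being too deep and returns `["b", "c"]`.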

mattfel1 commented 6 years ago

Do we have the metadata yet that tells me if I should uramify a memory?

dkoeplin commented 6 years ago

Not yet - will be adding it today