grayresearch / CX

Proposed RISC-V Composable Custom Extensions Specification
Apache License 2.0

Merge L0 and L1 #12

Open olofk opened 2 years ago

olofk commented 2 years ago

I think there's reason to merge the L0 and L1 levels for the sake of simplicity. Both can be seen as fixed latency with L0 having zero latency. There are currently some key differences in the spec but I'm not sure all of them are relevant.

L0 is considered stateless

The following simplified (and untested) module is a zero-latency CFU with state:

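module cfu_sum
  (input wire         clk,
   input wire         req_valid,
   input wire [31:0]  req_data0,
   output wire [31:0] resp_data,   // 32 bits wide so the sum is not truncated
   output wire        resp_valid);

   // Running total kept between requests; this is the CFU's state
   reg [31:0]   sum = 32'd0;

   // Zero latency: the response is combinational in the same cycle as the request
   assign resp_data  = req_data0 + sum;
   assign resp_valid = req_valid;

   // The state is updated on the clock edge that ends the request cycle
   always @(posedge clk)
     if (req_valid) sum <= resp_data;

endmodule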

I think this also reflects a realistic use case, where something is precalculated in the CFU to minimize latency.
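To make the zero-latency-with-state behaviour concrete, here is a minimal simulation sketch. It carries its own copy of the module with resp_data declared 32 bits wide so the sum is not truncated. The testbench name, the stimulus values and the timing are my own assumptions for illustration and are not taken from the CX spec.

```verilog
// Copy of cfu_sum with a 32-bit resp_data, so this block simulates on its own
module cfu_sum
  (input wire         clk,
   input wire         req_valid,
   input wire [31:0]  req_data0,
   output wire [31:0] resp_data,
   output wire        resp_valid);

   reg [31:0] sum = 32'd0;

   assign resp_data  = req_data0 + sum;
   assign resp_valid = req_valid;

   always @(posedge clk)
     if (req_valid) sum <= resp_data;
endmodule

// Hypothetical testbench: names and stimulus are illustrative only
module cfu_sum_tb;
   reg         clk       = 1'b0;
   reg         req_valid = 1'b0;
   reg [31:0]  req_data0 = 32'd0;
   wire [31:0] resp_data;
   wire        resp_valid;

   cfu_sum dut (.clk(clk), .req_valid(req_valid), .req_data0(req_data0),
                .resp_data(resp_data), .resp_valid(resp_valid));

   always #5 clk = ~clk;

   initial begin
      // Inputs change on falling edges; the response is sampled 1 time unit
      // later, i.e. in the same cycle and before the next rising edge
      @(negedge clk);
      req_valid = 1'b1;
      req_data0 = 32'd5;
      #1 $display("req=%0d resp=%0d valid=%b", req_data0, resp_data, resp_valid);
      // sum is still 0 here, so resp is 5

      @(negedge clk);
      req_data0 = 32'd3;
      #1 $display("req=%0d resp=%0d valid=%b", req_data0, resp_data, resp_valid);
      // sum was updated to 5 on the rising edge in between, so resp is 8

      @(negedge clk);
      req_valid = 1'b0;
      #1 $display("req=%0d resp=%0d valid=%b", req_data0, resp_data, resp_valid);
      // valid is 0: resp_data still shows 3 + 8 = 11, but sum will not update

      $finish;
   end
endmodule
```

Each response appears in the same cycle as its request, yet it depends on earlier requests, which is exactly the combination that a "stateless" L0 would rule out.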

L0 has no clock

Personally, I think the clock should be outside of the spec, but the reason is probably more taste than technical. If the clock signals remain in the spec, clk and the closely coupled clk_en could be made optional.

Conclusion

I think the above irons out all the essential differences between L0 and L1. Now, one could argue that L0 gives the requestor less time to respond, but with a fixed latency it's always the responsibility of the requestor to make sure it can handle a response it has requested at the time it comes back.
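As a rough sketch of what a merged level could look like, the module below treats zero latency as just one value of a latency parameter. The module name, the LATENCY parameter and the placeholder function are all hypothetical and not part of the spec. It is untested, and it deliberately keeps the function stateless so that only the latency handling is shown.

```verilog
// Hypothetical merged level: LATENCY = 0 behaves like L0, LATENCY >= 1 like L1.
// Names are illustrative and not taken from the CX spec.
module cfu_fixed_latency
  #(parameter LATENCY = 0)          // response delay in clock cycles
   (input wire         clk,         // unused when LATENCY == 0
    input wire         req_valid,
    input wire [31:0]  req_data0,
    output wire [31:0] resp_data,
    output wire        resp_valid);

   // Placeholder function; any combinational function of the request would do
   wire [31:0] result = req_data0 + 32'd1;

   generate
      if (LATENCY == 0) begin : g_comb
         // Zero latency: the response is valid in the same cycle as the request
         assign resp_data  = result;
         assign resp_valid = req_valid;
      end else begin : g_pipe
         // Fixed latency: shift data and valid through LATENCY register stages
         reg [LATENCY-1:0]    valid_q = {LATENCY{1'b0}};
         reg [32*LATENCY-1:0] data_q  = {(32*LATENCY){1'b0}};

         always @(posedge clk) begin
            // New entry enters at the bottom; the oldest one is truncated off the top
            valid_q <= {valid_q, req_valid};
            data_q  <= {data_q, result};
         end

         assign resp_valid = valid_q[LATENCY-1];
         assign resp_data  = data_q[32*LATENCY-1 -: 32];
      end
   endgenerate
endmodule
```

A requestor that instantiates this with LATENCY set to 0 can tie clk to a constant, which is one way to read "optional" for the clock. In every configuration the requestor knows from the parameter alone in which cycle the response will arrive.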

The thing I'm less certain about is whether this introduces routing problems with shared responders, but I can't see how that works in any fixed-latency setting either. So perhaps that's a non-issue too.