sylefeb / Silice

Silice is an easy-to-learn, powerful hardware description language, that simplifies designing hardware algorithms with parallelism and pipelines.
Other
1.28k stars 77 forks source link

Empty FSM states could be optimized #247

Open suarezvictor opened 1 year ago

suarezvictor commented 1 year ago

When I translate some code to Verilog, there are some "do nothing" states that consumes a clock cycle where that can be avoided. For example, the only thing that state 8 does is set next state to 0, so the code that set the next state to 8 (while in state 6) can just skip state 8 it and set next state to 0. Below is the example code in Silice, and the resulting Verilog code.

Silice code:

algorithm uart_tx(
  input uint8   data,
  output uint1  tx_pin,
  output uint1  out_valid,
  input int32   clock_counter
)
{
  int32 t1 = uninitialized;
  t1 = clock_counter + 1;
  tx_pin = 0;
  out_valid = 1;
  while(clock_counter - t1 < 0)
    { out_valid = 0; }
  t1 = t1 + 1;
  uint8 mask = 1;
  while(mask != 0)
  {
    tx_pin = (data & mask) != 0;
    out_valid = 1;
    while(clock_counter - t1 < 0)
      { out_valid = 0; }
    t1 = t1 + 1;
    mask = mask << 1;
  }
  tx_pin = 1;
  out_valid = 1;
  while(clock_counter - t1 < 0)
    { out_valid = 0; }
}

Resulting verilog code

module M_uart_tx (
in_data,
in_clock_counter,
out_tx_pin,
out_out_valid,
in_run,
out_done,
reset,
out_clock,
clock
);
input  [7:0] in_data;
input signed [31:0] in_clock_counter;
output  [0:0] out_tx_pin;
output  [0:0] out_out_valid;
input in_run;
output out_done;
input reset;
output out_clock;
input clock;
assign out_clock = clock;

reg signed [31:0] _d_t1;
reg signed [31:0] _q_t1;
reg  [7:0] _d___block_3_mask;
reg  [7:0] _q___block_3_mask;
reg  [0:0] _d_tx_pin;
reg  [0:0] _q_tx_pin;
reg  [0:0] _d_out_valid;
reg  [0:0] _q_out_valid;
reg  [3:0] _d__idx_fsm0,_q__idx_fsm0;
assign out_tx_pin = _q_tx_pin;
assign out_out_valid = _q_out_valid;
assign out_done = (_q__idx_fsm0 == 0)
;

`ifdef FORMAL
initial begin
assume(reset);
end
assume property($initstate || (in_run || out_done));
`endif
always @* begin
_d_t1 = _q_t1;
_d___block_3_mask = _q___block_3_mask;
_d_tx_pin = _q_tx_pin;
_d_out_valid = _q_out_valid;
_d__idx_fsm0 = _q__idx_fsm0;
// _always_pre
(* full_case *)
case (_q__idx_fsm0)
1: begin
// _top
_d_t1 = in_clock_counter+1;

_d_tx_pin = 0;

_d_out_valid = 1;

_d__idx_fsm0 = 2;
end
2: begin
// __while__block_1
if (in_clock_counter-_q_t1<0) begin
// __block_2
// __block_4
_d_out_valid = 0;

// __block_5
_d__idx_fsm0 = 2;
end else begin
_d__idx_fsm0 = 3;
end
end
3: begin
// __block_3
// var inits
_d___block_3_mask = 1;
// --
_d_t1 = _q_t1+1;

_d__idx_fsm0 = 4;
end
4: begin
// __while__block_6
if (_q___block_3_mask!=0) begin
// __block_7
// __block_9
_d_tx_pin = (in_data&_q___block_3_mask)!=0;

_d_out_valid = 1;

_d__idx_fsm0 = 7;
end else begin
_d__idx_fsm0 = 5;
end
end
7: begin
// __while__block_10
if (in_clock_counter-_q_t1<0) begin
// __block_11
// __block_13
_d_out_valid = 0;

// __block_14
_d__idx_fsm0 = 7;
end else begin
_d__idx_fsm0 = 9;
end
end
5: begin
// __block_8
_d_tx_pin = 1;

_d_out_valid = 1;

_d__idx_fsm0 = 6;
end
9: begin
// __block_12
_d_t1 = _q_t1+1;

_d___block_3_mask = _q___block_3_mask<<1;

// __block_15
_d__idx_fsm0 = 4;
end
6: begin
// __while__block_16
if (in_clock_counter-_q_t1<0) begin
// __block_17
// __block_19
_d_out_valid = 0;

// __block_20
_d__idx_fsm0 = 6;
end else begin
_d__idx_fsm0 = 8;
end
end
8: begin
// __block_18
_d__idx_fsm0 = 0;
end
0: begin 
end
default: begin 
_d__idx_fsm0 = {4{1'bx}};
`ifdef FORMAL
assume(0);
`endif
 end
endcase
// _always_post
// pipeline stage triggers
end

always @(posedge clock) begin
_q_t1 <= _d_t1;
_q___block_3_mask <= (reset | ~in_run) ? 1 : _d___block_3_mask;
_q_tx_pin <= _d_tx_pin;
_q_out_valid <= _d_out_valid;
_q__idx_fsm0 <= reset ? 0 : ( ~in_run ? 1 : _d__idx_fsm0);
end

endmodule
sylefeb commented 1 year ago

Good catch, thank you for the report. This is due to the termination state being a special case. It was not properly taken into account in state fastForward (which purpose is to skip such empty states). I pushed a fix in draft, please confirm whether this is ok.

suarezvictor commented 1 year ago

Good. I think I have somewhere another case doing nested loops, if I'm able to spot it I'll report. Do you say that the draft branch is ready to be tested? Were you able to run my example?

sylefeb commented 1 year ago

Yes, this example and others, all looks good here. Here is how state 6 now jumps directly to 0.

6: begin
// __while__block_16
if (in_clock_counter-_q_t1<0) begin
// __block_17
// __block_19
_d_out_valid = 0;

// __block_20
_d__idx_fsm0 = 6;
end else begin
_d__idx_fsm0 = 0;
end
end

I am opening a different issue (#248) regarding state 8 being unused but left there, this should be fixed too.

suarezvictor commented 1 year ago

Good! I'm impressed about how fast you made the fix. I'm eager to try it with other cores I have

rob-ng15 commented 1 year ago

Hi Sylvian/Victor,

I've updated to the latest devel branch, and can confirm that PAWSv2 RV64 builds and runs with no issues! Thanks as always for your hard work.

Rob.

On Sat, 11 Mar 2023 at 21:18, Victor Suarez Rovere @.***> wrote:

Good! I'm impressed about how fast you made the fix. I'm eager to try it with other cores I have

— Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/247#issuecomment-1465021547, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT3CZ7QPM6LO7WIGUNLW3TTY7ANCNFSM6AAAAAAVXRQUOI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

suarezvictor commented 1 year ago

My example works too. Kudos to Sylvian! There's still room for optimization as I posted in the related issue https://github.com/sylefeb/Silice/issues/248. I'm not familiar with the code to send patches but this project is awesome and hopefully I can contribute more

sylefeb commented 1 year ago

Many thanks for the feedback, tests, and encouragements!! I'll try to address pending issues soon!