google / xls

XLS: Accelerated HW Synthesis
http://google.github.io/xls/
Apache License 2.0
1.21k stars 179 forks source link

Divide by non-power of two constant is not optimized #258

Open dkillebrew-g opened 3 years ago

dkillebrew-g commented 3 years ago

For example using the ceil_div function like: ceil_div(x, 4), we get this optimized verilog:

  assign p0_add_300_comb = p0_foo_comb + 7'h7f;
  assign p0_add_322_comb = {1'h0, p0_add_300_comb[6:2]} + 6'h01;
  assign p0_usual_comb = {1'h0, p0_add_322_comb};
  assign p0_and_307_comb = p0_usual_comb & {7{p0_foo_comb > 7'h00}};

but if we change the usage to ceil_div(x, 5) we get a divide:

  assign p0_add_301_comb = p0_foo_comb + 7'h7f;
  assign p0_udiv_303_comb = p0_add_301_comb / 7'h05;
  assign p0_usual_comb = p0_udiv_303_comb + 7'h01;
  assign p0_and_308_comb = p0_usual_comb & {7{p0_foo_comb > 7'h00}};

Compilers for software have used various tricks to strength-reduce division by a known constant. I imagine that HW can use the same mathematical identities. Here are some articles on the subject:

dkillebrew-g commented 3 years ago

From what I gather, one algorithm is, in a nutshell, to multiply by the reciprocal. That is, instead of x / d, do x * (1/d). Then replace 1/d by a fixed point value that approximates (to a sufficient precision based on the bitwidth of x and of the result) 1/d, let's describe this rational number as p/q. Because shifting is cheap, we require a power of two denominator (i.e., q is a power of two). Thus, after multiplying x * p, a right shift (by log2(q)) is performed.

There are a few more details, see pp 137 of https://www.agner.org/optimize/optimizing_assembly.pdf