Division by a non-power-of-two constant is not optimized

For example, calling ceil_div(x, 4) produces this optimized Verilog:

  assign p0_add_300_comb = p0_foo_comb + 7'h7f;
  assign p0_add_322_comb = {1'h0, p0_add_300_comb[6:2]} + 6'h01;
  assign p0_usual_comb = {1'h0, p0_add_322_comb};
  assign p0_and_307_comb = p0_usual_comb & {7{p0_foo_comb > 7'h00}};

But if we change the call to ceil_div(x, 5), the generated Verilog contains a divide:

  assign p0_add_301_comb = p0_foo_comb + 7'h7f;
  assign p0_udiv_303_comb = p0_add_301_comb / 7'h05;
  assign p0_usual_comb = p0_udiv_303_comb + 7'h01;
  assign p0_and_308_comb = p0_usual_comb & {7{p0_foo_comb > 7'h00}};
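
To make the comparison concrete, the two netlists above can be modeled bit for bit in software. This is only a sketch: the function names `netlist_div4` and `netlist_div5` are made up for illustration, and the 7-bit width is read off the `7'h..` constants in the generated Verilog. Both models agree with the mathematical ceiling division for every 7-bit input, so the only difference is that the divisor-5 version still needs a real divider.

```python
WIDTH = 7
MASK = (1 << WIDTH) - 1  # 0x7f, matching the 7-bit signals above


def netlist_div4(foo: int) -> int:
    """Model of the ceil_div(x, 4) netlist: the divide became a bit slice."""
    add_300 = (foo + 0x7F) & MASK              # foo - 1, modulo 2**7
    add_322 = ((add_300 >> 2) + 1) & 0x3F      # {1'h0, add_300[6:2]} + 6'h01
    usual = add_322                            # zero-extended to 7 bits
    return usual if foo > 0 else 0             # AND with the replicated compare


def netlist_div5(foo: int) -> int:
    """Model of the ceil_div(x, 5) netlist: the divide is still present."""
    add_301 = (foo + 0x7F) & MASK              # foo - 1, modulo 2**7
    udiv_303 = add_301 // 5                    # the unoptimized udiv
    usual = (udiv_303 + 1) & MASK
    return usual if foo > 0 else 0


def ceil_div_reference(x: int, d: int) -> int:
    """Plain ceiling division, used only as a reference."""
    return -(-x // d)


# Exhaustively compare both models against the reference.
div4_ok = all(netlist_div4(x) == ceil_div_reference(x, 4) for x in range(1 << WIDTH))
div5_ok = all(netlist_div5(x) == ceil_div_reference(x, 5) for x in range(1 << WIDTH))
```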

Software compilers have long used strength reduction to turn division by a known constant into a multiply and a shift. I imagine that hardware can use the same mathematical identities. Here are some articles on the subject:

https://zneak.github.io/fcd/2017/02/19/divisions.html
https://www.drdobbs.com/parallel/optimizing-integer-division-by-a-constan/184408499
https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/
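
As a rough illustration of the identity those articles describe, here is a sketch that searches for a multiply-and-shift pair replacing the 7-bit divide by 5 and checks it exhaustively. It is not how XLS does or should implement this; `find_magic` and `ceil_div5` are hypothetical names, and the brute-force search is chosen for clarity rather than efficiency.

```python
def find_magic(divisor: int, width: int) -> tuple[int, int]:
    """Find the smallest shift s, with multiplier m = ceil(2**s / divisor),
    such that (x * m) >> s == x // divisor for every width-bit unsigned x."""
    limit = 1 << width
    for shift in range(2 * width + 1):
        multiplier = -(-(1 << shift) // divisor)  # ceil(2**shift / divisor)
        if all((x * multiplier) >> shift == x // divisor for x in range(limit)):
            return multiplier, shift
    raise ValueError("no multiply-shift pair found in the searched range")


WIDTH = 7
MASK = (1 << WIDTH) - 1
MAGIC, SHIFT = find_magic(5, WIDTH)


def ceil_div5(x: int) -> int:
    """ceil_div(x, 5) with the same structure as the generated netlist,
    but with the divide replaced by a constant multiply and a shift."""
    minus_one = (x + MASK) & MASK             # x - 1, modulo 2**7
    quotient = (minus_one * MAGIC) >> SHIFT   # stands in for minus_one / 5
    return ((quotient + 1) & MASK) if x > 0 else 0


# Exhaustive check against plain ceiling division.
all_match = all(ceil_div5(x) == -(-x // 5) for x in range(1 << WIDTH))
```

In hardware terms the shift is just a bit slice of the product, so the cost is one constant multiplier, which a synthesis tool can usually decompose into a few shifted adds.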

google/xls issue #258