Power op ^ is not optimized for a: int; echo a ^ 2 case (minor bug)

nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).

https://nim-lang.org

Other

16.59k stars 1.47k forks source link

Power op ^ is not optimized for a: int; echo a ^ 2 case (minor bug) #10910

Closed StefanSalewski closed 5 years ago

StefanSalewski commented 5 years ago

Generally I prefer writing a ^ 2 instead of a * a when a is an int, and I indeed recommend that.

From code size compiled with -d:release and inspired by implementation of ^ operator I have the strong impression that a * a gives much better code than a ^ 2. Code size difference is more than 50 bytes. Really not an important issue, but maybe easy to fix?

Example

import math

proc main =
  let a = 3
  echo a ^ 2

main()

zevv commented 5 years ago

These are fundamentally different operations. a * a is a fixed product, while the ^ proc can raise to any power. I guess there could be optimization for the case a ^2, but then again, should a ^ 3 be optimized as well?

StefanSalewski commented 5 years ago

Yes, a ^ 2 and a ^ 3 would be nice. Note that we generally have expressions like

(last.x - curr.x) ^ 2 # last.x and curr.x are int field here. Do not want to type (last.x - curr.x) * (last.x - curr.x)

(OK, we may create and strongly advertise proc sqr() for this use case instead, but that will not prevent people from using ^)

dawkot commented 5 years ago

I guess you could make ^ a macro that would generate the optimal code when the power is a literal or a constant.

krux02 commented 5 years ago

Well, you can use pow, as it will generate a call to pow in C. Modern C/C++ compilers will optimize that one to a muliplication without any call into a function or loop, when the exponent is a constant integer.

proc main =
  let a = 3.0
  echo pow(a, 2)

main()

Currently I am a bit puzzled by the fact that we have both math.pow and math.`^` as they should do exactly the same(?).

StefanSalewski commented 5 years ago

Well, you can use pow

I was concerned about use of ^ for plain integer arguments! As in

import math, random
proc main =

  var a, b: int
  a = rand(7)
  # b = (a + 1) ^ 2 # bigger executable
  # b = sqr(a + 1) # sqr() not available 
  b = (a + 1) * (a + 1) # fine but much typing work
  echo b

main()

Executable code size is 59416 vs 59328 with -d:release. So I strongly assume performance is concerned too.

When ^ works for ints, people generally use it. So it should give optimal code, or work not at all. (No one would try b = int((a.float + 1) ^ 2) and expect same code as for (a + 1) * (a + 1) )

Fixing this issue may be some fun, but maybe it is harder than it appears, and I am not sure if my skills are good enough for a good fix already. But I may try eventually.

krux02 commented 5 years ago

I am tagging this as low priority, as nobody is working on it. It is not a bug nor do I see a significant performance problem. And since you can define your own local sqr to square a number (with an even better name in my opinion) I don't see a problem for the pregramming language here.

StefanSalewski commented 5 years ago

Yes, low priority is fine, I would have tagged it that myself, but I have never managed tagging.

Performance impact is indeed not that big:

import random, math

proc main =
  var s, j: int
  for i in 0 .. 10000000:
    j = rand(7)
    s += j * j # j ^ 2
    # s += j ^ 2
  echo s

main()

with -d:release runtime is 0.190s vs 0.1590s

But my conclusion is still, it was a bad idea to enable ^ for ints without optimizing it well.

mratsim commented 5 years ago

You can use a term-rewriting macro in a helper file to do the rewrite for you today. This is described for multiplication unrolling into addition in the manual: https://nim-lang.org/docs/manual.html#term-rewriting-macros

Otherwise I think the easiest way to implement this request is to have a static int overload.

proc `^`(x: int, y: static int): int =
  when y = 2: x * x
  when y = 3: x * x * x
  else: powImpl(x, y)

StefanSalewski commented 5 years ago

mratsim, thank you very much for the suggestions. Sound both good, will try that eventually.

krux02 commented 5 years ago

@StefanSalewski I would like to warn you though, term-rewriting macros have significant impact on compile times the compiler tries to match them everywhere.

StefanSalewski commented 5 years ago

I just came back to this issue. I tried hard to avoid ^ op for performance critical code, but then I have expressions like

if d < ((e.org.point.x - e.dest.point.x) * (e.org.point.x - e.dest.point.x) + (e.org.point.y - e.dest.point.y) * (e.org.point.y - e.dest.point.y)) * 1e-12:

Not that nice.

So I considered making a ^ proc variant for the plain case where exponent is a plain static int in range 2, 3, 4. That should cover 99% of all use cases.

But the funny fact is: Current ^ op is only slow, because it is not inlined! Latest test with latest Nim devel and gcc 9.1 gives perfect timing when inlined. I still wonder how gcc can optimize the Nim loop that well, as exponent is not a constant.

So what shall we do? Just add inline pragma? Or add variant with small static exponent?

Here my test code, perfect timing with the added inline pragma and compiled with -d:release of course:

# nim c -d:release k.nim
import random, math

proc `^`*[T](x: T, y: Natural): T {.inline.} =
  ## Computes ``x`` to the power ``y``.
  ##
  ## Exponent ``y`` must be non-negative, use
  ## `pow proc <#pow,float64,float64>`_ for negative exponents.
  ##
  ## See also:
  ## * `pow proc <#pow,float64,float64>`_ for negative exponent or
  ##   floats
  ## * `sqrt proc <#sqrt,float64>`_
  ## * `cbrt proc <#cbrt,float64>`_
  ##
  ## .. code-block:: nim
  ##  echo 2^3  # 8
  ##  echo -2^3 # -8
  when compiles(y >= T(0)):
    assert y >= T(0)
  else:
    assert T(y) >= T(0)
  var (x, y) = (x, y)
  result = 1

  while true:
    if (y and 1) != 0:
      result *= x
    y = y shr 1
    if y == 0:
      break
    x *= x

proc main =
  var s, j: int
  for i in 0 .. 10000000:
    j = rand(7)
    #s += j * j # j ^ 2
    s += j ^ 2
  echo s

main()

StefanSalewski commented 5 years ago

This is the C and Assemly code for ^ without inline pragma:

N_LIB_PRIVATE N_NIMCALL(NI, roof__07l2KPpVguQjVV5JTOdtEg)(NI x, NI y) {
    NI result;
    tyTuple_1v9bKyksXWMsm0vNwmZ4EuQ T1_;
    NI x_2;
    NI y_2;
    result = (NI)0;
    T1_.Field0 = x;
    T1_.Field1 = y;
    x_2 = T1_.Field0;
    y_2 = T1_.Field1;
    result = ((NI) 1);
    {
        while (1) {
            {
                if (!!(((NI)(((NI) (y_2)) & ((NI) 1)) == ((NI) 0)))) goto LA6_;
                stareq__ogcC1Md4c289bEhAZWpmZUwsystem((&result), x_2);
            }
            LA6_: ;
            y_2 = ((NI) ((NI)((NU64)(((NI) (y_2))) >> (NU64)(((NI) 1)))));
            {
                if (!(((NI) (y_2)) == ((NI) 0))) goto LA10_;
                goto LA2;
            }
            LA10_: ;
            stareq__ogcC1Md4c289bEhAZWpmZUwsystem((&x_2), x_2);
        }
    } LA2: ;
    return result;
}

roof__07l2KPpVguQjVV5JTOdtEg:
.LFB16:
    .cfi_startproc
    movl    $1, %eax
    jmp .L15
    .p2align 4,,10
    .p2align 3
.L20:
    imulq   %rdi, %rdi
.L15:
    testb   $1, %sil
    je  .L13
    imulq   %rdi, %rax
.L13:
    shrq    %rsi
    jne .L20
    ret
    .cfi_endproc
.LFE16:
    .size   roof__07l2KPpVguQjVV5JTOdtEg, .-roof__07l2KPpVguQjVV5JTOdtEg
    .p2align 4
    .globl  main_r7VpEd9b9aQPUbk4pk8k4iZQ
    .hidden main_r7VpEd9b9aQPUbk4pk8k4iZQ
    .type   main_r7VpEd9b9aQPUbk4pk8k4iZQ, @function

StefanSalewski commented 5 years ago

My current suggestion would be to add one of the following two procs to math module for static exponent. The second one has less source code and gcc unrolls the loop perfectly, so that may be the better solution?

proc `^`*[T](x: T, y: static[Natural]): T {.inline.} =
  when y < 7:
    when y == 0:
      result = T(1)
    when y == 1:
      result = x
    when y == 2:
      result = x * x
    when y == 3:
      result = x * x * x
    when y == 4:
      result = x * x
      result *= result
    when y == 5:
      result = x * x
      result *= (result * x)
    when y == 6:
      result = x * x
      result *= (result * result)
  else:
    result = math.`^`(x, y)

proc `^`*[T](x: T, y: static[Natural]): T {.inline.} =
  when y < 10:
    result = T(1)
    var i = y
    while i > 0:
      result *= x
      dec(i)
  else:
    result = math.`^`(x, y)

Araq commented 5 years ago

I just came back to this issue. I tried hard to avoid ^ op for performance critical code, but then I have expressions like


proc sqr*[T](x: T): T {.inline.} = x * x