Closed StefanSalewski closed 5 years ago
These are fundamentally different operations. a * a is a fixed product, while the ^ proc can raise to any power. I guess there could be an optimization for the case a ^ 2, but then again, should a ^ 3 be optimized as well?
Yes, a ^ 2 and a ^ 3 would be nice. Note that we generally have expressions like
(last.x - curr.x) ^ 2 # last.x and curr.x are int fields here. I do not want to type (last.x - curr.x) * (last.x - curr.x)
(OK, we may create and strongly advertise proc sqr() for this use case instead, but that will not prevent people from using ^)
I guess you could make ^ a macro that would generate the optimal code when the power is a literal or a constant.
Well, you can use pow, as it will generate a call to pow in C. Modern C/C++ compilers will optimize that one to a multiplication without any call into a function or loop, when the exponent is a constant integer.
import math

proc main =
  let a = 3.0
  echo pow(a, 2)

main()
Currently I am a bit puzzled by the fact that we have both math.pow and math.`^`, as they should do exactly the same(?).
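For illustration, here is a minimal sketch of how the two differ in use. It follows the doc comment of math.`^` quoted later in this thread, which points to pow for negative exponents and floats, and it assumes a reasonably recent Nim in which math.`^` accepts any numeric base with a Natural exponent. The variable names are only for illustration.

```nim
import math

# pow works on floats and accepts negative or fractional exponents.
let half = pow(2.0, -1.0)

# `^` takes a non-negative integer exponent; the base may be an int or a float.
let k = 2 ^ 10
let c = 2.0 ^ 3

echo half, " ", k, " ", c
```

So the two overlap for float bases with small non-negative integer exponents, but only `^` keeps an int base as an int.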
> Well, you can use pow
I was concerned about the use of ^ for plain integer arguments! As in
import math, random

proc main =
  var a, b: int
  a = rand(7)
  # b = (a + 1) ^ 2 # bigger executable
  # b = sqr(a + 1) # sqr() not available
  b = (a + 1) * (a + 1) # fine but much typing work
  echo b

main()
Executable size is 59416 bytes (with ^) vs 59328 bytes (with the plain multiplication) with -d:release. So I strongly assume performance is affected too.
When ^ works for ints, people generally use it. So it should give optimal code, or not work at all. (No one would try b = int((a.float + 1) ^ 2) and expect the same code as for (a + 1) * (a + 1).)
Fixing this issue may be some fun, but maybe it is harder than it appears, and I am not sure if my skills are good enough for a good fix already. But I may try eventually.
I am tagging this as low priority, as nobody is working on it. It is not a bug, nor do I see a significant performance problem. And since you can define your own local sqr to square a number (with an even better name in my opinion), I don't see a problem for the programming language here.
Yes, low priority is fine, I would have tagged it that myself, but I have never managed tagging.
Performance impact is indeed not that big:
import random, math

proc main =
  var s, j: int
  for i in 0 .. 10000000:
    j = rand(7)
    s += j * j # j ^ 2
    # s += j ^ 2
  echo s

main()
With -d:release the runtime is 0.190s vs 0.1590s.
But my conclusion is still, it was a bad idea to enable ^ for ints without optimizing it well.
You can use a term-rewriting macro in a helper file to do the rewrite for you today. This is described for multiplication unrolling into addition in the manual: https://nim-lang.org/docs/manual.html#term-rewriting-macros
Otherwise I think the easiest way to implement this request is to have a static int overload.
proc `^`(x: int, y: static int): int =
  when y == 2: x * x
  elif y == 3: x * x * x
  else: powImpl(x, y)
mratsim, thank you very much for the suggestions. Both sound good, I will try that eventually.
@StefanSalewski I would like to warn you though, term-rewriting macros have a significant impact on compile times, since the compiler tries to match them everywhere.
I just came back to this issue. I tried hard to avoid ^ op for performance critical code, but then I have expressions like
if d < ((e.org.point.x - e.dest.point.x) * (e.org.point.x - e.dest.point.x) + (e.org.point.y - e.dest.point.y) * (e.org.point.y - e.dest.point.y)) * 1e-12:
Not that nice.
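As a sketch of how the sqr helper mentioned earlier in this thread could tidy up such an expression: the Point type and the sqDist proc below are hypothetical stand-ins, not the actual types behind e.org.point and e.dest.point.

```nim
# Hypothetical stand-in for the point type used in the expression above.
type Point = object
  x, y: float

# Squaring helper as discussed earlier in this thread.
proc sqr[T](x: T): T {.inline.} = x * x

# Squared Euclidean distance between two points.
proc sqDist(a, b: Point): float =
  sqr(a.x - b.x) + sqr(a.y - b.y)

let p = Point(x: 0.0, y: 0.0)
let q = Point(x: 3.0, y: 4.0)
echo sqDist(p, q)
```

With such a helper the condition above would shrink to something like d < sqDist(e.org.point, e.dest.point) * 1e-12, at the cost of one more name to remember.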
So I considered making a ^ proc variant for the plain case where the exponent is a plain static int in the range 2, 3, 4. That should cover 99% of all use cases.
But the funny fact is: the current ^ op is only slow because it is not inlined! My latest test with the latest Nim devel and gcc 9.1 gives perfect timing when inlined. I still wonder how gcc can optimize the Nim loop that well, as the exponent is not a constant.
So what shall we do? Just add the inline pragma? Or add a variant with a small static exponent?
Here is my test code, with perfect timing with the added inline pragma and compiled with -d:release of course:
# nim c -d:release k.nim
import random, math

proc `^`*[T](x: T, y: Natural): T {.inline.} =
  ## Computes ``x`` to the power ``y``.
  ##
  ## Exponent ``y`` must be non-negative, use
  ## `pow proc <#pow,float64,float64>`_ for negative exponents.
  ##
  ## See also:
  ## * `pow proc <#pow,float64,float64>`_ for negative exponent or
  ##   floats
  ## * `sqrt proc <#sqrt,float64>`_
  ## * `cbrt proc <#cbrt,float64>`_
  ##
  ## .. code-block:: nim
  ##  echo 2^3 # 8
  ##  echo -2^3 # -8
  when compiles(y >= T(0)):
    assert y >= T(0)
  else:
    assert T(y) >= T(0)
  var (x, y) = (x, y)
  result = 1
  while true:
    if (y and 1) != 0:
      result *= x
    y = y shr 1
    if y == 0:
      break
    x *= x

proc main =
  var s, j: int
  for i in 0 .. 10000000:
    j = rand(7)
    #s += j * j # j ^ 2
    s += j ^ 2
  echo s

main()
This is the C and assembly code for ^ without the inline pragma:
N_LIB_PRIVATE N_NIMCALL(NI, roof__07l2KPpVguQjVV5JTOdtEg)(NI x, NI y) {
  NI result;
  tyTuple_1v9bKyksXWMsm0vNwmZ4EuQ T1_;
  NI x_2;
  NI y_2;
  result = (NI)0;
  T1_.Field0 = x;
  T1_.Field1 = y;
  x_2 = T1_.Field0;
  y_2 = T1_.Field1;
  result = ((NI) 1);
  {
    while (1) {
      {
        if (!!(((NI)(((NI) (y_2)) & ((NI) 1)) == ((NI) 0)))) goto LA6_;
        stareq__ogcC1Md4c289bEhAZWpmZUwsystem((&result), x_2);
      }
      LA6_: ;
      y_2 = ((NI) ((NI)((NU64)(((NI) (y_2))) >> (NU64)(((NI) 1)))));
      {
        if (!(((NI) (y_2)) == ((NI) 0))) goto LA10_;
        goto LA2;
      }
      LA10_: ;
      stareq__ogcC1Md4c289bEhAZWpmZUwsystem((&x_2), x_2);
    }
  } LA2: ;
  return result;
}
roof__07l2KPpVguQjVV5JTOdtEg:
.LFB16:
    .cfi_startproc
    movl $1, %eax
    jmp .L15
    .p2align 4,,10
    .p2align 3
.L20:
    imulq %rdi, %rdi
.L15:
    testb $1, %sil
    je .L13
    imulq %rdi, %rax
.L13:
    shrq %rsi
    jne .L20
    ret
    .cfi_endproc
.LFE16:
    .size roof__07l2KPpVguQjVV5JTOdtEg, .-roof__07l2KPpVguQjVV5JTOdtEg
    .p2align 4
    .globl main_r7VpEd9b9aQPUbk4pk8k4iZQ
    .hidden main_r7VpEd9b9aQPUbk4pk8k4iZQ
    .type main_r7VpEd9b9aQPUbk4pk8k4iZQ, @function
My current suggestion would be to add one of the following two procs to the math module for a static exponent. The second one has less source code and gcc unrolls the loop perfectly, so that may be the better solution?
proc `^`*[T](x: T, y: static[Natural]): T {.inline.} =
  when y < 7:
    when y == 0:
      result = T(1)
    when y == 1:
      result = x
    when y == 2:
      result = x * x
    when y == 3:
      result = x * x * x
    when y == 4:
      result = x * x
      result *= result
    when y == 5:
      result = x * x
      result *= (result * x)
    when y == 6:
      result = x * x
      result *= (result * result)
  else:
    result = math.`^`(x, y)

proc `^`*[T](x: T, y: static[Natural]): T {.inline.} =
  when y < 10:
    result = T(1)
    var i = y
    while i > 0:
      result *= x
      dec(i)
  else:
    result = math.`^`(x, y)
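
To try the static-exponent idea in isolation, a self-contained sketch could look like the following. The name ipow is hypothetical and is used only to avoid clashing with math.`^`. The exponent is declared as static int as in the earlier suggestion, the compile-time error for negative exponents is an added assumption, and the fallback branch uses a plain loop instead of calling into the math module.

```nim
# Hypothetical helper: the name `ipow` is an assumption, chosen to avoid
# any overload clash with math.`^`.
proc ipow[T](x: T, y: static int): T {.inline.} =
  when y < 0:
    {.error: "ipow: exponent must be non-negative".}
  elif y == 0:
    result = T(1)
  elif y == 1:
    result = x
  elif y == 2:
    result = x * x
  elif y == 3:
    result = x * x * x
  else:
    # y is a compile-time constant here, so the C compiler has a fixed
    # trip count and may unroll this loop.
    result = T(1)
    for _ in 1 .. y:
      result *= x

let squared = ipow(3, 2)
let big = ipow(2, 10)
let f = ipow(1.5, 2)
echo squared, " ", big, " ", f
```

Because the branch is selected with when, only the code for the exponent actually used is generated for each call site.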
proc sqr*[T](x: T): T {.inline.} = x * x
Generally I prefer writing a ^ 2 instead of a * a when a is an int, and I indeed recommend that.
From code size compiled with -d:release and inspired by implementation of ^ operator I have the strong impression that a * a gives much better code than a ^ 2. Code size difference is more than 50 bytes. Really not an important issue, but maybe easy to fix?
Example