Open timotheecour opened 3 years ago
@xflywind looks like divmod
is a dead end, c compiler is smart enough to optimize mod + div.
benchmark:
import std/times
type c_div_t* {.importc:"div_t", header: "<stdlib.h>".} = object
quot*, rem*: cint
proc c_div*(num, den: cint): c_div_t {.importc:"div", header: "<stdlib.h>".}
proc divmod*(num: cint, den: static cint): c_div_t {.inline.} =
result.quot = num div den
result.rem = num mod den
proc divmod*(num: cint, den: cint): c_div_t {.inline.} = c_div(num, den)
proc divmod2*(num: cint, den: cint): c_div_t {.inline.} =
result.quot = num div den
result.rem = num - (result.quot * den)
template divmod3*(num: cint, den: cint): c_div_t =
var ret {.noInit.}: c_div_t
ret.quot = num div den
ret.rem = num - (ret.quot * den)
ret
import std/random
proc main()=
let n = 100_000_000
var list: seq[cint]
for i in 1..n:
# TODO: pseudo-random like 5j mod n-1?
list.add rand(10000).cint
list.add 1+(rand(10000)).cint
# list.add rand(cint.high).cint
# list.add 1+(rand(cint.high)).cint
let t = cpuTime()
var quot, rem: cint
for i in 0..<n:
let num = list[2*i]
let den = list[2*i+1]
when defined case1a:
let a = c_div(num, den)
quot += a.quot
rem += a.rem
when defined case1b:
let a1 = cast[int32](num) div cast[int32](den) # BUG: div doesn't support cint ?
let a2 = num mod den
quot += a1
rem += a2
when defined case1c:
let a = divmod(num, den)
quot += a.quot
rem += a.rem
when defined case1d:
let a = divmod2(num, den)
quot += a.quot
rem += a.rem
when defined case1e:
let a = divmod3(num, den)
quot += a.quot
rem += a.rem
let t2 = cpuTime() - t
echo (t2, quot, rem)
main()
nim r -d:case1a -d:danger $timn_D/tests/nim/all/t11386.nim (0.4422670000000002, 442602228, -1790799024)
nim r -d:case1b -d:danger $timn_D/tests/nim/all/t11386.nim (0.2159279999999999, 442602228, -1790799024)
nim r -d:case1d -d:danger $timn_D/tests/nim/all/t11386.nim (0.216126, 442602228, -1790799024)
@xflywind
I'm reopening because we currently dont' have mod+div for FP, and this divmod could be made to work across both integer and FP types
and also because:
let (quot,rem) = divmod(a,b)
is convenient (especially when a or b is some expression instead of symbol to avoid let a = expra, let b = exprb, let quot = a div b, let rem = a mod b)using this:
# in math.nim
proc divmod*[T: SomeNumber](num, den: T): tuple[quot: T, rem: T] {.inline.} =
when T is SomeFloat: use cmath.remquo
else: # note that c `div` is slower so we don't use that
result.quot = num div den
result.rem = num - (result.quot * den)
# in cmath.nim
# add remquo + friends
It looks reasonable to me. In my PR, I have to use extra temp variables which looks ugly to me.
@xflywind we also need modulo
for true modular arithmetics, that works with negative LHS.
Here's a reasonable implementation: https://stackoverflow.com/a/12089637/1426932 maybe there's a C function for that, I'm not sure
goal:
doAssert -1 mod 5 == -1
doAssert -1.modulo 5 == 4
(negative RHS should also be handled, if doesn't hurt performance for regular case)
so we need to think of a good API to take all these cases into account:
without introducing too many symbols nor sacrificing performance
I would port this, but I don't know where to put it? https://github.com/nothings/stb/blob/master/stb_divide.h License(MIT/Public domain)
// trunc: a/b truncates to 0, a%b has same sign as a
// floor: a/b truncates to -inf, a%b has same sign as b
// eucl: a/b truncates to sign(b)*inf, a%b is non-negative
one option:
type DivideMode = enum dZero, dFloor, dInsertMeaningfulName
proc divide[T](a, b: T, mode: static DivideMode)
=> zero overhead, and doesn't pollute scope too much, easier to remember once grouped
some details need to be figured out though, whether it makes sense for SomeFloat, behavior for b = 0, etc
EDIT (cf as mentioned in chat) to also handle divmod in presence of multiple type of divide we can do:
proc divmod[T](a, b: T, mode: static DivideMode = dZero): tuple[rem: T, quot: T]
so that divmod(a,b)
still semantically means (a div b, a mod b)
, and avoids bloating math with too many symbols
See related material https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/divmodnote-letter.pdf RFC https://github.com/nim-lang/RFCs/issues/135 Related library https://github.com/juancarlospaco/nim-euclidean/blob/master/src/euclidean.h
I'm working on this.
looks like divmod is a dead end, c compiler is smart enough to optimize mod + div.
any decent C compiler optimizes a division followed by a modulus into a single division/modulus instruction.
math
already has a version of floorDiv
and floorMod
. It is easy to implement a version of euclDiv
and euclMod
too.
Prototype
proc euclDiv(a, b: int): int =
result = a div b
let r = a mod b
if r >= 0:
discard
elif b > 0:
dec result
else:
inc result
proc euclMod(a, b: int): int =
result = a mod b
if result >= 0:
discard
elif b > 0:
inc result, b
else:
dec result, b
math already has a version of floorDiv and floorMod. It is easy to implement a version of euclDiv and euclMod too.
you're right, somehow I missed it. Shouldn't
func floorDiv*[T: SomeInteger](x, y: T): T =
be
func floorDiv*[T: SomeNumber](x, y: T): T =
?
(like func floorMod*[T: SomeNumber](x, y: T): T =
)
because `div' only support INT
because `div' only support INT
remquo, remquol, remquof
is the answerremainder
, and see whether we're using fmod
vs remainder
correctly in stdlibWhy we should support float? It is very scary because of float precision loss.
For Python
>>> 0.5 // 0.1
4.0
>>> 0.5 / 0.1
5.0
It just introduces noises.
because math.mod
already supports float:
nim --eval:'import math; echo 0.3 mod 0.1'
0.09999999999999998
and more importantly because it is useful (eg for linear algebra, graphics etc)
It is very scary because of float precision loss.
that's just FP semantics, nothing surprising
abs(0.1*(0.5 // 0.1) + (0.5 % 0.1) - 0.5) 0.0
If I use trunc(0.5 / 0.1)
as implementation, is it right?
The implementation for Python floor float div https://github.com/python/cpython/blob/52a327c1cbb86c7f2f5c460645889b23615261bf/Objects/floatobject.c#L636
I will take this for div(float) implementation which is better than C wrapper because it could be used everywhere.
If I use trunc(0.5 / 0.1) as implementation, is it right?
it's definitely different, trunc(0.5 / 0.1)
returns FP, whereas remquo returns (in C) (int,double). Also, remquo
may be more efficient than /
+ trunc
, but this needs to be checked
better than C wrapper because it could be used everywhere
performance matters; if there's a performance difference, we can always use vmops and/or VM/js fallback
copysign[...]
I agree, copysign is important and used a lot, let's try to get https://github.com/nim-lang/Nim/pull/16406 merged soon;
for copysign < 1.5.1, one possiblity is to implement it in std/private, then re-export it only when nimversion >= 1.5.1; or we can simply make an exception to the since
rule if that's simpler; but if copysign
is only used in math.nim, it's even simpler, all we need is definining it, but conditionally export it 1.5.1
mod
should document or (perhaps, idk) fix its behavior when b == 0 in a mod b
; right now:nim r --eval:'import math; echo 1 mod 0'
gives DivByZeroDefect
nim r --eval:'import math; echo 1.0 mod 0.0'
gives nan
python3:
1.0 % 0.0 gives ZeroDivisionError: float modulo
1 % 0 gives ZeroDivisionError: integer division or modulo by zero
modulo
instead of just divisionas we discussed in other places: remquo, remquol, remquof is the answer see also remainder, and see whether we're using fmod vs remainder correctly in stdlib
remquo and remainder round to nearest integer which are useless to us.
fmod is correct.
http://www.cplusplus.com/reference/cmath/remainder
Returns the floating-point remainder of numer/denom (rounded to nearest): remainder = numer - rquot * denom Where rquot is the result of: numer/denom, rounded toward the nearest integral value (with halfway cases rounded toward the even number). A similar function, fmod, returns the same but with the quotient truncated (rounded towards zero) instead. The function remquo has a behavior identical to this function, but it additionally provides access to the intermediate quotient value used.
So the conclusion is that I will use the codes below everywhere.
func `div`*[T: float32|float64](x, y: T): T =
## Computes the division operation for float values.
result = (x - x mod y) / y
mod
inside div(int), div(float), floorDiv, euclDiv (or at least a subset of those)? I know that C compiler can sometimes optimize this out, but there's no guarantee (and could depend on compilers etc), and there's also no guarantee that C compiler will do it for cases where user code ALSO calls mod outside those div calls. Once we have div, we can get mod via a multiplication + 1 subtractionvia a private func modLazy[T](x, y, quo: T): T {.inline.}
as drop-in replacement for mod
in those cases
@timotheecour Could you provide some examples for
and more importantly because it is useful (eg for linear algebra, graphics etc)
div + mod in 1 call
EDIT see also https://github.com/timotheecour/Nim/issues/396#issuecomment-743830463