misc mod: `divmod`, `mod` for SomeFloat, `modulo` (with `-1.modulo 5 == 4`)

timotheecour commented 3 years ago

div + mod in 1 call

yeah, python has divmod. Will consider implement it in Nim. https://docs.python.org/3/library/functions.html#divmod
So does int128 in Nim/

flywind @xflywind 04:52
https://code.woboq.org/gcc/libgcc/divmod.c.html

flywind @xflywind 04:58
It seems that compiler will make lots of optimization for us(especially small number). It's not necessary to use divmod. https://www.codeproject.com/Tips/1274380/Cplusplus11-std-div-Benchmark
https://stackoverflow.com/questions/30079879/is-divmod-faster-than-using-the-and-operators#:~:text=when%20dividing%20a%2022%2Dmillion,separately%2C%20as%20you%20might%20expect.

flywind @xflywind 05:41
I will try this paper for printing float to string(notes crystal uses this implementation which is a language beats Nim in many benchmarks) https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
I'm not sure whether it is better than sprintf.

flywind @xflywind 06:30
the divmod implementation of ruby https://github.com/ruby/ruby/blob/9e41a75255d15765648279629fd3134cae076398/internal/fixnum.h#L122

timotheecour commented 3 years ago

@xflywind looks like divmod is a dead end, c compiler is smart enough to optimize mod + div. benchmark:


import std/times

type c_div_t* {.importc:"div_t", header: "<stdlib.h>".} = object
  quot*, rem*: cint

proc c_div*(num, den: cint): c_div_t {.importc:"div", header: "<stdlib.h>".}

proc divmod*(num: cint, den: static cint): c_div_t {.inline.} =
  result.quot = num div den
  result.rem = num mod den

proc divmod*(num: cint, den: cint): c_div_t {.inline.} = c_div(num, den)

proc divmod2*(num: cint, den: cint): c_div_t {.inline.} =
  result.quot = num div den
  result.rem = num - (result.quot * den)

template divmod3*(num: cint, den: cint): c_div_t =
  var ret {.noInit.}: c_div_t
  ret.quot = num div den
  ret.rem = num - (ret.quot * den)
  ret

import std/random

proc main()=
  let n = 100_000_000
  var list: seq[cint]
  for i in 1..n:
    # TODO: pseudo-random like 5j mod n-1?
    list.add rand(10000).cint
    list.add 1+(rand(10000)).cint
    # list.add rand(cint.high).cint
    # list.add 1+(rand(cint.high)).cint

  let t = cpuTime()
  var quot, rem: cint
  for i in 0..<n:
    let num = list[2*i]
    let den = list[2*i+1]

    when defined case1a:
      let a = c_div(num, den)
      quot += a.quot
      rem += a.rem
    when defined case1b:
      let a1 = cast[int32](num) div cast[int32](den) # BUG: div doesn't support cint ?
      let a2 = num mod den
      quot += a1
      rem += a2
    when defined case1c:
      let a = divmod(num, den)
      quot += a.quot
      rem += a.rem
    when defined case1d:
      let a = divmod2(num, den)
      quot += a.quot
      rem += a.rem
    when defined case1e:
      let a = divmod3(num, den)
      quot += a.quot
      rem += a.rem
  let t2 = cpuTime() - t
  echo (t2, quot, rem)
main()

nim r -d:case1a -d:danger $timn_D/tests/nim/all/t11386.nim (0.4422670000000002, 442602228, -1790799024)

nim r -d:case1b -d:danger $timn_D/tests/nim/all/t11386.nim (0.2159279999999999, 442602228, -1790799024)

nim r -d:case1d -d:danger $timn_D/tests/nim/all/t11386.nim (0.216126, 442602228, -1790799024)

timotheecour commented 3 years ago

@xflywind

I'm reopening because we currently dont' have mod+div for FP, and this divmod could be made to work across both integer and FP types

and also because:

let (quot,rem) = divmod(a,b) is convenient (especially when a or b is some expression instead of symbol to avoid let a = expra, let b = exprb, let quot = a div b, let rem = a mod b)

using this:

# in math.nim
proc divmod*[T: SomeNumber](num, den: T): tuple[quot: T, rem: T] {.inline.} =
  when T is SomeFloat: use cmath.remquo
  else: # note that c `div` is slower so we don't use that
    result.quot = num div den
    result.rem = num - (result.quot * den)

# in cmath.nim
# add remquo + friends

ringabout commented 3 years ago

It looks reasonable to me. In my PR, I have to use extra temp variables which looks ugly to me.

timotheecour commented 3 years ago

@xflywind we also need modulo for true modular arithmetics, that works with negative LHS.

Here's a reasonable implementation: https://stackoverflow.com/a/12089637/1426932 maybe there's a C function for that, I'm not sure

goal:

doAssert -1 mod 5 == -1
doAssert -1.modulo 5 == 4

(negative RHS should also be handled, if doesn't hurt performance for regular case)

so we need to think of a good API to take all these cases into account:

wrap around vs symmetry around 0
mod vs divmod
all this needs to work with SomeInt and SomeFloat

without introducing too many symbols nor sacrificing performance

links

https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/

ringabout commented 3 years ago

I would port this, but I don't know where to put it? https://github.com/nothings/stb/blob/master/stb_divide.h License(MIT/Public domain)

//     trunc:   a/b truncates to 0,           a%b has same sign as a
//     floor:   a/b truncates to -inf,        a%b has same sign as b
//     eucl:    a/b truncates to sign(b)*inf, a%b is non-negative

timotheecour commented 3 years ago

one option:

type DivideMode = enum dZero, dFloor, dInsertMeaningfulName
proc divide[T](a, b: T, mode: static DivideMode)

=> zero overhead, and doesn't pollute scope too much, easier to remember once grouped

some details need to be figured out though, whether it makes sense for SomeFloat, behavior for b = 0, etc

timotheecour commented 3 years ago

EDIT (cf as mentioned in chat) to also handle divmod in presence of multiple type of divide we can do:

proc divmod[T](a, b: T, mode: static DivideMode = dZero): tuple[rem: T, quot: T]

so that divmod(a,b) still semantically means (a div b, a mod b), and avoids bloating math with too many symbols

ringabout commented 3 years ago

I'm working on this.

ringabout commented 3 years ago

looks like divmod is a dead end, c compiler is smart enough to optimize mod + div.

any decent C compiler optimizes a division followed by a modulus into a single division/modulus instruction.

ringabout commented 3 years ago

math already has a version of floorDiv and floorMod. It is easy to implement a version of euclDiv and euclMod too.

Prototype

proc euclDiv(a, b: int): int =
  result = a div b
  let r = a mod b
  if r >= 0:
    discard
  elif b > 0:
    dec result
  else:
    inc result

proc euclMod(a, b: int): int =
  result = a mod b
  if result >= 0:
    discard
  elif b > 0:
    inc result, b
  else:
    dec result, b

timotheecour commented 3 years ago

math already has a version of floorDiv and floorMod. It is easy to implement a version of euclDiv and euclMod too.

you're right, somehow I missed it. Shouldn't func floorDiv*[T: SomeInteger](x, y: T): T = be func floorDiv*[T: SomeNumber](x, y: T): T = ?

(like func floorMod*[T: SomeNumber](x, y: T): T =)

ringabout commented 3 years ago

because `div' only support INT

timotheecour commented 3 years ago

because `div' only support INT

as we discussed in other places: remquo, remquol, remquof is the answer
see also remainder, and see whether we're using fmod vs remainder correctly in stdlib

ringabout commented 3 years ago

Why we should support float? It is very scary because of float precision loss.

For Python

>>> 0.5 // 0.1
4.0

>>> 0.5 / 0.1
5.0

It just introduces noises.

timotheecour commented 3 years ago

because math.mod already supports float: nim --eval:'import math; echo 0.3 mod 0.1' 0.09999999999999998

and more importantly because it is useful (eg for linear algebra, graphics etc)

It is very scary because of float precision loss.

that's just FP semantics, nothing surprising

abs(0.1*(0.5 // 0.1) + (0.5 % 0.1) - 0.5) 0.0

ringabout commented 3 years ago

If I use trunc(0.5 / 0.1) as implementation, is it right?

ringabout commented 3 years ago

The implementation for Python floor float div https://github.com/python/cpython/blob/52a327c1cbb86c7f2f5c460645889b23615261bf/Objects/floatobject.c#L636

I will take this for div(float) implementation which is better than C wrapper because it could be used everywhere.

timotheecour commented 3 years ago

If I use trunc(0.5 / 0.1) as implementation, is it right?

it's definitely different, trunc(0.5 / 0.1) returns FP, whereas remquo returns (in C) (int,double). Also, remquo may be more efficient than / + trunc, but this needs to be checked

better than C wrapper because it could be used everywhere

performance matters; if there's a performance difference, we can always use vmops and/or VM/js fallback

copysign[...]

I agree, copysign is important and used a lot, let's try to get https://github.com/nim-lang/Nim/pull/16406 merged soon;

for copysign < 1.5.1, one possiblity is to implement it in std/private, then re-export it only when nimversion >= 1.5.1; or we can simply make an exception to the since rule if that's simpler; but if copysign is only used in math.nim, it's even simpler, all we need is definining it, but conditionally export it 1.5.1

timotheecour commented 3 years ago

[ ] mod should document or (perhaps, idk) fix its behavior when b == 0 in a mod b; right now:

nim r --eval:'import math; echo 1 mod 0' gives DivByZeroDefect nim r --eval:'import math; echo 1.0 mod 0.0' gives nan

python3:

1.0 % 0.0 gives ZeroDivisionError: float modulo
1 % 0 gives ZeroDivisionError: integer division or modulo by zero

[ ] DivByZeroDefect should perhaps give a msg indicating modulo instead of just division

ringabout commented 3 years ago

as we discussed in other places: remquo, remquol, remquof is the answer see also remainder, and see whether we're using fmod vs remainder correctly in stdlib

remquo and remainder round to nearest integer which are useless to us.

fmod is correct.

http://www.cplusplus.com/reference/cmath/remainder

Returns the floating-point remainder of numer/denom (rounded to nearest): remainder = numer - rquot * denom Where rquot is the result of: numer/denom, rounded toward the nearest integral value (with halfway cases rounded toward the even number). A similar function, fmod, returns the same but with the quotient truncated (rounded towards zero) instead. The function remquo has a behavior identical to this function, but it additionally provides access to the intermediate quotient value used.

ringabout commented 3 years ago

So the conclusion is that I will use the codes below everywhere.

func `div`*[T: float32|float64](x, y: T): T =
  ## Computes the division operation for float values.
  result = (x - x mod y) / y

timotheecour commented 3 years ago

[ ] can we avoid a call to mod inside div(int), div(float), floorDiv, euclDiv (or at least a subset of those)? I know that C compiler can sometimes optimize this out, but there's no guarantee (and could depend on compilers etc), and there's also no guarantee that C compiler will do it for cases where user code ALSO calls mod outside those div calls. Once we have div, we can get mod via a multiplication + 1 subtraction

via a private func modLazy[T](x, y, quo: T): T {.inline.} as drop-in replacement for mod in those cases

ringabout commented 3 years ago

@timotheecour Could you provide some examples for

and more importantly because it is useful (eg for linear algebra, graphics etc)

see https://bugs.python.org/issue16460

timotheecour / Nim

misc mod: `divmod`, `mod` for SomeFloat, `modulo` (with `-1.modulo 5 == 4`) #396

links