nim-lang / RFCs

A repository for your Nim proposals.

`std/int128s` and `type Duration = distinct int128` #399

Open timotheecour opened 2 years ago

timotheecour commented 2 years ago

proposal

add a std/int128s module containing two types: int128 and uint128.

The API would be similar to std/jsbigints (https://nim-lang.github.io/Nim/jsbigints.html)

example use case 1: type Duration = distinct int128

one perfect use case is representing Duration, which currently uses a sub-optimal design:

type Duration* = object
  seconds: int64
  nanosecond: NanosecondRange

which makes every operation more costly than needed and admits invalid bit representations. Using type Duration = distinct int128 would be much more natural and efficient.

I consider this RFC a prerequisite for https://github.com/nim-lang/RFCs/issues/383 (APIs should have typesafe, self-documenting time units, e.g. proc sleep(t: Duration), allowing sleep(1.millisecond) etc.).

Note: whether Duration can be redefined as type Duration = distinct int128 is TBD, but even if backward compatibility prevents it, we can still define std/durations with type TimeUnit = distinct int128 and use it throughout the stdlib to represent durations in APIs (i.e., for https://github.com/nim-lang/RFCs/issues/383); std/times can then provide a conversion TimeUnit <=> Duration.

example use case 2: compiler/int128 could reuse std/int128s

(and benefit from performance gains when the native type is used)

why not use std/bigints instead?

std/bigints is also being tracked and should eventually make its way into the stdlib, but it is no replacement for int128, for several reasons:

implementation

most C compilers support 128-bit integers natively (e.g. GCC and Clang provide __int128 and unsigned __int128).

Note: I've already implemented much of this, and on the client side it's just as easy to use as native types (e.g. int64).

importc for operators

std/jsbigints can write this:

func `+`*(x, y: JsBigInt): JsBigInt {.importjs: "(# $1 #)".} 

but that doesn't work with importc (it requires importcpp, which in turn requires the C++ backend). What I did to circumvent this is to use an auxiliary .h file containing macros, which I can then importc, and it all works; but in future work this should be improved so that we can use the much simpler and more direct way:

func `+`*(x, y: int128): int128 {.pattern: "# $1 #".} 

refs https://github.com/nim-lang/RFCs/issues/315#issuecomment-873171898 which shows what should be done (that RFC needs to be updated to define pattern instead of improving importjs)

current unknown: what should the type kind be?

bigint vs int128 performance

links

konsumlamm commented 2 years ago

The only nit I have: do we really have to suffix every module name with s? Sometimes, like in this case, that just doesn't make any sense, std/int128 would be way more natural imo.

timotheecour commented 2 years ago

module names where a symbol of the same name exists cause issues, though, and the s also serves as a reminder that there is not just int128 but also uint128.

module-symbol name clash example

(hence https://github.com/nim-lang/Nim/pull/15590)

# in std/int128.nim:
type int128* = object
  data: array[2, uint64] # fallback impl

proc high*[T: int128](_: typedesc[T]): int128 = int128(data: [uint64.high, uint64.high])
  # value not correct, but that's beside the point
  # (also beside the point is whether high should be defined)

# main.nim:
import std/int128
echo int128.high

would give: Error: type mismatch: got <proc (_: typedesc[T: int128]): int128>. Of course you could use import std/int128 as foo, but that defeats the purpose.

in other cases it could be even more error prone, silently giving unintended results.

using std/int128s solves all such problems and is already a widely used convention.

Varriount commented 2 years ago

My only opinion on this is that I'd rather just have this type present in system.nim (or at least, imported implicitly).

Araq commented 2 years ago

My only opinion on this is that I'd rather just have this type present in system.nim (or at least, imported implicitly).

No...

rockcavera commented 2 years ago

I've always been in favor of having int128/uint128 as built-in types, but nothing against them being a std package.

I also wrote a 128-bit integer nimble package, nint128. I decided to write it because I believe it is possible to do more optimizations on a single integer type than on arbitrary-precision integers. In it I tried to write everything in pure Nim, with some optimizations using __int128 or intrinsics, when possible and genuinely beneficial to the code.

I'm currently evolving the package to be fully pure Nim, with the user being able to determine which operators should use __int128 (the C extension for 128-bit integers supported by GCC and Clang) or VCC's 128-bit intrinsic functions.

Varriount commented 2 years ago

My only opinion on this is that I'd rather just have this type present in system.nim (or at least, imported implicitly).

No...

Why? I understand why things like, say, sequtils aren't in system.nim - it puts increased load on the compiler, and makes resolving name clashes harder - but I don't see why basic data types must be excluded.

Araq commented 2 years ago

"Basic data types" don't mean anything to me, we lived for 10 years without int128 but once added it's "basic"? Would a BigInt also be "basic" then? Instead we should make system.nim smaller, not bigger. It's pointless to optimize for the case "what can you write without an import statement" as already no real program out there can be written without an import.

c-blake commented 2 years ago

I am in favor of a "high resolution value dense numeric" time type. I am also in favor of some int128s in the stdlib (for hashes, random numbers, and various other circumstances where it can clean things up and a nimble dep is a burden; it need not be in system to do so).

However, I think Time/Duration is a bad example for int128 because 64 bits is plenty. Since there is a history of inadequate time address-space allocation (y2k, 2038, etc.), I will elaborate a bit.

64 bits gets you 2**63./86400e9/365.25 = 292.27 years of nanoseconds. That is 1970 +- 292 = year 1678 .. year 2262, or "from Newton to Star Trek" with a Unix epoch. Meanwhile, if you had a bigger time range than +-292 years, it is virtually certain you no longer care about nanosecond resolution. After all, either it refers to a time before we could measure so accurately or to a time after which we will likely be able to measure much more accurately. No one is comparing system or file times etc. across 3 centuries wanting {-, <, <=} in exact arithmetic to nanoseconds.

Meanwhile, the 53-bit mantissa of an IEEE double also gets you 2**53./86400e6/365.25 = +-285 years of microseconds. It is also very likely that you want float64 arithmetic in "big range" contexts and already convert to float64 anyway. This also gracefully loses precision as range increases and gains a little as range decreases - e.g. 2021-1970 = 51 years => only ~0.2 usec error. (This may matter if you convert to float64 before subtracting two high-res times, for example, but honestly it is still the case that not a whole lot happens in a few hundred nanoseconds.)

Actual change to time type representation should be discussed over in https://github.com/nim-lang/RFCs/issues/383, though. I comment here because my points pertain more to the example.

JohnAD commented 2 years ago

Having 128-bit uint support would drastically improve the performance of any 128-bit decimal library that conforms to the IEEE standard as it does a lot of 113-bit arithmetic.

(This does remind me that I need to re-arrange my life somehow so that I can resume work on the decimal library.)

But whether it is me or someone else, it would be a huge performance boost on machines that actually have CPU/APU/GPU 128-bit math support, which many modern ones do.

juancarlospaco commented 2 years ago

Stripe about int128, uint128: https://twitter.com/zenazn/status/1433134722809430019