nim-lang / RFCs

A repository for your Nim proposals.

`std/int128s` and `type Duration = distinct int128` #399

Open timotheecour opened 2 years ago

timotheecour commented 2 years ago

proposal

add a std/int128s module containing two types: int128 and uint128.

The API would be similar to std/jsbigints (https://nim-lang.github.io/Nim/jsbigints.html)

example use case 1: type Duration = distinct int128

one perfect use case is representing Duration, which currently uses a sub-optimal design:

type Duration* = object
  seconds: int64
  nanosecond: NanosecondRange

which makes every operation more costly than needed and admits invalid bit representations. Using type Duration = distinct int128 would be much more natural and efficient.

I consider this RFC a prerequisite for https://github.com/nim-lang/RFCs/issues/383 (APIs should have typesafe, self-documenting time units, e.g. proc sleep(t: Duration), allowing sleep(1.millisecond) etc.).

Note: whether Duration can be redefined as type Duration = distinct int128 is TBD, but even if backward compatibility prevents it, we can still define std/durations with type TimeUnit = distinct int128 and use it throughout the stdlib to represent durations in APIs (i.e., for https://github.com/nim-lang/RFCs/issues/383); std/times can then provide a conversion TimeUnit <=> Duration.

example use case 2: compiler/int128 could reuse std/int128s

(and benefit from performance gains when the native type is used)

why not use std/bigints instead?

std/bigints is also being tracked and should eventually make its way into the stdlib, but it is no replacement for int128, for several reasons:

implementation

most C compilers support 128-bit integers natively (e.g. GCC and Clang provide __int128 and unsigned __int128).

Note: I've already implemented much of this, and on the client side it's just as easy to use as native types (e.g. int64).

importc for operators

std/jsbigints can write this:

func `+`*(x, y: JsBigInt): JsBigInt {.importjs: "(# $1 #)".} 

but that doesn't work with importc (it requires importcpp, which in turn requires the C++ backend). What I did to circumvent this is to use an auxiliary .h file containing macros, which I can then importc, and it all works; but in future work this should be improved so that we can use the much simpler and more direct way:

func `+`*(x, y: int128): int128 {.pattern: "# $1 #".} 

refs https://github.com/nim-lang/RFCs/issues/315#issuecomment-873171898 which shows what should be done (that RFC needs to be updated to define pattern instead of improving importjs)

current unknown: what should the type kind be?

bigint vs int128 performance

links

konsumlamm commented 2 years ago

The only nit I have: do we really have to suffix every module name with s? Sometimes, like in this case, that just doesn't make any sense, std/int128 would be way more natural imo.

timotheecour commented 2 years ago

module names where a symbol of the same name exists cause issues, though, and the s also serves as a reminder that there is not just int128 but also uint128.

module-symbol name clash example

(hence https://github.com/nim-lang/Nim/pull/15590)

# in std/int128.nim:
type int128* = object
  data: array[2, uint64] # fallback impl

proc high*[T: int128](_: typedesc[T]): int128 = int128(data: [uint64.high, uint64.high])
  # value not correct, but that's beside the point
  # (also beside the point is whether high should be defined)

# main.nim:
import std/int128
echo int128.high

would give: Error: type mismatch: got <proc (_: typedesc[T: int128]): int128>. Of course you could use import std/int128 as foo, but that defeats the purpose.

in other cases it could be even more error prone, silently giving unintended results.

using std/int128s solves all such problems and is already a widely used convention.

Varriount commented 2 years ago

My only opinion on this is that I'd rather just have this type present in system.nim (or at least, imported implicitly).

Araq commented 2 years ago

My only opinion on this is that I'd rather just have this type present in system.nim (or at least, imported implicitly).

No...

rockcavera commented 2 years ago

I've always been in favor of having int128/uint128 as built-in types, but nothing against them being a std package.

I also wrote a 128-bit integer nimble package, nint128. I decided to write it because I believe it is possible to do more optimizations on a single integer type than on arbitrary-precision integers. In it I tried to write everything in pure Nim, with some optimizations using __int128 or intrinsics, when possible and genuinely beneficial to the code.

I'm currently evolving the package to be fully pure Nim, with the user being able to determine which operators should use __int128 (the C extension for 128-bit integers supported by GCC and Clang) or VCC's 128-bit intrinsic functions.

Varriount commented 2 years ago

My only opinion on this is that I'd rather just have this type present in system.nim (or at least, imported implicitly).

No...

Why? I understand why things like, say, sequtils aren't in system.nim - it puts increased load on the compiler, and makes resolving name clashes harder - but I don't see why basic data types must be excluded.

Araq commented 2 years ago

"Basic data types" don't mean anything to me, we lived for 10 years without int128 but once added it's "basic"? Would a BigInt also be "basic" then? Instead we should make system.nim smaller, not bigger. It's pointless to optimize for the case "what can you write without an import statement" as already no real program out there can be written without an import.

c-blake commented 2 years ago

I am in favor of a "high resolution value dense numeric" time type. I am also in favor of some int128s in the stdlib (for hashes, random numbers, and various other circumstances where it can clean things up and a nimble dep is a burden; it need not be in system to do so).

However, I think Time/Duration is a bad example for int128 because 64 bits is plenty. Since there is a history of inadequate time address-space allocation (y2k, 2038, etc.), I will elaborate a bit.

64 bits gets you 2**63./86400e9/365.25 = 292.27 years of nanoseconds. That is 1970 +- 292 = year 1678 .. year 2262, or "from Newton to Star Trek" with a Unix epoch. Meanwhile, if you had a bigger time range than +-292 years, it is virtually certain you no longer care about nanosecond resolution. After all, either it refers to a time before we could measure so accurately or to a time after which we will likely be able to measure much more accurately. No one is comparing system or file times etc. across 3 centuries wanting {-, <, <=} in exact arithmetic to nanoseconds.

Meanwhile, the 53-bit mantissa of an IEEE double also gets you 2**53./86400e6/365.25 = +-285 years of microseconds. It is also very likely that you want float64 arithmetic in "big range" contexts and already convert to float64 anyway. This also gracefully loses precision as range increases and gains a little as range decreases - e.g. 2021-1970 = 51 years => only ~0.2 usec error. (This may matter if you convert to float64 before subtracting two high-res times, for example, but honestly it is still the case that not a whole lot happens in a few hundred nanoseconds.)

Actual change to time type representation should be discussed over in https://github.com/nim-lang/RFCs/issues/383, though. I comment here because my points pertain more to the example.

JohnAD commented 2 years ago

Having 128-bit uint support would drastically improve the performance of any 128-bit decimal library that conforms to the IEEE standard as it does a lot of 113-bit arithmetic.

(This does remind me that I need to re-arrange my life somehow so that I can resume work on the decimal library.)

But whether it is me or someone else, it would be a huge performance boost on machines that actually have CPU/APU/GPU 128-bit math support, which many modern ones do.

juancarlospaco commented 2 years ago

Stripe about int128, uint128: https://twitter.com/zenazn/status/1433134722809430019