float to string to float roundtrip should be possible for every float value

nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).

https://nim-lang.org

Other

16.55k stars 1.47k forks source link

float to string to float roundtrip should be possible for every float value #7717

Closed skilchen closed 3 years ago

skilchen commented 6 years ago

proc test(f: float): bool =
  f == parseFloat($f)

should always return true, but the following test gives false for all example values on the c backend (nim c -r test) and for some on the js backend (nim js -r test) but javascript is actually able to deal correctly with the float -> string -> float roundtrip (nim js -d:nodejs -r test):

#test.nim
import strutils
import math
import fenv

when not defined(nodejs):
  proc test(f: float): bool =
    echo f, " ", $f, " ", parseFloat($f)
    result = f == parseFloat($f)
else:
  proc test(f: float): bool = 
    var a: cstring
    var f1: float
    asm """
      `a` = `f` + "";
      let b = parseFloat(`a`);
      `f1` = b;
      `result` = b === `f`
    """
    echo f, " ", $a, " ", f1

echo test(1.0 + epsilon(float64))
echo test(1000000.0000000123)
echo test(log2(100000.0))
echo test(maximumPositiveValue(float32))
echo test(maximumPositiveValue(float64))
echo test(minimumPositiveValue(float32))
echo test(minimumPositiveValue(float64))

I tried to wrap the famous dtoa.c by David Gay, but c2nim just drives me crazy. Hopefully somebody with better wrapping abilities, will try to add dtoa and strtod to Nim. And if someone has some time to spare, a replacement of the buggy nimParseBiggestFloat in lib/system/jssys.nim with a native javascript version would also be very welcome.

Araq commented 6 years ago

should always return true

No. Where do we claim that? $ returns some string representation of the float and is not lossless. Use formatFloat to control the string representation.

skilchen commented 6 years ago

You can't use formatFloat because you don't know how many digits are needed to do a lossless roundtrip from float to string to float. You need help from the language for this and there are several languages having this feature.

Araq commented 6 years ago

proc formatFloat*(f: float, format: FloatFormatMode = ffDefault,
                  precision: range[-1..32] = 16; decimalSep = '.'): string {.
                  noSideEffect, rtl, extern: "nsu$1".}

--> max precision ever required is 32.

skilchen commented 6 years ago

I was hoping to raise your interest in the matter of a lossless float->string->float roundtrip. I evidently failed completely... If you should change your opinion, here you would find an easy entry point to the relevant discussions and available implementations: Printing Floating-Point Numbers The above seems to be unaware of this relevant development http://cseweb.ucsd.edu/~lerner/papers/fp-printing-popl16.pdf

Araq commented 6 years ago

I was hoping to raise your interest in the matter of a lossless float->string->float roundtrip. I evidently failed completely...

No, not at all, you reported a bug where there is none and I closed it. Lossless roundtrips are important and we might need some better support for it, but $ is not the only way to get it. Or maybe it is, but that is an RFC then, not a bug.

bluenote10 commented 4 years ago

I'd consider this a bug as well. In many other ecosystems, the standard string representation of a float is not just "some string representation", but one that allows to get back the literal used in the code, which makes a lot of sense.

What adds to the confusion in Nim is that currently the string representation looks like it might be full precision, but in fact seems to just mess up the least significant digit.

This can be very confusing:

let x = 117.63331139400016
let y = 117.63331139400017
echo x, " < ", y, " = ", x < y

Output:

117.6333113940002 < 117.6333113940002 = true

disruptek commented 4 years ago

It may be confusing, but it's fundamentally a problem of how floats work.

bluenote10 commented 4 years ago

@disruptek I can't really see the fundamental problem. The number 117.63331139400016 has an unambiguous string representation, it just isn't 117.6333113940002. Also, many languages achieve lossless to-float + parse-float roundtrips, so it is not an unusual expectation.

Araq commented 4 years ago

@bluenote10 Since then we changed $ to be loss-less, apparently we failed. :-)

disruptek commented 4 years ago

Maybe it's time for ryu.

timotheecour commented 4 years ago

3 options:

option 1: if we can make ryu as performant as possible and not adding too many dependencies (or causing cyclic dependencies or lots of complications), then this could be added to some std/ryu that would be imported by system, replacing existing $float
option 2: if that's too hard, this could be possible via dynamic library loading of a shared library libryu from a std/ryu.nim (there's a precedent for that anyways with libpcre), possibly controlled via a flag
option 3: else, only possiblity is to defer that to a module not implicitly imported by nim, ie the suggestion I made as part of https://github.com/nim-lang/RFCs/issues/203#issuecomment-602534906

floats print using ryu (probably too complex for default inclusion in system) proposal part 1: keep system.$ simple proposal part 2: add std/prettyprint module

so std/ryu would be instead imported by std/prettyprint

I suggest starting with option 3 since that's actually needed for either option 1 or option 2

disruptek commented 4 years ago

I believe the current blocker is that @Araq wants minimal growth of binaries due to the tables of constants that Ryu uses. I think the smaller tables would work for the purposes of default Ryu support/inclusion, or perhaps we could determine whether Ryu is actually used at compile-time and omit the constants otherwise.

This should be something we get "for free" with my IC branch, because it has crude (but effective) tree-shaking. It's hardly tested, but I'm optimistic, anyway...

timotheecour commented 4 years ago

I believe the current blocker is that @Araq wants minimal growth of binaries due to the tables of constants that Ryu use

that's a (potential) blocker for option 1 but not for option 2 nor option 3 though; and option 1 kind of needs 1 or 2 to happen first anyways; so the idea would be that default $ keeps working as today, but users have a stdlib option to print floats via ryu via import std/ryu; whether it exports $ can be decided later (at least should export proc ryu*(result: var string, a: float)

Araq commented 4 years ago

I'm not blocking anything, I'm happy to take what you have and maybe put it under a when not defined(embedded) or similar.

timotheecour commented 4 years ago

@disruptek

actually we can nicely solve the dependency issue as follows, so that ryu will work for c,js,vm,nimscript, and be usable from dollars.nim without causing any cyclic dependency issues:

# in dollars.nim:
proc ryu*(result: var string, a: float) {.magic: "Ryu".}
proc `$`*(a: float): string = ryu(result, a)

# in vmops.nim:
register: dollars.ryu

# in ryu.nim
# here you can add more heavy dependencies as needed
proc ryuImpl(result: var string a: float) {.compilerproc.} = ...

# in cgen:
mRyu dispatches to `ryuImpl`

# in jsgen:
ditto

we could determine whether Ryu is actually used at compile-time

the mechanism is already in place in the compiler, no need to add any custom logic; ryuImpl would only be cgen'd if used