picrin-scheme / picrin

lightweight scheme interpreter
MIT License
414 stars 35 forks source link

number->string and string->number do not round-trip correctly #319

Closed dcurrie closed 8 years ago

dcurrie commented 8 years ago

One of the few dependencies of Picrin on libc is reading and writing floats. However, both the libc based version and the Picrin internal version of number->string and string->number are deficient. Here is an example:

(define (ftest number . radix)
    (if (null? radix) (set! radix 10))
    ;; this equivalence is required by R7RS...
    (eqv? number (string->number (number->string number radix) radix)))

> (ftest 1.1111111111111)
#f

Adding more digits to the output of number->string is a path to heartbreak, as is well known. See [Steele Jr. and White(1990)] G. L. Steele Jr. and J. L. White. How to print floating-point numbers accurately. In Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation, PLDI 1994, pages 112–126, New York, NY, USA,

  1. ACM. ISBN 0-89791-364-7. doi: 10.1145/93542.93559.

The first solution to consider to this issue is to incorporate David Gay's dtoa.c into Picrin. This implementation is well maintained, and is used ubiquitously. The very rare bugs that are discovered are fixed quickly. The source code lives at http://www.netlib.org/fp/dtoa.c The license is liberal, but must be reproduced in the code and in all copies of the supporting documentation for software that incorporates it.

Note: dtoa.c has support for correct implementations for both number->string and string->number.

On OS X x86_64 dtoa.o is under 29 KB, and a complete executable with dtoa and a test wrapper with i/o and formatting code is under 34 KB. On this same platform Picrin is 384 KB, so we can expect use of dtoa to add under 10% to Picrin's size in the default configuration.

There are alternatives to David Gay's dtoa, and many of them are faster. None of them are as old or well exercised, so there is a small risk to using them. Milo Yip has benchmarks at https://github.com/miloyip/dtoa-benchmark Note that all of these libraries do number->string, but few provide string->number, which is also challenging to get right.

I have converted Milo Yip's C++ version of milo_dtoa to C; it has very good performance, and passes all the tests at dtoa-benchmark. I have also created a strtod replacement I call emyg_atod that has decent performance (better than the strtod() in David Gay's dtoa), and passes the dtoa-benchmark round-trip test, meeting the Scheme requirement (see ftest above). It is based on a paper by Aubrey Jaffer, January 2015, "Easy Accurate Reading and Writing of Floating-Point Numbers" http://arxiv.org/abs/1310.8121v6 You can see my implementations at https://github.com/dcurrie/dtoa-benchmark/tree/emyg_atod

In Picrin, an integer is 32 bits. [This is true in PIC_NAN_BOXING mode, where 48 bits is feasible; in PIC_WORD_BOXING mode the int could be 62 bits, however the inline function pic_valid_int() limits integer values to the range INT_MIN to INT_MAX; otherise pic_value uses type int for integers.] Consequently, any integer value can be represented exactly as a double, and there would be no loss of precision to use (an error free) strtod() to read in both integers and doubles. This would simplify read_unsigned(): it would just try to read a double with (an error free) strtod(); on success, scan the string for '.' and 'e' to see if an integer is wanted; if so, check the double and convert to int if it is in range.

This would also address the present reader bug:

2147483648 -2147483648

I plan to experiment with incorporating emyg_dtoa and emyg_atod into picrin.

KeenS commented 8 years ago

IMO, as far as picrin is concerned, it is a good option having accurate functions.

@wasabiz How do you like considering benz?

dcurrie commented 8 years ago

Whether it belongs in benz or picrin is a good question. Perhaps we can architect it so there is a clean plugin interface for number->string and string->number.