s-expressionists / Eclector

A portable Common Lisp reader that is highly customizable, can recover from errors and can return concrete syntax trees
https://s-expressionists.github.io/Eclector/
BSD 2-Clause "Simplified" License
110 stars 9 forks source link

Handle reading of floating point numbers better #68

Open scymtym opened 4 years ago

scymtym commented 4 years ago

There are at least two things to improve here:

Bike commented 4 years ago

Aubrey Jaffer wrote a brief article on this: https://arxiv.org/abs/1310.8121v1

The code in "Reading" is in Scheme, but should be easily adaptable. It's only a few more lines than what Eclector has now. The round-quotient function described is probably doable as cl:round plus some extra adaptation.

If I understand correctly, this algorithm is less efficient than Clinger's, as it uses more and bigger intermediate bignums. But it might be an improvement over the current procedure.

scymtym commented 4 years ago

Thank you.

Bike commented 3 years ago

upon further examination, no adaptation of round is actually needed. So you can pretty much use the code directly, substituting round for round-quotient, and float for exact->inexact and such. Plus you can use ash directly. Overall, it's

(defconstant +double-mantissa-digits+ 53)

(defun mantexp->dbl (mantissa exponent &key (base 10))
  (declare (type integer mantissa exponent))
  (if (minusp exponent)
      (let* ((scale (expt base (- exponent)))
             (bex (- (integer-length mantissa)
                     (integer-length scale)
                     +double-mantissa-digits+))
             (numerator (ash mantissa (- bex)))
             (quotient (round numerator scale)))
        (when (> (integer-length quotient) +double-mantissa-digits+)
          ;; too many bits of quotient
          (setf bex (1+ bex)
                quotient (round numerator (* scale 2))))
        (scale-float (float quotient 1d0) bex))
      (float (* mantissa (expt base exponent)) 1d0)))

Based on a quick test where I generate a million random pairs of mantissa and exponent, this results in exactly the same numbers that SBCL reads, so I think it works?

scymtym commented 3 years ago

Looking into the Jaffer article. Quite the version history there. Conversion from HTML to PDF, then rewrite in Java.

scymtym commented 3 years ago

With extensions for

I ended up with the following:

;;; Based on "v1" version of Aubrey Jaffer (2018): Easy Accurate
;;; Reading and Writing of Floating-Point Numbers. Initial conversion
;;; to Common Lisp by Alex Wood.
(defun mantissa-and-exponent-to-float
    (sign mantissa decimal-exponent exponent-sign exponent float-type)
  (declare (type (member -1 1) sign)
           (type integer mantissa)
           (type (integer 0) decimal-exponent)
           (type (or null integer) exponent)
           (type (member -1 1) exponent-sign)
           (type (member single-float double-float float-type)))
  (macrolet
      ((convert (type)
         (let* ((prototype (ecase type
                             (single-float 0f0)
                             (double-float 0d0)))
                (precision (float-precision (ecase type
                                              (single-float 1f0)
                                              (double-float 1d0)))))
           `(labels ((negative-exponent (mantissa exponent)
                       (let* ((scale (expt 10 exponent))
                              (bex (- (integer-length mantissa)
                                      (integer-length scale)
                                      ,precision))
                              (numerator (ash mantissa (- bex)))
                              (quotient (round numerator scale)))
                         (when (> (integer-length quotient) ,precision)
                           (setf bex (1+ bex)
                                 quotient (round numerator (* scale 2))))
                         (scale-float (float quotient ,prototype) bex)))
                     (positive-exponent (mantissa exponent)
                       (float (* mantissa (expt 10 exponent)) ,prototype))
                     (uncertain-exponent (mantissa exponent)
                       (if (minusp exponent)
                           (negative-exponent mantissa (- exponent))
                           (positive-exponent mantissa exponent))))
              (cond ((eql mantissa 0) ; (negative) zero
                     (if (eql sign -1)
                         ,(- prototype)
                         ,prototype))
                    ((and (null exponent) (zerop decimal-exponent))
                     (break "cannot happen"))
                    ((null exponent)
                     (negative-exponent (* sign mantissa) decimal-exponent))
                    ((zerop decimal-exponent)
                     (if (eql exponent-sign 1)
                         (positive-exponent (* sign mantissa) exponent)
                         (negative-exponent (* sign mantissa) exponent)))
                    (t ; both, "decimal exponent" and "scientific exponent"
                     (uncertain-exponent
                      (* sign mantissa) (- (* exponent-sign exponent)
                                           decimal-exponent))))))))
    (ecase float-type
      (single-float (convert single-float))
      (double-float (convert double-float)))))
scymtym commented 2 months ago

I think this is where left off:

;;; Taken from SBCL code
;;; Truncate EXPONENT if it's too large for a float.
(defun truncate-exponent (exponent mantissa decimal-exponent)
  ;; Work with base-2 logarithms to avoid conversions to floats, and
  ;; convert to base-10 conservatively at the end.  Use the least
  ;; positive float, because denormalized exponent can be larger than
  ;; normalized.
  (let* ((max-exponent          (+ sb-vm:double-float-digits
                                   sb-vm:double-float-bias))
         (mantissa-bits         (integer-length mantissa))
         (decimal-exponent-bits (1- (integer-length decimal-exponent)))
         (magnitude             (- mantissa-bits decimal-exponent-bits)))
    (if (minusp exponent)
        (max exponent (ceiling (- (+ max-exponent magnitude))
                               #.(cl:floor (cl:log 10 2))))
        (min exponent (floor (- max-exponent magnitude)
                             #.(cl:floor (cl:log 10 2)))))))

;;; Based on "v1" version of Aubrey Jaffer (2018): Easy Accurate
;;; Reading and Writing of Floating-Point Numbers. Initial conversion
;;; to Common Lisp by Alex Wood.
(defun mantissa-and-exponent-to-float
    (sign mantissa decimal-exponent exponent-sign exponent float-type)
  (declare (type (member -1 1) sign)
           (type integer mantissa)
           (type (integer 0) decimal-exponent)
           (type (or null integer) exponent)
           (type (member -1 1) exponent-sign)
           (type (member single-float double-float) float-type)
           (optimize speed))
  (macrolet
      ((convert (type)
         (let* ((prototype (ecase type
                             (single-float 0f0)
                             (double-float 0d0)))
                (precision (float-precision (ecase type
                                              (single-float 1f0)
                                              (double-float 1d0)))))
           `(labels ((negative-exponent (mantissa exponent)
                       (let* ((exponent (truncate-exponent
                                         exponent mantissa decimal-exponent))
                              (scale (expt 10 exponent))
                              (bex (- (integer-length mantissa)
                                      (integer-length scale)
                                      ,precision))
                              (numerator (ash mantissa (- bex)))
                              (quotient (round numerator scale)))
                         (when (> (integer-length quotient) ,precision)
                           (setf bex (1+ bex)
                                 quotient (round numerator (* scale 2))))
                         (scale-float (float quotient ,prototype) bex)))
                     (positive-exponent (mantissa exponent)
                       (let ((exponent (truncate-exponent
                                        exponent mantissa decimal-exponent)))
                         (float (* mantissa (expt 10 exponent)) ,prototype)))
                     (uncertain-exponent (mantissa exponent)
                       (if (minusp exponent)
                           (negative-exponent mantissa (- exponent))
                           (positive-exponent mantissa exponent))))
              (cond ((eql mantissa 0)   ; (negative) zero
                     (if (eql sign -1)
                         ,(- prototype)
                         ,prototype))
                    ((null exponent)
                     (negative-exponent (* sign mantissa) decimal-exponent))
                    ((zerop decimal-exponent)
                     (if (eql exponent-sign 1)
                         (positive-exponent (* sign mantissa) exponent)
                         (negative-exponent (* sign mantissa) exponent)))
                    (t ; both, "decimal exponent" and "scientific exponent"
                     (uncertain-exponent
                      (* sign mantissa) (- (* exponent-sign exponent)
                                           decimal-exponent))))))))
    (ecase float-type
      (single-float (convert single-float))
      (double-float (convert double-float)))))
yitzchak commented 1 month ago

Is the decimal-exponent referenced from the start of the mantissa or the end? Its ambiguous in your example for 12.34. For 12.345 is it 2 or 3?

From a conversion perspective 3 would probably be better because that just represents 10^(-3), whereas if you reference it from the start as 2, then the converter will probably need to know how many digits were in the string representation.

yitzchak commented 1 month ago

Provided that the decimal-exponent is referenced from the end of the mantissa, then mantissa-and-exponent-to-float could be implemented via Quaviver:

(defun mantissa-and-exponent-to-float
    (sign mantissa decimal-exponent exponent-sign exponent float-type)
  (quaviver:integer-float (make-instance 'quaviver/liebler:client)
                          float-type
                          10
                          mantissa
                          (- (* exponent-sign (or exponent 0))
                             decimal-exponent)
                          sign))

Right now we have only one base 10 to base 2 client, quaviver/liebler. There is no state in the client instance so you can just reuse the instance across multiple mantissa-and-exponent-to-float calls.

scymtym commented 1 month ago

It is from the end like you suspected so everything should work out.

Reusing a single client instance would be a prerequisite since otherwise the consing and object initialization would possibly dominate.

I will probably compare the quaviver version against the Jaffer/SBCL code above at some point. Compared to the completely naive implementation that Eclector currently uses, the Jaffer-based code achieved around 20 - 30 % improvements in both run time and consing.

yitzchak commented 1 month ago

You could also use the Eclector client and add the Quaviver as a mixin.

scymtym commented 1 month ago

You could also use the Eclector client and add the Quaviver as a mixin.

That's a good idea in principle but Eclector does not require client classes to have anything at all in their superclass list. So existing clients would have to be adjusted to add the appropriate mixin (or newly exposed superclass).

One idea could be to have something like (floating-point-behavior eclector-client) in order to retrieve an appropriate quaviver client object. This style was used in the Lisp Interface Library: interfaces could contain "sub-interfaces" for parametrizing individual aspects. I will have to think about possible solutions some more.

yitzchak commented 1 month ago

Makes sense.

yitzchak commented 1 month ago

I've added an implementation of Jaffer v7 to Quaviver. It hasn't been optimized or tested yet. It is doing bignum expt-5 right now which could be replaced with some table lookups for an easy speedup.