guicho271828 / trivia

Pattern Matcher Compatible with Optima
Other
332 stars 22 forks source link

binary-search based keyword parsing #63

Closed ghost closed 7 years ago

ghost commented 7 years ago

I was playing with the binary-search parser for keywords; it looks like the overhead of dealing with the array store makes such an approach very slow compared to using property (by an order of magnitude). It seems really strange that that'd be true.

@guicho271828 is there something I'm doing wrong here ?

(in-package :trivia.level2.impl)

(defun compile-destructuring-pattern (ops &optional default)
  (match ops
    (nil default)
    ((list* (list :whole subpattern) rest)
     `(and ,subpattern ,(compile-destructuring-pattern rest)))
    ((list* (list* :atom subpatterns) rest)
     `(list* ,@subpatterns ,(compile-destructuring-pattern rest)))
    ((list* (list :optional) rest)
     (compile-destructuring-pattern rest))
    ((list* (list* :optional subpattern more-subpatterns) rest)
     (with-gensyms (lst supplied-p-default-sym)
       (destructuring-bind (subpattern &optional default (supplied-p-pattern supplied-p-default-sym supplied-p-pattern-supplied)) subpattern
         `(guard1 (,lst :type list) (listp ,lst)
                  (if ,lst (car ,lst) ,default) ,subpattern
                  ,@(when supplied-p-pattern-supplied
                      `((if ,lst t nil) ,supplied-p-pattern))
                  (cdr ,lst) ,(compile-destructuring-pattern `((:optional ,@more-subpatterns) ,@rest))))))
    ((list* (list :rest pattern) rest)
     `(and ,pattern ,(compile-destructuring-pattern rest '_)))
    ((list* (list* (and mode (or :keyword (list :keyword-allow-other-keys rem))) subpatterns) rest)
     ;; case 1,2 of the &key forms are already compiled into the 3rd form ; see parse-lambda-list
     `(and (type list)
           ;; sequentially accumulate keys
       ,(optimized-key-access (if (eq mode :keyword) nil (or rem '_)) subpatterns)
           ;; compile the rest
           ,(compile-destructuring-pattern rest '_)))
    ((list (list* :aux subpatterns))
     `(guard1 ,(gensym) t ,@(mapcan #'(lambda (x)
                                        (destructuring-bind (var &optional expr) (ensure-list x)
                                          (assert (typep var 'variable-symbol) nil "invalid lambda list")
                                          `(,expr ,var)))
                                    subpatterns)))))

(defun optimized-key-access (remainder-pattern subpatterns)
  ;; NOTE: uses a binary heap (instead) to achieve O(n lg n) speed using a single pass
  (let* ((props (compile-keyword-patterns subpatterns))
     (skeys (sort (mapcar #'second props) #'string<)))
    (with-gensyms (lst kargs indicator)
      `(guard1 ,lst t
           (let ((,kargs (make-array ,(length skeys) :element-type 'keyword)))
         (declare (type (simple-array t (*)) ,kargs))
         ;;,@(loop :for ii :below (length skeys) :collect `(setf (aref ,kargs ,ii) ',indicator))
         ,kargs) ,kargs
         ;;(kargs-parse nil #+nil',indicator ,lst #(,@skeys) ,kargs ,(if (eql remainder-pattern '_) nil t)) (list ,remainder-pattern nil)
        ,@(mapcan #'(lambda (x)
                 (destructuring-bind (key subpattern default &optional (supplied-p-pattern nil supplied-p-pattern-suppliedp)) (cdr x)
                   (let ((pos (position key skeys)))
                 `((if (eql (aref ,kargs ,pos) ',indicator) ,default (aref ,kargs ,pos)) ,subpattern
                   ,@(if supplied-p-pattern-suppliedp
                     `((if (eql (aref ,kargs ,pos) ',indicator) nil t) ,supplied-p-pattern))))))
             props)))))

(declaim (inline kargs-parse))
(defun kargs-parse (indicator lst heap kargs &optional collect &aux rest)
  (declare (type (simple-array keyword (*)) heap)
       (type (simple-array t (*)) kargs)
       (type symbol indicator)
       (optimize (speed 3) (safety 0)))
  (list
   (loop :for (k v . r) :on lst :by #'cddr
      :while (keywordp k)
      :for pos := (binary-search k 0 (length heap) heap)
      :if pos :do (if (eql (aref kargs pos) indicator) (setf (aref kargs pos) v))
      :else :if collect :collect k :and :collect v
      :do (setf rest r))
   rest))

(declaim (inline binary-search))
(defun binary-search (val lb ub vec)
  (declare (type fixnum lb ub)
       (type keyword val)
       (type (simple-array keyword (*)) vec))
  (cond
    ((or (= lb ub) (string< val (aref vec lb))) nil)
    ((string< (aref vec (1- ub)) val) nil)
    (t (loop :for j :of-type fixnum := (floor (+ lb ub) 2)
      :repeat #.(ceiling (log array-dimension-limit 2))
      :do (cond ((string= (aref vec j) val) (return j))
            ((>= lb (1- ub)) (return nil))
            (t (if (string< val (aref vec j))
               (setf ub j)
               (setf lb (1+ j)))))))))
guicho271828 commented 7 years ago

First I should know how did you benchmarked your code. Is it using a really long argument list? Is it short? With a short arglist, linear search is fast enough that the extra cost of binary search does not pay off. See article like this which still holds nowadays.

Next, string< could be very slow. In general symbols are treated specially and typically eq comparison between symbols are just pointer comparison.

Perhaps hashtable can be better. Again the same problem occurs. However, note that in this use case hashtable will be generated only once and never be modified, furthermore the size is expected to be small. Thus, having our own implementation of hash table using array may work.

ghost commented 7 years ago

Yes, but even using linear search on small arguments barely comes close to Trivia. Both SBCL's destructuring-bind and my linear-search impl. appear to be about an order of magnitude slower than lambda-list. No idea why.

binary-search can also be inlined since the vector size is already known at compile time.

guicho271828 commented 7 years ago

Oh,

       (let ((,kargs (make-array ,(length skeys) :element-type 'keyword)))

This causes creating a new array in runtime. It should be wrapped by a load-time-value, or the vector should be created in compile-time and be embedded in the code.

ghost commented 7 years ago

Ah, well that one is meant to be created at runtime (the :element-type part is a bug). Initializing (creation seems rather fast) the array itself appears to be slower than the current lambda-list implementation.

guicho271828 commented 7 years ago

Please cut the new branch and share the testing code on github . And then I can test it by myself .

ghost commented 7 years ago

Here you go,

https://github.com/akssri/trivia/blob/lambda/level2/derived3-fast.lisp

guicho271828 commented 7 years ago

by the way, lambda-list pattern is slower than destructring-bind on big-list in my environment, although this speedup is still valid. Trivia is faster on small lists and slow in long lists? sbcl-1.3.11, ubuntu 16.04.

guicho271828 commented 7 years ago
(let ((tmp 0)
      (lst '(1 2 :x 1 :b 2))
      (big-lst (list* 1 2 :x 1 :b 2 (loop :repeat 100 :collect (intern (symbol-name (gensym)) "KEYWORD")))))
  (time
   (dotimes (i 1000)
     (destructuring-bind (a0 a1 &key x b) lst
       (incf tmp))))
  (time
   (dotimes (i 1000)
     (match big-lst
       ((λlist a0 a1 &key x b)
        (incf tmp)))))
  (time
   (dotimes (i 1000)
     (match big-lst
       ((λlist-o a0 a1 &key x b &allow-other-keys)
        (incf tmp))))))

processor cycles:

d-bind lambda-list lambda-list-o
big-list repeat 100 187,080 473,232 45,273,906
big-list repeat 1 124,548 22,032 1,592,475
guicho271828 commented 7 years ago

I am confident that making a lookup table for the runtime object is a wrong direction. Rather, you should make a fixed lookup table representing the pattern (not the runtime object)

Scanning the list of length 100 is an unavoidable bottleneck as long as you are using a list as the runtime input. In fact, whatever structure the list is to be converted into, conversion itself requires O(n) in order to scan over the list. So the O(lg n) does not happen. <-- this statement is wrong, your goal is from O(n^2) to O(n lg n).

ghost commented 7 years ago

I'm not entirely sure what you mean here by a runtime object.

The lookup table created at compile time is only the sorted list of keywords, the one created created at runtime is a place for storing the keyword args as they are found (in arbitrary order). Is it this latter store that you mean ?

I'm fairly sure that using linear search instead of binary search one can achieve parity with destructuring-bind for big-list.

ghost commented 7 years ago

I doesn't appear to be feasible to optimize for the constants in O(n lg m) to work in all regimes. I tried eliminating the loop in the binary search by inlining the search for small arguments but that keeps it at ~10x slower for the above test-cases (not surprisingly, because of the overhead).

There is noticeable speed-up when the pattern itself has ~1000 odd keyword arguments where O(n m) starts to hurt, but this case is far too niche and barely applicable in practice.

(Closing this issue.)

guicho271828 commented 7 years ago

Assuming n is for big-list and m is for the number of keywords in the pattern. Above impl is trying to achieve O( n + m log n ) where the first term n for converting the list to heap and m log n for looking up the heap m times for m keywords, each taking log n.

I am instead suggesting that we can aim for O(n log m) for very long patterns. There is no runtime overhead for constructing the heap, since it is done in compile time.

guicho271828 commented 7 years ago

There is noticeable speed-up when the pattern itself has ~1000 odd keyword arguments where O(n m) starts to hurt, but this case is far too niche and barely applicable in practice.

Ok, you seem to test it too. hmm..

ghost commented 7 years ago

The above implementation is also doing O(n log m) - the given argument list is never copied. I'm still surprised that the overhead for this so high.

(Heap here was a terrible usage, since I'm really using binary search; sorry for the conflation - edited title.)