First, I should know how you benchmarked your code. Is it using a really long argument list? Is it short? With a short arglist, linear search is fast enough that the extra cost of binary search does not pay off. See articles like this one, which still holds nowadays.
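For reference, linear search over a keyword plist is essentially what `getf` does; a minimal sketch of the equivalent loop (with a made-up name):

```lisp
;; Linear search over a keyword plist -- essentially what GETF does:
;; walk the list two elements at a time until the key matches.
(defun find-key (key plist)
  (loop for (k v) on plist by #'cddr
        when (eq k key) return v))

;; (find-key :b '(:x 1 :b 2)) => 2
```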
Next, `string<` could be very slow. In general, symbols are treated specially, and typically an `eq` comparison between symbols is just a pointer comparison.
Perhaps a hash table could be better. Again, the same problem occurs. However, note that in this use case the hash table would be generated only once and never modified; furthermore, the size is expected to be small. Thus, having our own implementation of a hash table backed by an array may work.
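A minimal sketch of what such an array-backed table could look like (my illustration, not Trivia's code), assuming `sxhash` as the hash function and open addressing with linear probing; the same caveat applies in that `sxhash` on symbols may not be as cheap as a raw `eq` comparison:

```lisp
;; Build the table once; it is never modified afterwards.
(defun build-keyword-table (keys)
  (let* ((size (* 2 (length keys)))        ; keep the load factor low
         (table (make-array size :initial-element nil)))
    (dolist (k keys table)
      (do ((i (mod (sxhash k) size) (mod (1+ i) size)))
          ((null (aref table i)) (setf (aref table i) k))))))

;; Probe for KEY; NIL means absent.
(defun keyword-slot (key table)
  (do* ((size (length table))
        (i (mod (sxhash key) size) (mod (1+ i) size)))
       ((null (aref table i)) nil)
    (when (eq (aref table i) key)
      (return i))))

;; (keyword-slot :b (build-keyword-table '(:x :b))) => a fixed index
```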
Yes, but even using linear search on small arguments barely comes close to Trivia. Both SBCL's `destructuring-bind` and my linear-search implementation appear to be about an order of magnitude slower than `lambda-list`. No idea why.
`binary-search` can also be inlined, since the vector size is already known at compile time.
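Roughly, the idea would be a macro that unrolls the search over a literal sorted vector into nested conditionals (a sketch with made-up names, not the actual implementation; it assumes the key evaluates to a keyword and compares by symbol name):

```lisp
;; Unroll binary search over KEYS, a literal vector of keywords
;; sorted by SYMBOL-NAME, into nested CONDs.  Returns the index
;; of the key, or NIL if absent.  No loop remains at runtime.
(defmacro inline-binary-search (key-form keys)
  (let ((key (gensym "KEY")))
    (labels ((expand (lo hi)
               (if (> lo hi)
                   nil
                   (let* ((mid (floor (+ lo hi) 2))
                          (k (aref keys mid)))
                     `(cond ((eq ,key ,k) ,mid)
                            ((string< (symbol-name ,key)
                                      ,(symbol-name k))
                             ,(expand lo (1- mid)))
                            (t ,(expand (1+ mid) hi)))))))
      `(let ((,key ,key-form))
         ,(expand 0 (1- (length keys)))))))

;; (inline-binary-search k #(:B :FOO :X)) expands with no runtime loop.
```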
Oh,

```lisp
(let ((,kargs (make-array ,(length skeys) :element-type 'keyword)))
```

This creates a new array at runtime. It should be wrapped in `load-time-value`, or the vector should be created at compile time and embedded in the code.
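For illustration, `load-time-value` would make the allocation happen once, when the compiled code is loaded, so every call reuses the same vector (a toy example, not the actual patch):

```lisp
(defun demo ()
  ;; The MAKE-ARRAY form runs once at load time; every call to DEMO
  ;; returns the very same vector.  The size 4 is a placeholder.
  (load-time-value (make-array 4 :initial-element nil)))

;; (eq (demo) (demo)) => T
```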
Ah, well, that one is meant to be created at runtime (the `:element-type` part is a bug). Initializing the array (creation itself seems rather fast) appears to be slower than the current `lambda-list` implementation.
Please cut a new branch and share the testing code on GitHub; then I can test it myself.
By the way, the `lambda-list` pattern is slower than `destructuring-bind` on `big-lst` in my environment, although this speedup is still valid. Is Trivia faster on small lists and slower on long lists? SBCL 1.3.11, Ubuntu 16.04.
```lisp
;; Assumes Trivia is loaded, e.g. (ql:quickload :trivia),
;; and its package used for MATCH and the λLIST patterns.
(let ((tmp 0)
      (lst '(1 2 :x 1 :b 2))
      (big-lst (list* 1 2 :x 1 :b 2
                      (loop :repeat 100
                            :collect (intern (symbol-name (gensym)) "KEYWORD")))))
  (time
   (dotimes (i 1000)
     (destructuring-bind (a0 a1 &key x b) lst
       (declare (ignorable a0 a1 x b))
       (incf tmp))))
  (time
   (dotimes (i 1000)
     (match big-lst
       ((λlist a0 a1 &key x b)
        (incf tmp)))))
  (time
   (dotimes (i 1000)
     (match big-lst
       ((λlist-o a0 a1 &key x b &allow-other-keys)
        (incf tmp))))))
```
Processor cycles:

|                     | d-bind  | lambda-list | lambda-list-o |
|---------------------|--------:|------------:|--------------:|
| big-list repeat 100 | 187,080 |     473,232 |    45,273,906 |
| big-list repeat 1   | 124,548 |      22,032 |     1,592,475 |
I am confident that making a lookup table for the runtime object is the wrong direction. Rather, you should make a fixed lookup table representing the pattern (not the runtime object).

Scanning a list of length 100 is an unavoidable bottleneck as long as you use a list as the runtime input. In fact, whatever structure the list is converted into, the conversion itself requires O(n) in order to scan over the list, so the O(lg n) never materializes. <-- this statement is wrong; your goal is to go from O(n^2) to O(n lg n).
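A sketch of that direction (my illustration with hypothetical names, not Trivia's code): sort the pattern's keywords at compile time, then scan the runtime list exactly once, binary-searching each encountered key in the fixed table, for O(n lg m) overall:

```lisp
;; Binary search for keyword KEY in TABLE, a vector of keywords
;; sorted by SYMBOL-NAME.  Returns the index, or NIL if absent.
(defun kw-index (key table)
  (when (keywordp key)
    (let ((name (symbol-name key))
          (lo 0)
          (hi (1- (length table))))
      (loop while (<= lo hi)
            do (let* ((mid (floor (+ lo hi) 2))
                      (mid-name (symbol-name (aref table mid))))
                 (cond ((string= name mid-name) (return mid))
                       ((string< name mid-name) (setf hi (1- mid)))
                       (t (setf lo (1+ mid)))))))))

;; The sorted table is fixed at macroexpansion time; the runtime list
;; is scanned once, each key looked up in O(lg m).
(defmacro with-pattern-keys ((&rest vars) plist &body body)
  (let* ((keys (mapcar (lambda (v) (intern (symbol-name v) :keyword))
                       vars))
         (table (coerce (sort (copy-list keys) #'string<
                              :key #'symbol-name)
                        'vector))
         (vals (gensym "VALS"))
         (rest (gensym "REST"))
         (idx (gensym "IDX")))
    `(let ((,vals (make-array ,(length keys) :initial-element nil)))
       (loop for ,rest on ,plist by #'cddr
             for ,idx = (kw-index (car ,rest) ,table)
             when ,idx do (setf (aref ,vals ,idx) (cadr ,rest)))
       (let ,(loop for v in vars
                   for k in keys
                   collect `(,v (aref ,vals ,(position k table))))
         ,@body))))

;; (with-pattern-keys (x b) '(:x 1 :b 2) (list x b)) => (1 2)
```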
I'm not entirely sure what you mean here by a runtime object.
The lookup table created at compile time is only the sorted list of keywords; the one created at runtime is a place for storing the keyword args as they are found (in arbitrary order). Is it this latter store that you mean?
I'm fairly sure that by using linear search instead of binary search, one can achieve parity with `destructuring-bind` for `big-lst`.
It doesn't appear to be feasible to optimize the constants in O(n lg m) so that it wins in all regimes. I tried eliminating the loop in the binary search by inlining the search for small arguments, but that keeps it at ~10x slower for the above test cases (not surprisingly, because of the overhead).

There is a noticeable speed-up when the pattern itself has ~1000-odd keyword arguments, where O(n m) starts to hurt, but that case is far too niche and barely applicable in practice.

(Closing this issue.)
Assuming n is the length of big-lst and m is the number of keywords in the pattern: the above implementation is trying to achieve O(n + m log n), where the first term n is for converting the list into a heap, and m log n is for looking up the heap once per keyword, m times in total, each lookup taking log n.

I am instead suggesting that we aim for O(n log m) for very long patterns. There is no runtime overhead for constructing the heap, since it is built at compile time.
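To put rough, purely illustrative numbers on the difference: with n = 106 (the big-lst above) and m = 2 keywords in the pattern, O(n + m log n) is about 106 + 2·7 = 120 steps, while O(n log m) is about 106·1 = 106; both are dominated by the unavoidable O(n) scan. Only when m grows large, say m ≈ 1000, does n·m = 106,000 clearly lose to n log m ≈ 106·10 = 1,060, which matches the ~1000-keyword observation above.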
> There is a noticeable speed-up when the pattern itself has ~1000-odd keyword arguments, where O(n m) starts to hurt, but that case is far too niche and barely applicable in practice.

OK, you seem to have tested that too. Hmm...
The above implementation is also doing O(n log m): the given argument list is never copied. I'm still surprised that the overhead for this is so high.

(Heap was a terrible word to use here, since I'm really using binary search; sorry for the conflation. I've edited the title.)
I was playing with the binary-search parser for keywords; it looks like the overhead of dealing with the array store makes such an approach very slow (by an order of magnitude) compared to using property lists. It seems really strange that that'd be true.

@guicho271828, is there something I'm doing wrong here?