lopopolo closed this 1 year ago.
I guess `break_chars_byteset` can be evaluated once / statically.
Something like https://github.com/sqlite/sqlite/blob/master/src/tokenize.c#L61-L80 but for `DEFAULT_BREAK_CHARS` and `DOUBLE_QUOTES_SPECIAL_CHARS`.
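A minimal sketch of that suggestion. It assumes a `const fn` builder (the name `byteset_table` is hypothetical) that fills a 256-entry table once at compile time, in the spirit of SQLite's static character-class table. The byte lists given for `DEFAULT_BREAK_CHARS` and `DOUBLE_QUOTES_SPECIAL_CHARS` are illustrative stand-ins and may not match rustyline's actual definitions. The helpers `is_break_char` and `is_double_quote_special` are also hypothetical.

```rust
/// Returns a table where `table[b as usize]` is true iff `b` appears in `bytes`.
/// It is a `const fn`, so it can initialize a `static` with no runtime cost.
const fn byteset_table(bytes: &[u8]) -> [bool; 256] {
    let mut table = [false; 256];
    let mut i = 0;
    // `for` loops are not allowed in `const fn`, so iterate with `while`.
    while i < bytes.len() {
        table[bytes[i] as usize] = true;
        i += 1;
    }
    table
}

// Hypothetical stand-ins for rustyline's constants; the exact byte lists
// are illustrative and may differ from the crate's definitions.
const DEFAULT_BREAK_CHARS: [u8; 17] = [
    b' ', b'\t', b'\n', b'"', b'\\', b'\'', b'`', b'@', b'$', b'>', b'<', b'=', b';', b'|', b'&',
    b'{', b'(',
];
const DOUBLE_QUOTES_SPECIAL_CHARS: [u8; 4] = [b'"', b'$', b'\\', b'`'];

// Evaluated once, at compile time.
static DEFAULT_BREAK_CHARS_TABLE: [bool; 256] = byteset_table(&DEFAULT_BREAK_CHARS);
static DOUBLE_QUOTES_SPECIAL_CHARS_TABLE: [bool; 256] =
    byteset_table(&DOUBLE_QUOTES_SPECIAL_CHARS);

/// O(1) membership test instead of a linear scan over the byteset.
fn is_break_char(b: u8) -> bool {
    DEFAULT_BREAK_CHARS_TABLE[b as usize]
}

/// O(1) test for bytes that need escaping inside double quotes.
fn is_double_quote_special(b: u8) -> bool {
    DOUBLE_QUOTES_SPECIAL_CHARS_TABLE[b as usize]
}

fn main() {
    assert!(is_break_char(b' '));
    assert!(!is_break_char(b'a'));
    assert!(is_double_quote_special(b'$'));
    assert!(!is_double_quote_special(b' '));
}
```

Because the tables live in `static`s, callers would no longer need to pass a byteset slice at all. That is exactly the public-API question raised in the reply.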
@gwenn I think so too, but I wasn't sure that change would be accepted, since these functions are public APIs that take the byteset as a slice.
I'm not sure how you'd like to handle the API breaks.
Do you want to merge this as is or maybe push a PR to my fork?
See #676
Thanks @gwenn!
Several places use `memchr` with a byteset. This commit refactors these code paths to construct a lookup table from `u8` -> `bool`, where an index is set to `true` if the byte is present in the given slice. This change removes linear scans that occur in loops, which changes these functions' runtime complexity from O(m * n) to O(m + n), for an input of m bytes and a byteset of n bytes.
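To illustrate the complexity change, here is a sketch under stated assumptions. The function names are hypothetical. `slice::contains` stands in for the `memchr`-over-the-byteset membership check, since both scan the n-byte set linearly. The task shown, finding the last break character in a line, is only an example of such a loop.

```rust
/// Build the u8 -> bool lookup table: index `b` is true iff `b` is in `set`. O(n).
fn byteset_table(set: &[u8]) -> [bool; 256] {
    let mut table = [false; 256];
    for &b in set {
        table[usize::from(b)] = true;
    }
    table
}

/// Before: each of the m input bytes triggers a scan of the n-byte set, O(m * n).
fn last_break_linear(line: &[u8], set: &[u8]) -> Option<usize> {
    line.iter().rposition(|b| set.contains(b))
}

/// After: build the table once in O(n), then each check is O(1), O(m + n) total.
fn last_break_table(line: &[u8], set: &[u8]) -> Option<usize> {
    let table = byteset_table(set);
    line.iter().rposition(|&b| table[usize::from(b)])
}

fn main() {
    let break_chars = b" \t\n\"\\'`@$><=;|&{(";
    let line = b"echo $HOME/fo";
    // Both versions agree; '$' at index 5 is the last break character.
    assert_eq!(last_break_linear(line, break_chars), Some(5));
    assert_eq!(last_break_table(line, break_chars), Some(5));
}
```

The table costs 256 bytes regardless of n. That is a fixed, small price for turning every membership test into a single indexed load.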