unisonweb / base

Unison base libraries
https://share.unison-lang.org/@unison/base

15x faster implementation of `Text.words` #178

Closed pchiusano closed 1 year ago

pchiusano commented 1 year ago

The existing implementation doesn't use the Pattern API. Here's an implementation that does:

Text.words : Text -> [Text]
Text.words t = 
  c = not Class.whitespace
  word = join [capture (some (char c)), many space]
  match Pattern.run (join [many space, many word]) t with
    None -> []
    Some (words, rem) -> words

Hereby MIT licensed for inclusion in base.

It looks to be about 15x faster:

durable-data/knn> run main2
using list scan: 526.668µs
using  patterns: 34.443µs

main2 = do
  msg = Text.repeat 10 " hello there my name is alice, I'm from the planet Zolton"
  printLine ("using list scan: " ++ Duration.toText (timeN 1000 '(lib.base.Text.words msg)))
  printLine ("using  patterns: " ++ Duration.toText (timeN 1000 '(Text.words msg)))

runarorama commented 1 year ago

Slightly modified:

Text.words : Text -> [Text]
Text.words t = 
  c = not Class.whitespace
  word = join [capture (some (char c)), many space]
  Pattern.captures (join [many space, many word]) t

Pushed to main. Thanks! 🌈⭐

pchiusano commented 1 year ago

nice! TIL about Pattern.captures.
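
For anyone who wants to try the pattern-based version in a scratch file, here is a minimal sketch. It assumes the same names resolve as in the snippets above (`join`, `capture`, `some`, `many`, `char`, `space`, `Class.whitespace`, `Pattern.captures`); `wordsSketch` is a hypothetical name chosen so the sketch doesn't clash with the real `Text.words`.

```unison
-- Hypothetical name; the body mirrors the definition pushed to main above.
wordsSketch : Text -> [Text]
wordsSketch t =
  -- A character class matching anything that is not whitespace.
  c = not Class.whitespace
  -- One word: a captured run of non-whitespace, then any trailing spaces.
  word = join [capture (some (char c)), many space]
  -- Skip leading spaces, then collect the captures of zero or more words.
  Pattern.captures (join [many space, many word]) t

-- Watch expression: evaluates when the scratch file is loaded in ucm.
> wordsSketch "  hello   there my name is alice  "
```

If the pattern behaves as described in this thread, the watch expression should show only the individual words, with leading, trailing, and repeated whitespace dropped.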