haskell / text

Haskell library for space- and time-efficient operations over Unicode text.
http://hackage.haskell.org/package/text
BSD 2-Clause "Simplified" License

Implement `Data.Text.unpack` and `Data.Text.toTitle` directly, without streaming #611

Closed Bodigrim closed 1 month ago

Bodigrim commented 3 months ago

Following the discussion with @rhendric at https://discourse.haskell.org/t/the-quest-to-completely-eradicate-string-awkwardness/10111/75?u=bodigrim, let's implement `Data.Text.unpack` directly via `iterArray` instead of streaming / unstreaming. My immediate goal is to simplify dependencies between modules of text, but I imagine that performance could improve as well.

rhendric commented 3 months ago

I might be benchmarking this incorrectly, but I think it has a negative impact on performance.

Main.hs

```hs
module Main (main) where

import Criterion.Main
import Data.Text.Internal (Text(..))
import Data.Text.Unsafe (Iter(..), iterArray)
import Data.Text qualified as T

unpack :: Text -> String
unpack (Text arr off len) = go 0
  where
    go !i
      | i >= len = []
      | otherwise = c : go (i + l)
      where
        Iter c l = iterArray arr (off + i)
{-# INLINE [1] unpack #-}

f1 :: String -> String
f1 = T.unpack . T.toTitle . T.pack
{-# NOINLINE f1 #-}

f2 :: String -> String
f2 = unpack . T.toTitle . T.pack
{-# NOINLINE f2 #-}

testInput :: String
testInput = "this is a pretty short string, all things considered."

testInputT :: Text
testInputT = T.pack testInput
{-# NOINLINE testInputT #-}

main :: IO ()
main = defaultMain
  [ bgroup "fusion"
    [ bench "old" $ nf f1 testInput
    , bench "new" $ nf f2 testInput
    ]
  , bgroup "just unpack"
    [ bench "old" $ nf T.unpack testInputT
    , bench "new" $ nf unpack testInputT
    ]
  ]
```
benchmarking fusion/old
time                 1.148 μs   (1.144 μs .. 1.154 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.145 μs   (1.141 μs .. 1.151 μs)
std dev              16.06 ns   (13.55 ns .. 19.22 ns)
variance introduced by outliers: 13% (moderately inflated)

benchmarking fusion/new
time                 1.914 μs   (1.909 μs .. 1.920 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.911 μs   (1.906 μs .. 1.917 μs)
std dev              19.80 ns   (15.78 ns .. 28.69 ns)

benchmarking just unpack/old
time                 474.1 ns   (467.0 ns .. 481.2 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 470.3 ns   (467.4 ns .. 476.1 ns)
std dev              12.96 ns   (7.541 ns .. 20.76 ns)
variance introduced by outliers: 39% (moderately inflated)

benchmarking just unpack/new
time                 943.0 ns   (935.2 ns .. 956.6 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 964.7 ns   (956.4 ns .. 972.0 ns)
std dev              26.70 ns   (23.38 ns .. 31.77 ns)
variance introduced by outliers: 37% (moderately inflated)

Bodigrim commented 1 month ago

Thanks @rhendric, should be fixed now.

rhendric commented 1 month ago

My comment wasn't about toTitle specifically, but about any function with a stream-based implementation. Modify f1 and f2 to use any other function implemented via unstream, and there's still a performance regression. Example:

```hs
f1 :: String -> String
f1 = T.unpack . T.scanl max '\0' . T.pack
{-# NOINLINE f1 #-}

f2 :: String -> String
f2 = unpack . T.scanl max '\0' . T.pack
{-# NOINLINE f2 #-}
```

benchmarking fusion/old
time                 836.4 ns   (832.7 ns .. 841.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 833.7 ns   (831.3 ns .. 837.1 ns)
std dev              9.667 ns   (6.892 ns .. 14.57 ns)

benchmarking fusion/new
time                 1.085 μs   (1.081 μs .. 1.092 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.094 μs   (1.087 μs .. 1.104 μs)
std dev              27.77 ns   (20.48 ns .. 45.82 ns)
variance introduced by outliers: 33% (moderately inflated)

Bodigrim commented 1 month ago

@rhendric we are gradually rewriting all stream-based functions to their non-streaming counterparts. E.g., see map and mapAccumL. Unless you have a particularly long pipeline, the non-streaming implementation is faster. If you happen to have a long transformation pipeline (to be honest, I know exactly zero use cases for Data.Text.scanl), you are welcome to use Stream explicitly. See https://github.com/haskell/text/issues/513 for some context.
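For readers wondering what "use Stream explicitly" means: a hand-fused pipeline can be written against the internal fusion modules `Data.Text.Internal.Fusion` (for `stream`/`unstream`) and `Data.Text.Internal.Fusion.Common` (for stream combinators such as `map` and `filter`). This is a sketch, not code from this thread; the internal modules are exposed but carry no API stability guarantees.

```hs
{-# LANGUAGE ImportQualifiedPost #-}
module Main (main) where

import Data.Char (toUpper)
import Data.Text qualified as T
import Data.Text.Internal.Fusion qualified as S
import Data.Text.Internal.Fusion.Common qualified as C

-- A long pipeline fused by hand: the Text is converted to a Stream once,
-- transformed step by step without intermediate Texts, and materialised
-- back into a Text once at the end.
pipeline :: T.Text -> T.Text
pipeline = S.unstream . C.map toUpper . C.filter (/= ' ') . S.stream

main :: IO ()
main = print (pipeline (T.pack "hello fusion"))
```

With explicit streaming, adding more stages to the pipeline costs only the per-character work of each stage, which is the regime where fusion pays off.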

rhendric commented 1 month ago

Ah, okay then!