spl / dlist

Difference lists in Haskell
https://hackage.haskell.org/package/dlist
BSD 3-Clause "New" or "Revised" License
65 stars 15 forks source link

Documented time bounds are incorrect for persistent use #104

Open treeowl opened 2 years ago

treeowl commented 2 years ago

The usual assumption for pure data structures in Haskell is that the documented time bounds may be amortized, but that amortization must hold up under persistent use. That is not true for dlist. snoc and head are both documented as being O(1), and yet the following program runs far longer than I would ever want to wait. Swapping in Data.Sequence for DList lets this run quickly.

import qualified Data.DList as D
import Data.DList (DList)
import Data.Foldable

blob :: DList Int
blob = foldl' D.snoc mempty [1..10^7]

main :: IO ()
main = print $ sum [ D.head (blob `D.snoc` a) | a <- [1..10^6 :: Int]]

I think the best approach is likely to document that DLists are intended to be used in an affine/linear fashion, and that the bounds are only valid in that context.

treeowl commented 2 years ago

There's a lot of relevant discussion in Reflection Without Remorse: Revealing a hidden sequence to speed up monadic reflection, though the paper goes in a different direction.

treeowl commented 2 years ago

Hrm... affine use is putting too strong a point on it. It's perfectly fine to concat . replicate n a DList. But they should generally only be "inspected" once.

Anton-Latukha commented 2 years ago

Yep.

To unwrap example:

(also decreased magnitude one time, so it is faster/easier to play & get code responses)

import qualified Data.DList as D
import Data.DList (DList)
import qualified Data.Sequence as S
import Data.Sequence (Seq)
import Data.Foldable

-- DList
blobD :: DList Int
blobD = foldl1 (D.snoc) mempty [1..10^6]

mainD :: IO ()
mainD = print $ sum [ D.head (D.snoc blobD a) | a <- [1..10^5 :: Int]]

-- Seq
blobS :: Seq Int
blobS = foldl1 (S.|>) mempty [1..10^6]

mainS :: IO ()
mainS = print $ fmap sum [ S.take 1 (blobS S.|> a) | a <- [1..10^5 :: Int]]

The First would do not terminate in meaningful time, but the second terminates fast.

foldr would be the same.