haskell-unordered-containers / unordered-containers

Efficient hashing-based container types
BSD 3-Clause "New" or "Revised" License
221 stars 97 forks source link

Towards unlifted HashMaps #463

Open sjakobi opened 2 years ago

sjakobi commented 2 years ago

This compiles with GHC-9.4-alpha1:

{-# language UnliftedDatatypes, MagicHash #-}

module M where

import Data.Kind (Type)
import GHC.Exts

data Leaf k v :: UnliftedType where
  L :: !k -> v -> Leaf k v

type HashMap' :: Type -> Type -> UnliftedType
data HashMap' k v
    = Empty
    | BitmapIndexed !Word !(SmallArray# (HashMap' k v))
    | Leaf !Word !(Leaf k v)
    | Full !(SmallArray# (HashMap' k v))
    | Collision !Word !(SmallArray# (Leaf k v))

data HashMap k v = HM !(HashMap' k v)

If the ergonomics work out, maybe we could start using a similar representation around GHC 8.8.

The main benefit should be better performance due to the removal of heap checks.

treeowl commented 2 years ago

The only likely performance trouble will be with lazy maps and traversals. Otherwise, I think it should reduce code size a bit, which is good, and maybe take stress off branch prediction.

sjakobi commented 2 years ago

The only likely performance trouble will be with lazy maps and traversals.

It would be great if you could give some concrete examples where you're expecting trouble!

konsumlamm commented 2 years ago
data HashMap k v = HM !(HashMap' k v)

As far as I can tell, this would introduce an additional indirection for HashMap (a pointer to the HashMap + a pointer to the HashMap' vs just a pointer to the HashMap). I'm not sure if this would noticably impact performance, but it's something to keep in mind.

treeowl commented 2 years ago

@konsumlamm ohh yeah. That is going to be a problem. We really need to get the GHC folks to give us a way to have free wrappers. This might kill the whole idea.

sjakobi commented 2 years ago

I have opened https://gitlab.haskell.org/ghc/ghc/-/issues/21617 to request "free lifted wrappers".

sjakobi commented 2 years ago

If we end up accepting the pointer indirection, we could consider splitting out the Empty constructor:

type Tree :: Type -> Type -> UnliftedType
data Tree k v
    = BitmapIndexed !Word !(SmallArray# (HashMap' k v))
    | Leaf !Word !(Leaf k v)
    | Full !(SmallArray# (HashMap' k v))
    | Collision !Word !(SmallArray# (Leaf k v))

data HashMap k v = Empty | Tree !(Tree k v)

(Previous discussion: https://github.com/haskell-unordered-containers/unordered-containers/issues/161#issuecomment-311246147)

AndreasPK commented 2 years ago

I'm fairly confident that https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8445 will work out.

It will allow one to write data HashMap k v = HM !(HashMap' k v) with GHC eliminating the HM wrapper constructor at runtime.

But it won't happen before 9.6

AndreasPK commented 1 year ago

Out of interest has anyone done further work on this? Or run benchmarks in regards to the benefits of this approach?

treeowl commented 1 year ago

Not that I know of.