haskell-unordered-containers / unordered-containers

Efficient hashing-based container types
BSD 3-Clause "New" or "Revised" License
221 stars 97 forks source link

Internal arrays and strictness #311

Open infinity0 opened 3 years ago

infinity0 commented 3 years ago

In haskellari/strict#28 (discussion in haskellari/strict#22) I am experimenting with a safer version of strict HashMap, where instances are also strict, by adding ! to the data type definitions. So

data Leaf k v = L !k v

becomes

data Leaf k v = L !k !v

As the HashMap type is mostly already strictly-annotated:

data HashMap k v
    = Empty
    | BitmapIndexed !Bitmap !(A.Array (HashMap k v))
    | Leaf !Hash !(Leaf k v)
    | Full !(A.Array (HashMap k v))
    | Collision !Hash !(A.Array (Leaf k v))

together this gets us 98% of the way there towards having a fully-strict HashMap that is impossible to accidentally introduce thunks into.

However, one thing remains that could contain thunks, namely these A.Array containers, which is an alias to the primitive Array# or SmallArray#. As far as I can tell, these are lazy containers, and the HashMap implementation currently has to force things before adding them to these arrays, e.g. 9d3a297/Data/HashMap/Internal.hs#L797. However there are a few cases where this is missing, e.g. https://github.com/infinity0/unordered-containers/commit/4415cb6 and it is unclear why these are not being forced - whether on purpose or by oversight.

Logically speaking, it seems that these arrays ought only to contain forced values and never thunks, even for HashMap.Lazy - is that correct?

If so, perhaps we should simply add bang annotations to the insert functions themselves in Data.HashMap.Internal.Array? For example in https://github.com/infinity0/unordered-containers/commit/e2756b124cc1bddb943666ae3ee6fa8e2cb1749e - this is done mechanically by grepping for all arguments of type a -> and adding a !, one does not have to think hard about whether it's correct, and is impossible to accidentally misuse.

treeowl commented 3 years ago

I'm reluctant to force always for lazy HashMaps. Think about fmap.

On Tue, Apr 13, 2021, 3:22 PM Ximin Luo @.***> wrote:

In haskellari/strict#28 https://github.com/haskellari/strict/pull/28 (discussion in haskellari/strict#22 https://github.com/haskellari/strict/issues/22) I am experimenting with a safer version of strict HashMap, where instances are also strict, by adding ! to the data type definitions. So

data Leaf k v = L !k v

becomes

data Leaf k v = L !k !v

As the HashMap type is mostly already strictly-annotated:

data HashMap k v = Empty | BitmapIndexed !Bitmap !(A.Array (HashMap k v)) | Leaf !Hash !(Leaf k v) | Full !(A.Array (HashMap k v)) | Collision !Hash !(A.Array (Leaf k v))

together this gets us 98% of the way there towards having a fully-strict HashMap that is impossible to accidentally introduce thunks into.

However, one thing remains that could contain thunks, namely these A.Array containers, which is an alias to the primitive Array# or SmallArray#. As far as I can tell, these are lazy containers, and the HashMap implementation currently has to force things before adding them to these arrays, e.g. 9d3a297/Data/HashMap/Internal.hs#L797 https://github.com/haskell-unordered-containers/unordered-containers/blob/9d3a297/Data/HashMap/Internal.hs#L797. However there are a few cases where this is missing, e.g. infinity0@ e1974fa https://github.com/infinity0/unordered-containers/commit/e1974fa22c416e4681d89e5ee5beed6b47fbda7d and it is unclear why these are not being forced - whether on purpose or by oversight.

Logically speaking, it seems that these arrays ought only to contain forced values and never thunks, even for HashMap.Lazy - is that correct?

If so, perhaps we should simply add bang annotations to the insert functions themselves in Data.HashMap.Internal.Array? For example in @.*** https://github.com/infinity0/unordered-containers/commit/e2756b124cc1bddb943666ae3ee6fa8e2cb1749e

  • this is done mechanically by grepping for all arguments of type a -> and adding a !, one does not have to think hard about whether it's correct, and is impossible to accidentally misuse.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/issues/311, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7KLV4NIJ2ZONFIFY4TTISKYFANCNFSM4236A5KA .

infinity0 commented 3 years ago

OK I see, that makes sense. In that case infinity0@e2756b1 is not suitable for a lazy (or dual) HashMap, so I'll apply it only to the strict variant I'm working on.

Any comment on infinity0@4415cb6 ? Similar code in other places in the same file does contain strictness annotations, so it's possible they might indicate corner-case space leaks.

treeowl commented 3 years ago

I don't have time to look at that right now. Some places that initially look suspicious are okay because the things they install in constructors are already known to be in WHNF. But it would definitely be worth at least adding comments about those, if not forcing them (redundantly).

On Tue, Apr 13, 2021, 6:48 PM Ximin Luo @.***> wrote:

OK I see, that makes sense. In that case infinity0/unordered-containers@ e2756b1 https://github.com/infinity0/unordered-containers/commit/e2756b1 is not suitable for a lazy (or dual) HashMap, so I'll apply it only to the strict variant I'm working on.

Any comment on @.*** https://github.com/infinity0/unordered-containers/commit/4415cb6 ? Similar code in other places in the same file does contain strictness annotations, so it's possible they might indicate corner-case space leaks.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/issues/311#issuecomment-819101160, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7J2ANQ6XKO3LD3G3XLTITC43ANCNFSM4236A5KA .

sjakobi commented 3 years ago

I'm reluctant to force always for lazy HashMaps. Think about fmap.

OK I see, that makes sense.

For the sake of clarity, could one of you explain what the problem with fmap would be?

infinity0 commented 3 years ago

If you want to calculate something like filterOnKey predicate (fmap f m) then you might not want the whole fmap f m to be evaluated strictly on the values, since some of them will get filtered out. (This also applies to the inner parts of the data structure that might be lazy, such as the lazy arrays I brought up in the OP.)

treeowl commented 3 years ago

That's not really it. fmap can "operationally fuse" with the following operation, improving cache utilization and GC performative.

On Thu, Apr 15, 2021, 7:54 AM Ximin Luo @.***> wrote:

If you want to calculate something like filterOnKey predicate (fmap f m) then you might not want the whole fmap f m to be evaluated strictly on the values, since some of them will get filtered out.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/issues/311#issuecomment-820364593, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7IZV4HDYKHHI4MXXO3TI3HXRANCNFSM4236A5KA .

infinity0 commented 3 years ago

Can you give a concrete example? I thought my example would be an example of a case of fmap fusing. There is another example when implementing any and you want it to shortcut (stop iterating when any value is True), which relies on the subject being lazily evaluated.

treeowl commented 3 years ago

Yes, sorry. Your example is indeed such, but the laziness in the arrays doesn't avoid evaluation; it just postpones it until it's more efficient to perform.

On Thu, Apr 15, 2021, 5:58 PM Ximin Luo @.***> wrote:

Can you give a concrete example? I thought my example would be an example of a case of fmap fusing. There is another example when implementing any and you want it to shortcut (stop iterating when any value is True), which relies on the subject being lazily evaluated.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/issues/311#issuecomment-820755774, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7ITRBRITLC4I5NHYS3TI5OP7ANCNFSM4236A5KA .