Open infinity0 opened 3 years ago
I'm reluctant to force always for lazy HashMaps. Think about fmap.
On Tue, Apr 13, 2021, 3:22 PM Ximin Luo @.***> wrote:
In haskellari/strict#28 https://github.com/haskellari/strict/pull/28 (discussion in haskellari/strict#22 https://github.com/haskellari/strict/issues/22) I am experimenting with a safer version of strict HashMap, where instances are also strict, by adding ! to the data type definitions. So
data Leaf k v = L !k v
becomes
data Leaf k v = L !k !v
As the HashMap type is mostly already strictly-annotated:
data HashMap k v = Empty | BitmapIndexed !Bitmap !(A.Array (HashMap k v)) | Leaf !Hash !(Leaf k v) | Full !(A.Array (HashMap k v)) | Collision !Hash !(A.Array (Leaf k v))
together this gets us 98% of the way there towards having a fully-strict HashMap that is impossible to accidentally introduce thunks into.
However, one thing remains that could contain thunks, namely these A.Array containers, which is an alias to the primitive Array# or SmallArray#. As far as I can tell, these are lazy containers, and the HashMap implementation currently has to force things before adding them to these arrays, e.g. 9d3a297/Data/HashMap/Internal.hs#L797 https://github.com/haskell-unordered-containers/unordered-containers/blob/9d3a297/Data/HashMap/Internal.hs#L797. However there are a few cases where this is missing, e.g. infinity0@ e1974fa https://github.com/infinity0/unordered-containers/commit/e1974fa22c416e4681d89e5ee5beed6b47fbda7d and it is unclear why these are not being forced - whether on purpose or by oversight.
Logically speaking, it seems that these arrays ought only to contain forced values and never thunks, even for HashMap.Lazy - is that correct?
If so, perhaps we should simply add bang annotations to the insert functions themselves in Data.HashMap.Internal.Array? For example in @.*** https://github.com/infinity0/unordered-containers/commit/e2756b124cc1bddb943666ae3ee6fa8e2cb1749e
- this is done mechanically by grepping for all arguments of type a -> and adding a !, one does not have to think hard about whether it's correct, and is impossible to accidentally misuse.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/issues/311, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7KLV4NIJ2ZONFIFY4TTISKYFANCNFSM4236A5KA .
OK I see, that makes sense. In that case infinity0@e2756b1 is not suitable for a lazy (or dual) HashMap, so I'll apply it only to the strict variant I'm working on.
Any comment on infinity0@4415cb6 ? Similar code in other places in the same file does contain strictness annotations, so it's possible they might indicate corner-case space leaks.
I don't have time to look at that right now. Some places that initially look suspicious are okay because the things they install in constructors are already known to be in WHNF. But it would definitely be worth at least adding comments about those, if not forcing them (redundantly).
On Tue, Apr 13, 2021, 6:48 PM Ximin Luo @.***> wrote:
OK I see, that makes sense. In that case infinity0/unordered-containers@ e2756b1 https://github.com/infinity0/unordered-containers/commit/e2756b1 is not suitable for a lazy (or dual) HashMap, so I'll apply it only to the strict variant I'm working on.
Any comment on @.*** https://github.com/infinity0/unordered-containers/commit/4415cb6 ? Similar code in other places in the same file does contain strictness annotations, so it's possible they might indicate corner-case space leaks.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/issues/311#issuecomment-819101160, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7J2ANQ6XKO3LD3G3XLTITC43ANCNFSM4236A5KA .
I'm reluctant to force always for lazy HashMaps. Think about fmap.
OK I see, that makes sense.
For the sake of clarity, could one of you explain what the problem with fmap
would be?
If you want to calculate something like filterOnKey predicate (fmap f m)
then you might not want the whole fmap f m
to be evaluated strictly on the values, since some of them will get filtered out. (This also applies to the inner parts of the data structure that might be lazy, such as the lazy arrays I brought up in the OP.)
That's not really it. fmap can "operationally fuse" with the following operation, improving cache utilization and GC performative.
On Thu, Apr 15, 2021, 7:54 AM Ximin Luo @.***> wrote:
If you want to calculate something like filterOnKey predicate (fmap f m) then you might not want the whole fmap f m to be evaluated strictly on the values, since some of them will get filtered out.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/issues/311#issuecomment-820364593, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7IZV4HDYKHHI4MXXO3TI3HXRANCNFSM4236A5KA .
Can you give a concrete example? I thought my example would be an example of a case of fmap fusing. There is another example when implementing any
and you want it to shortcut (stop iterating when any value is True
), which relies on the subject being lazily evaluated.
Yes, sorry. Your example is indeed such, but the laziness in the arrays doesn't avoid evaluation; it just postpones it until it's more efficient to perform.
On Thu, Apr 15, 2021, 5:58 PM Ximin Luo @.***> wrote:
Can you give a concrete example? I thought my example would be an example of a case of fmap fusing. There is another example when implementing any and you want it to shortcut (stop iterating when any value is True), which relies on the subject being lazily evaluated.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/issues/311#issuecomment-820755774, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7ITRBRITLC4I5NHYS3TI5OP7ANCNFSM4236A5KA .
In haskellari/strict#28 (discussion in haskellari/strict#22) I am experimenting with a safer version of strict HashMap, where instances are also strict, by adding ! to the data type definitions. So
becomes
As the
HashMap
type is mostly already strictly-annotated:together this gets us 98% of the way there towards having a fully-strict HashMap that is impossible to accidentally introduce thunks into.
However, one thing remains that could contain thunks, namely these
A.Array
containers, which is an alias to the primitiveArray#
orSmallArray#
. As far as I can tell, these are lazy containers, and theHashMap
implementation currently has to force things before adding them to these arrays, e.g. 9d3a297/Data/HashMap/Internal.hs#L797. However there are a few cases where this is missing, e.g. https://github.com/infinity0/unordered-containers/commit/4415cb6 and it is unclear why these are not being forced - whether on purpose or by oversight.Logically speaking, it seems that these arrays ought only to contain forced values and never thunks, even for
HashMap.Lazy
- is that correct?If so, perhaps we should simply add bang annotations to the insert functions themselves in
Data.HashMap.Internal.Array
? For example in https://github.com/infinity0/unordered-containers/commit/e2756b124cc1bddb943666ae3ee6fa8e2cb1749e - this is done mechanically by grepping for all arguments of typea ->
and adding a!
, one does not have to think hard about whether it's correct, and is impossible to accidentally misuse.