Open treeowl opened 5 years ago
This addition would help us address the feature requests #240 and #242.
I guess implementing merge
and testing it against the Data.Map
version would be a good first step!?
This should be the right type, no?
merge :: (Hashable k, Eq k)
=> SimpleWhenMissing k a c -- ^ What to do with keys in @m1@ but not @m2@
-> SimpleWhenMissing k b c -- ^ What to do with keys in @m2@ but not @m1@
-> SimpleWhenMatched k a b c -- ^ What to do with keys in both @m1@ and @m2@
-> HashMap k a
-> HashMap k b
-> HashMap k c
Could the merge strategies ([Simple]WhenMissing
etc) simply be copied from http://hackage.haskell.org/package/containers-0.6.2.1/docs/Data-Map-Merge-Lazy.html?
That's the general idea. You'll probably want to look at the Data.IntMap
version too. The hard part is figuring out how mergeA
should work. The rest is much easier.
@BebeSparkelSparkel would you possibly be interested in working on this?
I am concerned about the the performance of (see below) in my use case #240 . It looks like it would create a lot of maps during the fold, but maybe I'm wrong about that.foldr (merge f0 f1 f2) mempty
@BebeSparkelSparkel Note that unions
is also simply defined as foldl' union empty
.
Do you have a more efficient implementation in mind?
I doubt a much more generally efficient implementation of unions
is possible. It'll either be foldl' union empty
or foldr' union empty
, depending on the Foldable
.
I was mistaken. foldr (merge f0 f1 f2) mempty
is incorrect. What I actually need is more like fmap f0 . foldr (merge f1 f2 f3) mempty
if using merge
.
Foldable f => (NonEmpty a -> b) -> f (HashMap k a) -> HashMap k b
has the merge and map while only creating the resulting HashMap.It seems to me like #226 and #240 are not addressing the same problem.
I think I'll give merge
a spin myself. It looks like a nice exercise for me to better understand the internals.
@sjakobi, any progress on this? I could help. In Sturdy we need a mergeA
function to implement efficient widening operators.
@svenkeidel I have started working on merge
in https://github.com/haskell-unordered-containers/unordered-containers/compare/sjakobi/226-merge, but realized that it's a bit much for a first project for someone who isn't really familiar with the internals.
I'd be very happy if you would take this over and implement mergeA
and/or merge
!
I'm curious about your usecase though! Does the order in which we sequence effects matter? I wasn't sure whether we could do anything better than keeping it completely arbitrary and unspecified.
I'm curious about your usecase though! Does the order in which we sequence effects matter? I wasn't sure whether we could do anything better than keeping it completely arbitrary and unspecified.
I need to combine two maps, while simultaneously finding out if a key in one of the maps was missing or one of the values in the resulting map has changed (https://github.com/svenkeidel/sturdy/blob/master/lib/src/Data/Abstract/Map.hs#L57-L67). mergeA
is the perfect fit for that. The order of effects does not matter.
More generally, I think the order of map entries should not matter since hashmaps are inherently unordered, hence the name of the library. If users rely on a predictable order of entries within the map, they have to use a different datastructure.
@oberblastmeister, how would you like to tackle this challenge while merging mechanics are still fresh in your mind?
@treeowl yes I would like to do this, but maybe after I do difference
?
If you think that'll help you prepare! Another option would be to use mergeA
to implement difference
.
Yeah I think I'll use mergeA
to implement difference
. Though probably merge
, because applicatives are bad on arrays right?
I'd feel more comfortable about adding functions like mergeA
once we have tests for the internal invariants (https://github.com/haskell-unordered-containers/unordered-containers/issues/366).
Are we planning to make merge
public, or only use it to implement functions like intersection
, differrence
, etc. Looking at containers
merge
seems to be internal.
merge
should be public, as it is in containers
. See e.g. https://hackage.haskell.org/package/containers-0.6.5.1/docs/Data-Map-Merge-Strict.html#v:merge.
Woops, I must have clicked the wrong one
I'd feel more comfortable about adding functions like
mergeA
once we have tests for the internal invariants (#366).
This is resolved BTW: basic tooling for invariant checking was added in #444. More tests are being added in #455.
Data.Map
andData.IntMap
offermerge
andmergeA
operations that can take unions, intersections, differences, etc. We should be able to do something quite similar here. I suspect optimalmerge
will have to be a bit different frommergeA
rather than just a specialization, thanks to the way arrays work. Also, as I've explored in a slightly different context,mergeA
can probably be implemented particularly efficiently for certain transformer stacks on top ofIO
orST s
, again thanks to the mutable array business. Dunno if that sort of thing is worth bothering with.