haskell / containers

Assorted concrete container types
https://hackage.haskell.org/package/containers
314 stars 177 forks source link

Map.unionWith is over specialized and not consistent with intersectionWith #984

Open theobat opened 5 months ago

theobat commented 5 months ago

Hi, I think specializing the Map.unionWith operation to the case where both operands are of the same type Map k a is not a good design.

I think it should be :

unionWith :: (These a b -> c) -> Map k a -> Map k b -> Map k c 

Or, for a less opinionated (but also, worse) signature it could be

unionWith :: (Maybe a -> Maybe b -> c) -> Map k a -> Map k b -> Map k c 

Now I understand this would introduce a pretty significant breaking change which is bad, but there are various options for this, such as deprecating unionWith or renaming it to introduce the proper one ?

jwaldmann commented 5 months ago

Example for last item: run this as ghci script

import qualified Data.Map.Strict as M

a = M.fromList $ take (10^7) $ zip [   0 :: Int ..] $ repeat ()
b = M.fromList $ take (10^7) $ zip [   1 :: Int ..] $ repeat ()
c = M.fromList $ take (10^7) $ zip [10^7 :: Int ..] $ repeat ()

length $ M.elems a
length $ M.elems b
length $ M.elems c

:set +s

length $ M.elems $ M.union a b -- overlapping

length $ M.elems $ M.union a c -- disjoint

I am seeing

10000001 -- overlapping
(0.92 secs, 1,340,080,824 bytes)
20000000 -- disjoint
(0.28 secs, 1,120,118,728 bytes)
theobat commented 5 months ago

Please no. This would break a lot of (my) code.

Can you give a concrete example where you need this, and argue that you can't use https://hackage.haskell.org/package/containers-0.7/docs/Data-Map-Merge-Strict.html#v:merge

Besides, this is the whole point, I'm not comfortable with the fact that in order to have a values modifying union, you have to use 5 lines/words of a complex function instead of a single one. It's simply not a good design of the API for me.

Performance: with identical value types (in arguments and result, as it is for current unionWith), consider the case that keys of arguments are from disjoint intervals (e.g., all keys in left arg < all keys in right arg). Then implementation might not need to do a complete traversal. For your suggested "proper" type, it must?

I'm sympathetic to the performance argument, but this is also something that should not be traded for a very narrow API. And running unionWith on disjoint key sets as it stands today is a noop, because the "with" function will not be used. Use union instead. Unless of course you want to transform the values, in case you should use the correct API for unions which let you do just that.

jwaldmann commented 5 months ago

performance ... should not be traded for ...

Some programmers may prefer "the" most general API; some applications might require best possible performance.

unionWith on disjoint key sets as it stands today is a noop

no it's not, because the implementation does not know that inputs are disjoint. I wrote "disjoint intervals" because that's a case that I can imagine is easier to detect (log instead of linear). Even after that, it requires work, in re-balancing (but log work, not linear).

My concern is that a generic implementation needs to traverse everything (so, linear again). This can be worked around by reified transformation functions (e.g., preserveMissing) but not with the general type that you suggest?

theobat commented 5 months ago

Some programmers may prefer "the" most general API; some applications might require best possible performance.

Fine so you agree that one is not intrinsically above the other, so at the very least, both functions should exist. The fact that the "with" function only run on the intersection of maps is very narrow indeed and not documented. And it seems I'm not the only one puzzled by the current API. It's very awkward that the optimized narrow function is simple and exposed whereas the general and intuitive one is not. I say intuitive because one cannot say in good faith that a union with a combination function F only running on the intersection is intuitive, I mean it's simply not in the name !

My concern is that a generic implementation needs to traverse everything (so, linear again). This can be worked around by reified transformation functions (e.g., preserveMissing) but not with the general type that you suggest?

Then let's have two functions... ? What's the problem with that ? I expect people will say that it's redundant with the merge, but honestly everything in the API is redundant with the merge function at some point.

konsumlamm commented 5 months ago

I say intuitive because one cannot say in good faith that a union with a combination function F only running on the intersection is intuitive, I mean it's simply not in the name !

I find it very intuitive, what else would the combining function be used for?

theobat commented 5 months ago

@konsumlamm, ok so, if you had to guess without a type signature what the type signature would be you'd expect union with, to operate the combination function only on the intersection case ? Do you really expect me to believe that ? Of course once you have the type signature and you think about it for 3 seconds it becomes obivous that it can't be anything else which has nothing to do with it being intuitive...

jwaldmann commented 5 months ago

basic set "theory": in union a b, the two sets a, b might be disjoint, or not. I expect a combing function for union to handle the non-disjoint case, because - what else would it do?

theobat commented 5 months ago

I made it clear in the first post what it should handle : all the cases, which is not the case here (and this has nothing to do with disjoint or not by the way, excluding performance aspects).

konsumlamm commented 5 months ago

@konsumlamm, ok so, if you had to guess without a type signature what the type signature would be you'd expect union with, to operate the combination function only on the intersection case ? Do you really expect me to believe that ?

Yes, that's exactly what I would expect. I don't care if you believe me, why should I lie to you?

konsumlamm commented 5 months ago

For the record, I'm not saying that I find a function like you're proposing unreasonable. In fact, I think it would be a good addition, although the fact that there is no These type in base or containers (or any other core library) makes it less appealing. I just don't agree that the current unionWith is bad design or unintuitive.