snoyberg / classy-prelude

A typeclass-based Prelude.
108 stars 15 forks source link

groupWith sorts unlike groupBy #53

Closed gregwebs closed 11 years ago

gregwebs commented 11 years ago

the classy groupWith uses groupBy, but GHC.Exts.groupWith is different in that it actually sorts the list. I really don't like how groupBy in the standard API does not sort. I would suggest just removing the classy groupWith for now. You could keep it by adding a classy sortWith and warning that the implementation is probably inefficient.

> GHC.Exts.groupWith id [1,2,3,1,2,3,4 :: Int]
[[1,1],[2,2],[3,3],[4]]
> Import.groupWith  id [1,2,3,1,2,3,4 :: Int]
[[1],[2],[3],[1],[2],[3],[4]]
snoyberg commented 11 years ago

This was added by @nikita-volkov in https://github.com/snoyberg/classy-prelude/commit/40b2f3d10c01f91df4b97c3d334cf526c3481f67. I don't have a strong opinion on this function, as I've personally never used it.

nikita-volkov commented 11 years ago

What's wrong about a grouping function not sorting the data structure? After all, neither the Data.List.groupBy nor GHC.Exts.groupWith provide any particular guarantees concerning the order of the data produced. Besides, unlike with GHC.Exts.groupWith, we currently get this function working on much more data structures than just List for free: Vector, Text, ByteString.

Concerning the suggestion to remove it, what for? I mean, what's the benefit?

@snoyberg We could easily solve the current issue for lists particularly by migrating the function into a typeclass and aliasing the List's implementation to GHC.Exts if only we were willing to introduce such a dependency. It would also introduce an inconsistency with how this function behaves on other types, so I don't think I support the move.

gregwebs commented 11 years ago

@nikita-volkov the benefit of removing it is that others won't wast time debugging why this didn't work as it was documented. Even if the documentation is changed, the expectation will still be the same behavior as GHC.Exts, no matter what the type is. So if you want its functionality to remain it needs to be renamed to something else. I personally find it shocking that groupBy does not sort and would suggest the name neighbor instead of group, so neighborWith.

nikita-volkov commented 11 years ago

Okay, IMO you've made a good enough point to update the documentation specifying the difference from GHC.Exts. However I totally don't understand your shock with a function not doing what it isn't supposed to do, i.e. "group" is supposed to group, not sort. With that said to me it makes much more sense to name the function you expect groupAndSortWith.

My vote is to leave the implementations as they are and update the docs. Leaving it in @snoyberg's court now.

maxcan commented 11 years ago

I think the issue is that the name is identical to an existing function (albeit in GHC.Exts) so not behaving like the existing function is a bit misleading. At a bare minimum a warning in the documentation is called for. As for changing the function or adding a groupAndSortWith I'm a bit indifferent. I could see the current groupWith possibly being more performant so maybe we should leave it in.

Max

On Tue, Jul 9, 2013 at 9:43 AM, Nikita Volkov notifications@github.comwrote:

Okay, IMO you've made a good enough point to update the documentation specifying the difference from GHC.Exts. However I totally don't understand your shock with a function not doing what it isn't supposed to do, i.e. "group" is supposed to group, not sort. With that said to me it makes much more sense to name the function you expect groupAndSortWith.

My vote is to leave the implementations as they are and update the docs. Leaving it in @snoyberg https://github.com/snoyberg's court now.

— Reply to this email directly or view it on GitHubhttps://github.com/snoyberg/classy-prelude/issues/53#issuecomment-20687806 .

gregwebs commented 11 years ago

@nikita-volkov the shock comes from using SQL GROUP BY. Its not that grouping performs a sort (which is an implementation detail that may or may not be true), its that it operates on the entire set of data.

But I get that Haskell has chosen to define groupBy the way it has already and that GHC.Exts is the odd man out. But still it is best to just avoid groupWith and change it to something like groupOn Hayoo found me an alternate prelude that does just that (and also defines groupSortOn).

http://hackage.haskell.org/packages/archive/tamarin-prover-utils/latest/doc/html/src/Extension-Prelude.html#groupOn

snoyberg commented 11 years ago

Again, I'm not vested in this, so I may not have great insight.

How about adding two new functions, one with the current groupWith behavior (maybe called groupWithPresorted), and one with @gregwebs's desired behavior, and then add a deprecation message to groupWith pointing at the two new functions?

nikita-volkov commented 11 years ago

OMG. I'm so ashamed. I've only just realized what Greg's been talking about all this time. The problem is this:

Prelude Data.List> groupBy (\a b -> length a == length b) ["a", "bc", "d"]
[["a"],["bc"],["d"]]
Prelude Data.List> groupBy (\a b -> length a == length b) ["a", "d", "bc"]
[["a","d"],["bc"]]

I never realized that groupBy, on which the current implementation of groupWith is based, behaved this way. So I am pretty much shocked as well.

Now I believe the current issue should be considered as a bug report and that groupWith should be reimplemented to imply grouping on all items, not just sequential ones - the way it behaves in GHC.Exts. Appropriate docs should be added to both functions. I'll get on to fixing this, if we reach an agreement.

maxcan commented 11 years ago

I think we are all in agreement.

On Wed, Jul 10, 2013 at 3:27 AM, Nikita Volkov notifications@github.comwrote:

OMG. I'm so ashamed. I've only just realized what Greg's been talking about all this time. The problem is this:

Prelude Data.List> groupBy (\a b -> length a == length b) ["a", "bc", "d"][["a"],["bc"],["d"]]Prelude Data.List> groupBy (\a b -> length a == length b) ["a", "d", "bc"][["a","d"],["bc"]]

I never realized that groupBy, on which the current implementation of groupWith is based, behaved this way. So I am pretty much shocked as well.

Now I believe the current issue should be considered as a bug report and that groupWith should be reimplemented to imply grouping on all items, not just sequential ones - the way it behaves in GHC.Exts. Appropriate docs should be added to both functions. I'll get on to fixing this, if we reach an agreement.

— Reply to this email directly or view it on GitHubhttps://github.com/snoyberg/classy-prelude/issues/53#issuecomment-20733611 .