serokell / universum

:milky_way: Prelude written in @Serokell
MIT License
176 stars 28 forks source link

Add some 'groupBy' #64

Closed chshersh closed 6 years ago

chshersh commented 7 years ago

I really very often want to have function like this:

groupListBy :: (Eq b, Hashable b) => (a -> b) -> [a] -> HashMap b [a]
groupBy     :: (Eq b, Hashable b) => (a -> b) -> [a] -> HashMap b  a

These function don't even exist in any packages! I propose to add some reasonable grouping. If you have your ideas how grouping function should be look like, please, share. My version is the following next:

groupBy :: (Container f, Element f ~ a, Eq b, Hashable b) 
        => (a -> b) -> f -> HashMap b (NonEmpty a)
mapBy   :: (Container f, Element f ~ a, Eq b, Hashable b) 
        => (a -> b) -> f -> HashMap b a

UPDATE: better function signatures adapted to universum

gromakovsky commented 7 years ago

My personal intuition is that such functions shouldn't be in prelude, maybe in serokell-util.

chshersh commented 7 years ago

serokell-util probably shouldn't exist. So I don't want to add more functions to it.

Martoon-00 commented 7 years ago

What about renaming them a bit? Your groupListBy looks more essential and is more similar to the true group than your current groupBy.

Also, grouping which returns HashMap b a implicitly throws duplicating elements away, and I think that would be better stated explicitly via NE.head <$> groupListBy f l or by renaming to some groupByOne

chshersh commented 7 years ago

It's a good advice to name HashMap b [a] function as groupBy. But then we need to think how to name HashMap b a. Maybe groupSingleBy or groupOneBy or nubBy or hashifyBy or something else. Btw, NE.head <$> groupListBy f l is very inefficient implementation.

Martoon-00 commented 7 years ago

Btw, NE.head <$> groupListBy f l is very inefficient implementation.

Why very?

chshersh commented 7 years ago

@Martoon-00 Well, not exactly very. It's really hard to say anything w/o benchmarks. But implementing groupBy :: ... HashMap b [a] is only possible using (++) operator. This results in extra step for each value.

Martoon-00 commented 7 years ago

It seems to me that similar efficiency leaks are all over our code and no one concerns, but ok, perhaps library have to suggest performant stuff if it can. I believe that you'll stop on so cool name for that function that it will be much more pleasant to use than ... groupBy when the option is given :wink:

chshersh commented 7 years ago

Currently we don't have agreement which container to use: HashMap or just Map. So I postpone this issue to better time...

volhovm commented 6 years ago

These functions are nice to have somewhere. They must be fast, so there should be instances for both HashMap and Map and maybe list ((b, [a])). I suggest to have these in separate modules, maybe utility package, maybe in Map or HashMap themselves. It's also easy to write the optimal version for yourself if you need to, so I don't think the proposal is useful.

chshersh commented 6 years ago

Okay, it's not so common. I think we should start with having these functions in some utilities package and then just see how it's going and how often they used.