snoyberg / mono-traversable

Type classes for mapping, folding, and traversing monomorphic containers
153 stars 63 forks source link

Performance with Data.Sequences.groupAllOn #97

Open lippling opened 8 years ago

lippling commented 8 years ago

I realized that the performance of Data.Sequences.groupAllOn is much worse in comparison to Data.List.groupBy (I have a list with 29.000 items). groupBy on that list (sorted) finishes fast. I didn't measure it, but it is unnoticeable. groupAllOn (not sorted) takes 35 seconds.

I think that should be mentioned in the docs until someone improves the performance.

gregwebs commented 8 years ago

Why would you compare performance of sorted and unsorted?

current documentation

 Similar to standard 'groupBy', but operates on the whole collection,
 not just the consecutive items.

Isn't it implied that the performance of operating on the whole collection is worse than operating on consecutive items? Do you want to add a big O notation?

lippling commented 8 years ago

Of course it is unfair to compare sorted and unsorted, but I compared it because my use case was:

(data from db -> groupAllOn -> ...) => 35 seconds vs. (data from db with order by -> groupBy -> ...) => unnoticable

This was a bit unexpected, at least for me. Maybe adding a big O notation would help.

jchia commented 7 years ago

Haskell is non-strict, so I think the meaning of big-O notation on execution time here is slippery, although big-O notation in the number of comparisons may still be meaningful, but I don't see many functions being documented with big-O notation in Haskell and there are probably good reasons.

Considering that in groupAllOn members of a group can be anywhere in the SemiSequence and non-consecutive, groupAllOn can be expected to be slower than groupBy .

Considering that the result of the group function only needs to be in Eq (according to the type constraint Eq b), not Hashable or Ord, I wouldn't expect groupAllOn to use a fast algorithm because fast algorithms typically require hashing or comparison of the keys. In fact, if I only have Eq, the only algorithm I can think of involves comparing every element's key with every unique key value that has appeared, which requires O(n^2) comparisons in the worst case.