Open lippling opened 8 years ago
Why would you compare performance of sorted and unsorted?
current documentation
Similar to standard 'groupBy', but operates on the whole collection,
not just the consecutive items.
Isn't it implied that the performance of operating on the whole collection is worse than operating on consecutive items? Do you want to add a big O notation?
Of course it is unfair to compare sorted and unsorted, but I compared it because my use case was:
(data from db -> groupAllOn -> ...) => 35 seconds vs. (data from db with order by -> groupBy -> ...) => unnoticable
This was a bit unexpected, at least for me. Maybe adding a big O notation would help.
Haskell is non-strict, so I think the meaning of big-O notation on execution time here is slippery, although big-O notation in the number of comparisons may still be meaningful, but I don't see many functions being documented with big-O notation in Haskell and there are probably good reasons.
Considering that in groupAllOn
members of a group can be anywhere in the SemiSequence
and non-consecutive, groupAllOn
can be expected to be slower than groupBy
.
Considering that the result of the group function only needs to be in Eq
(according to the type constraint Eq b
), not Hashable
or Ord
, I wouldn't expect groupAllOn
to use a fast algorithm because fast algorithms typically require hashing or comparison of the keys. In fact, if I only have Eq
, the only algorithm I can think of involves comparing every element's key with every unique key value that has appeared, which requires O(n^2) comparisons in the worst case.
I realized that the performance of
Data.Sequences.groupAllOn
is much worse in comparison toData.List.groupBy
(I have a list with 29.000 items).groupBy
on that list (sorted) finishes fast. I didn't measure it, but it is unnoticeable.groupAllOn
(not sorted) takes 35 seconds.I think that should be mentioned in the docs until someone improves the performance.