theodorzoulias opened this issue 2 years ago
@theodorzoulias - Nice analysis.
Two things: 1. In the future, when doing benchmarks, I recommend that you use BenchmarkDotNet. 2. If you want to write a PR for this on my fork SuperLinq, I'll gladly take it in; if not, I'll rely on this analysis to update the code myself shortly.
@viceroypenguin thanks for the quick response. I know about the BenchmarkDotNet, and I've used it, once. I realized that I don't have the patience to wait for half an hour for a simple benchmark, nor I like the idea of my PC getting tortured while I am rolling my thumbs, so I gave up with it. I am not aiming to precise measurements anyway. Getting a rough idea of how something performs is usually enough for me. Which obviously makes me not the right person for writing PRs for this high-profile project. 😃
No worries! I'll write the changes myself then. Thanks again for the analysis! :)
@theodorzoulias - For the record, `SortedSet<T>` does not accurately implement `.PartialSort()`. `.PartialSort()` will work with the following call: `new[] { 1, 1, 2, 2, 3, 3, 4, }.PartialSort(3)` and return `[1, 1, 2]`. However, `SortedSet<T>`, working with a set (duplicate entries are ignored), will return `[1, 2, 3]`. I'll write some more benchmarks to figure out whether I should just switch to the LINQ methods or whether there is a better way.
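The duplicate-dropping behavior is easy to demonstrate in isolation. A minimal standalone check (class and variable names are illustrative):

```csharp
using System;
using System.Collections.Generic;

class SortedSetDuplicatesDemo
{
    static void Main()
    {
        // SortedSet<T> is a set: adding an equal element is a no-op,
        // so duplicates from the source sequence are silently dropped.
        var set = new SortedSet<int>();
        foreach (var value in new[] { 1, 1, 2, 2, 3, 3, 4 })
            set.Add(value); // Add returns false for each duplicate

        Console.WriteLine(string.Join(", ", set)); // 1, 2, 3, 4
    }
}
```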
After testing with BDN, it looks like the original version in MoreLINQ and the modified version in SuperLinq (changed to have a stable sort) are generally the fastest. Please see here for the benchmark results and the code used. Once N gets up to about 1000, the LINQ version does get faster.
@viceroypenguin ohh, the `SortedSet<T>` doesn't accept duplicate values. That was dumb. I think it can be fixed by attaching an incrementing `long` to each `T` element, to make it unique. The implementation below should correctly replicate the existing behavior of the `PartialSort` operator.
```csharp
public static IEnumerable<T> PartialSort<T>(
    this IEnumerable<T> source,
    int count,
    IComparer<T> comparer = default,
    OrderByDirection direction = OrderByDirection.Ascending)
{
    ArgumentNullException.ThrowIfNull(source);
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
    comparer ??= Comparer<T>.Default;
    if (direction == OrderByDirection.Descending)
    {
        var ascendingComparer = comparer;
        comparer = Comparer<T>.Create((x, y) => ascendingComparer.Compare(y, x));
    }
    // Tie-break equal items by their arrival index, so duplicates survive
    // in the set and the sort stays stable.
    SortedSet<(T Item, long Index)> bottom = new(Comparer<(T Item, long Index)>.Create((x, y) =>
    {
        int result = comparer.Compare(x.Item, y.Item);
        if (result == 0) result = x.Index.CompareTo(y.Index);
        return result;
    }));
    long index = 0;
    foreach (var item in source)
    {
        index++;
        if (bottom.Count < count) { bottom.Add((item, index)); continue; }
        // Skip items not strictly smaller than the current maximum. An item
        // equal to the maximum carries a larger index, so keeping the
        // incumbent preserves stability and avoids pointless churn.
        if (comparer.Compare(item, bottom.Max.Item) >= 0) continue;
        bool removed = bottom.Remove(bottom.Max);
        bool added = bottom.Add((item, index));
        Debug.Assert(removed && added);
    }
    foreach (var entry in bottom) yield return entry.Item;
}
```
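A quick standalone way to convince yourself that the index tie-break preserves duplicates (this sketch repeats the composite-comparer idea outside the full method; names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class IndexedSortedSetDemo
{
    static void Main()
    {
        // Pairing each element with its arrival index makes every entry
        // unique, so the SortedSet no longer collapses equal values.
        var comparer = Comparer<(int Item, long Index)>.Create((x, y) =>
        {
            int result = x.Item.CompareTo(y.Item);
            return result != 0 ? result : x.Index.CompareTo(y.Index);
        });
        var set = new SortedSet<(int Item, long Index)>(comparer);
        long index = 0;
        foreach (var value in new[] { 1, 1, 2, 2, 3, 3, 4 })
            set.Add((value, index++));

        Console.WriteLine(string.Join(", ", set.Select(e => e.Item)));
        // 1, 1, 2, 2, 3, 3, 4 — all seven entries survive
    }
}
```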
@viceroypenguin in order to magnify the effect of doing more comparisons than necessary, you could use long `string` values as keys in the benchmarks. Strings don't have a trivial comparison cost like integers do, especially if you throw a `StringComparer.InvariantCultureIgnoreCase` into the mix!
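For example, a data setup along these lines would make each comparison walk many characters (this generator is hypothetical, not taken from the gist):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class StringKeyBenchmarkSetup
{
    static void Main()
    {
        // Long random strings compared case-insensitively: every comparison
        // has to examine many characters before it can decide.
        var random = new Random(42); // fixed seed for repeatable data
        List<string> keys = Enumerable.Range(0, 10_000)
            .Select(i => new string(Enumerable.Range(0, 100)
                .Select(c => (char)random.Next('a', 'z' + 1))
                .ToArray()))
            .ToList();

        var top5 = keys
            .OrderBy(x => x, StringComparer.InvariantCultureIgnoreCase)
            .Take(5)
            .ToList();
        Console.WriteLine(top5.Count); // 5
    }
}
```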
Good call checking with a longer key comparison. Please check the gist now for the updated code and results. You might specifically review the changes I made to the `SortedSet()` method compared to yours, since your version does not handle the `keySelector` overload of `PartialSort()`. It appears that with a longer comparison time, your `SortedSet<T>` version is quicker than all of them by far. I'll finish making modifications and import into SuperLinq soon.
Fixed in SuperLinq (viceroypenguin/SuperLinq@32f0241)
@viceroypenguin Nice! By the way, I don't know if you are allowed to use .NET 6 components in the library. If you are, the `PriorityQueue<TElement, TPriority>` might be preferable to the `SortedSet<T>`. Internally it's an array-backed heap, so it allocates less than the binary-tree-based `SortedSet<T>`. This difference comes into play especially when the `count` is large. I came up with this implementation:
```csharp
public static IEnumerable<T> PartialSort<T>(
    this IEnumerable<T> source,
    int count,
    IComparer<T> comparer = default,
    OrderByDirection direction = OrderByDirection.Ascending)
{
    ArgumentNullException.ThrowIfNull(source);
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
    comparer ??= Comparer<T>.Default;
    // PriorityQueue dequeues the minimum, but we need to evict the largest
    // kept item first, so for an ascending sort the comparer is reversed.
    if (direction == OrderByDirection.Ascending)
    {
        var originalComparer = comparer;
        comparer = Comparer<T>.Create((x, y) => originalComparer.Compare(y, x));
    }
    PriorityQueue<bool, (T Item, long Index)> top = new(Comparer<(T Item, long Index)>.Create((x, y) =>
    {
        int result = comparer.Compare(x.Item, y.Item);
        if (result == 0) result = y.Index.CompareTo(x.Index);
        return result;
    }));
    long index = 0;
    foreach (var item in source)
    {
        if (top.Count < count)
            top.Enqueue(default, (item, index++));
        else
            top.EnqueueDequeue(default, (item, index++));
    }
    // The queue dequeues in reverse output order, so buffer and reverse.
    List<T> topList = new(top.Count);
    while (top.TryDequeue(out _, out var entry)) topList.Add(entry.Item);
    for (int i = topList.Count - 1; i >= 0; i--) yield return topList[i];
}
```
It's not perfect, because the `PriorityQueue<TElement, TPriority>` carries a `TElement` along with the `TPriority`, which I didn't need in this implementation, so I put a dummy `bool` there. Also it doesn't offer a way to enumerate it in reverse order, so in the end I had to extract its contents into a `List<T>` before yielding. But overall it should be slightly better than a `SortedSet<T>`.
The `PriorityQueue<TElement, TPriority>` allows elements with the same priority, so attaching a `long` is only needed if you want to make the sort stable. Otherwise you could just use a `PriorityQueue<bool, T> top = new(comparer)`, and make it both simpler and faster.
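A sketch of that simpler, unstable variant, assuming .NET 6+ (`PartialSortUnstable` is an illustrative name, not an API of either library):

```csharp
using System;
using System.Collections.Generic;

static class PartialSortUnstableDemo
{
    // Unstable ascending partial sort: keeps the `count` smallest items.
    // The comparer is reversed because PriorityQueue<,> dequeues the
    // minimum, and we need quick eviction of the largest kept item.
    public static List<T> PartialSortUnstable<T>(
        IEnumerable<T> source, int count, IComparer<T> comparer = null)
    {
        comparer ??= Comparer<T>.Default;
        var reversed = Comparer<T>.Create((x, y) => comparer.Compare(y, x));
        var top = new PriorityQueue<bool, T>(reversed);
        foreach (var item in source)
        {
            if (top.Count < count)
                top.Enqueue(default, item);
            else
                top.EnqueueDequeue(default, item); // evicts the largest kept item
        }
        var result = new List<T>(top.Count);
        while (top.TryDequeue(out _, out var priority))
            result.Add(priority);
        result.Reverse(); // dequeued largest-first; flip to ascending
        return result;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", PartialSortUnstable(new[] { 5, 1, 4, 2, 3 }, 3)));
        // 1, 2, 3
    }
}
```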
Surprisingly, I found that the `PriorityQueue<,>` implementation is slower than the `SortedSet<>` implementation at all sizes, though within the margin of error. I think having to reverse the elements is likely the cause. Regardless, I'm going to leave the current implementation as is for now. Thanks for the update! :)
@theodorzoulias FYI: SuperLinq 4.0.0 has been fully released with this improvement.
My first benchmark:
```csharp
public class PartialSortVsNative
{
    private const int N = 1_000_000;
    private readonly List<int> data = new(N);

    public PartialSortVsNative()
    {
        var random = new Random();
        for (int i = 0; i < N; i++)
            data.Add(random.Next());
    }

    [Benchmark(Baseline = true)]
    public List<int> Native() => data.OrderBy(x => x).Take(5).ToList();

    [Benchmark]
    public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, 5).ToList();

    [Benchmark]
    public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, 5).ToList();
}
```
results in:

| Method | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|
| Native | 26.52 ms | 0.524 ms | 1.009 ms | 1.00 | 0.00 |
| MoreLinqPartialSort | 17.13 ms | 0.341 ms | 0.379 ms | 0.64 | 0.04 |
| SuperLinqPartialSort | 19.94 ms | 0.396 ms | 0.472 ms | 0.75 | 0.04 |
With added randomness for the take count:
```csharp
public class PartialSortVsNative
{
    private const int N = 1_000_000;
    private readonly List<int> data = new(N);
    private readonly int TakeCount;

    public PartialSortVsNative()
    {
        var random = new Random();
        for (int i = 0; i < N; i++)
        {
            data.Add(random.Next());
        }
        TakeCount = random.Next(N);
    }

    [Benchmark(Baseline = true)]
    public List<int> Native() => data.OrderBy(x => x).Take(TakeCount).ToList();

    [Benchmark]
    public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, TakeCount).ToList();

    [Benchmark]
    public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, TakeCount).ToList();
}
```
```
// Summary
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19042.2006/20H2/October2020Update)
AMD Ryzen 5 PRO 4650U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.400
  [Host]     : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
```

| Method | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|
| Native | 177.9 ms | 4.91 ms | 14.48 ms | 1.00 | 0.00 |
| MoreLinqPartialSort | 839.3 ms | 15.65 ms | 15.37 ms | 4.33 | 0.37 |
| SuperLinqPartialSort | 1,223.1 ms | 24.08 ms | 53.87 ms | 6.94 | 0.59 |
A couple of things:

1. Prefer a `[GlobalSetup] public void Setup()` method over the constructor for initialization. That said, I'm not sure there is a practical difference between the two other than explicitness.
2. Making `TakeCount` random is not an apples-to-apples comparison, because BenchmarkDotNet will run the setup once for each method, and then run the method thousands of times to generate statistics. This means that `TakeCount` could be 5 for `Native` and 50_000 for `SuperLinqPartialSort`, and vice versa on the next benchmark.
3. Use a fixed seed for `Random`, so that the same randomly generated values are used for each version of the test. It may be that one run is closer to sorted than the next run, which would affect variances between the different tests.

Updating your code accordingly, I get the following results:
```
// Summary
BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22622.601)
Intel Core i7-1065G7 CPU 1.30GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100-preview.7.22377.5
  [Host] : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT AVX2

Job=.NET 6.0  Runtime=.NET 6.0
```

| Method | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|
| Native | 22.95 ms | 0.377 ms | 0.504 ms | 1.00 | 0.00 |
| MoreLinqPartialSort | 16.86 ms | 0.332 ms | 0.310 ms | 0.74 | 0.03 |
| SuperLinqPartialSort | 16.82 ms | 0.330 ms | 0.308 ms | 0.74 | 0.02 |
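The setup changes described in the list above could look something like this (a sketch: the seed and take-count values are arbitrary, and the MoreLinq/SuperLinq packages are assumed to be referenced):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class PartialSortVsNative
{
    private const int N = 1_000_000;
    private readonly List<int> data = new(N);
    private int takeCount;

    [GlobalSetup]
    public void Setup()
    {
        // Fixed seed: every benchmark method sees exactly the same data.
        var random = new Random(12345);
        for (int i = 0; i < N; i++)
            data.Add(random.Next());
        // Fixed take count: an apples-to-apples comparison across methods.
        takeCount = 5;
    }

    [Benchmark(Baseline = true)]
    public List<int> Native() => data.OrderBy(x => x).Take(takeCount).ToList();

    [Benchmark]
    public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, takeCount).ToList();

    [Benchmark]
    public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, takeCount).ToList();
}
```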
PS: You can also look at my old benchmarks here
@MisinformedDNA I did some more research, and found a place where we were introducing unnecessary performance overhead. Please see viceroypenguin/SuperLinq#67 for the fix, which is currently being released with SuperLinq v4.4.0. SuperLinq should now be as fast as MoreLinq in the int case, and faster in the string case.
> My first benchmark:
@MisinformedDNA my expectation is that MoreLinq's `List<T> top`-based implementation should be quite performant when the `int count` is small (less than 50 or so), and it should become progressively slower as the `int count` increases. That's because it calls the `List<T>.Insert` method, which is an O(n) operation. On the contrary, the `SortedSet<T>.Add` (in viceroypenguin's `SortedSet<T> top`-based implementation) is O(log n).
Hi! I ran some tests with the `PartialSort` operator, and it seems that it uses the `comparer` more frequently than necessary. For most items it should be enough to check only whether the item is smaller than the smallest of the top items, but instead it performs a binary-search operation invariably for each item. I have posted my findings on StackOverflow, in this answer. It seems that the implementation of `PartialSort` could be optimized.
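The suggested optimization can be sketched with a plain `List<T>` (a simplified, standalone illustration, not MoreLinq's actual code; stability among equal items is ignored here):

```csharp
using System;
using System.Collections.Generic;

static class PartialSortSketch
{
    // Keeps the `count` smallest items, returned in ascending order.
    public static List<T> PartialSort<T>(IEnumerable<T> source, int count, IComparer<T> comparer = null)
    {
        comparer ??= Comparer<T>.Default;
        var top = new List<T>(count);
        foreach (var item in source)
        {
            // Cheap pre-check: once the buffer is full, most items fail a
            // single comparison against the largest kept item, and the
            // binary search below is skipped entirely.
            if (top.Count == count && comparer.Compare(item, top[top.Count - 1]) >= 0)
                continue;
            int i = top.BinarySearch(item, comparer);
            if (i < 0) i = ~i;
            top.Insert(i, item); // O(count): shifts the tail of the list
            if (top.Count > count)
                top.RemoveAt(top.Count - 1);
        }
        return top;
    }

    static void Main()
    {
        var result = PartialSort(new[] { 5, 1, 4, 2, 3, 9, 0 }, 3);
        Console.WriteLine(string.Join(", ", result)); // 0, 1, 2
    }
}
```

With this guard in place, the per-item cost for most of the source is one comparison instead of a full O(log count) binary search, which is what the analysis above is pointing at.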