morelinq / MoreLINQ

Extensions to LINQ to Objects
https://morelinq.github.io/
Apache License 2.0

The PartialSort operator can be improved #840

Open theodorzoulias opened 2 years ago

theodorzoulias commented 2 years ago

Hi! I ran some tests with the PartialSort operator, and it seems that it uses the comparer more often than necessary. For most items it should be enough to check whether the item is smaller than the largest of the currently retained items, but instead the operator unconditionally performs a binary search for every item.

I have posted my findings on StackOverflow, in this answer. It seems that the implementation of PartialSort could be optimized.
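
A minimal sketch of that idea (hypothetical code, not MoreLINQ's actual implementation): keep the best `count` items in a sorted List<T>, and only fall back to the binary search and insert when a single cheap comparison against the current maximum says the item can qualify.

```csharp
using System;
using System.Collections.Generic;

public static class TopNSketch
{
    // Returns the `count` smallest items of `source` in ascending order.
    public static List<T> SmallestN<T>(
        IEnumerable<T> source, int count, IComparer<T>? comparer = null)
    {
        comparer ??= Comparer<T>.Default;
        var top = new List<T>(count);
        foreach (var item in source)
        {
            // Cheap guard: once the list is full, most items lose this
            // single comparison against the current maximum and are
            // skipped without a binary search or an O(n) Insert.
            if (top.Count == count && comparer.Compare(item, top[^1]) >= 0)
                continue;
            var i = top.BinarySearch(item, comparer);
            if (i < 0) i = ~i;
            top.Insert(i, item);
            if (top.Count > count) top.RemoveAt(count);
        }
        return top;
    }
}
```

Note that this sketch makes no attempt at a stable sort for equal-but-distinct items, which the real operator also has to care about.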

viceroypenguin commented 2 years ago

@theodorzoulias - Nice analysis.

Two things:

  1. In the future, when doing benchmarks, I recommend that you use BenchmarkDotNet.
  2. If you want to write a PR for this on my fork SuperLinq, I'll gladly take it in. If not, I'll rely on this analysis to update the code myself shortly.

theodorzoulias commented 2 years ago

@viceroypenguin thanks for the quick response. I know about BenchmarkDotNet, and I've used it once. I realized that I don't have the patience to wait half an hour for a simple benchmark, nor do I like the idea of my PC being tortured while I twiddle my thumbs, so I gave up on it. I am not aiming at precise measurements anyway; getting a rough idea of how something performs is usually enough for me. Which obviously makes me not the right person to write PRs for this high-profile project. 😃

viceroypenguin commented 2 years ago

No worries! I'll write the changes myself then. Thanks again for the analysis! :)

viceroypenguin commented 2 years ago

@theodorzoulias - For the record, SortedSet<T> does not accurately implement .PartialSort(). .PartialSort() will work with the following call: new[]{1, 1, 2, 2, 3, 3, 4, }.PartialSort(3) and return [1, 1, 2]. However, SortedSet<T>, working with a set (duplicate entries are ignored), will return [1, 2, 3]. I'll write some more benchmarks to figure out if I should just switch to the LINQ methods or if there is a better way.
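
The duplicate loss is easy to demonstrate with the same data (illustrative snippet using only the BCL):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// SortedSet<T> is a set: the duplicate 1s, 2s and 3s each collapse to a
// single entry, so its first three elements are no longer the three
// smallest items of the original sequence.
var source = new[] { 1, 1, 2, 2, 3, 3, 4 };
var set = new SortedSet<int>(source);
var firstThree = set.Take(3).ToArray();
Console.WriteLine(string.Join(", ", firstThree)); // prints "1, 2, 3", not the expected "1, 1, 2"
```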

viceroypenguin commented 2 years ago

After testing with BDN, it looks like the original version in MoreLINQ and the modified version in SuperLinq (changed to have a stable sort) are generally the fastest. Please see here for the benchmark results and the code used. Once N gets up to about 1000, the LINQ version does get faster.

theodorzoulias commented 2 years ago

@viceroypenguin ohh, the SortedSet<T> doesn't accept duplicate values. That was dumb. I think it can be fixed by attaching an incrementing long to each T element, to make it unique. The implementation below should correctly replicate the existing behavior of the PartialSort operator.

public static IEnumerable<T> PartialSort<T>(
    this IEnumerable<T> source,
    int count,
    IComparer<T> comparer = default,
    OrderByDirection direction = OrderByDirection.Ascending)
{
    ArgumentNullException.ThrowIfNull(source);
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
    comparer ??= Comparer<T>.Default;
    if (direction == OrderByDirection.Descending)
    {
        var ascendingComparer = comparer;
        comparer = Comparer<T>.Create((x, y) => ascendingComparer.Compare(y, x));
    }
    SortedSet<(T Item, long Index)> bottom = new(Comparer<(T Item, long Index)>.Create((x, y) =>
    {
        int result = comparer.Compare(x.Item, y.Item);
        if (result == 0) result = Comparer<long>.Default.Compare(x.Index, y.Index);
        return result;
    }));
    long index = 0;
    foreach (var item in source)
    {
        index++;
        if (bottom.Count < count) { bottom.Add((item, index)); continue; }
        if (comparer.Compare(item, bottom.Max.Item) >= 0) continue; // on ties, keep the earlier item (stable)
        bool removed = bottom.Remove(bottom.Max);
        bool added = bottom.Add((item, index));
        Debug.Assert(removed && added);
    }
    foreach (var entry in bottom) yield return entry.Item;
}

theodorzoulias commented 2 years ago

@viceroypenguin in order to magnify the effect of doing more comparisons than necessary, you could use long string values as keys in the benchmarks. Strings don't have a trivial comparison cost like integers do. Especially if you throw a StringComparer.InvariantCultureIgnoreCase in the mix!
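
For instance (a hypothetical setup; the key length, element count, and seed are arbitrary), keys like the following make the comparison cost dominate. Plain OrderBy stands in here for whichever top-N operator receives the comparer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Build long random string keys; comparing these with a culture-aware,
// case-insensitive comparer is far more expensive than comparing ints.
var random = new Random(42); // fixed seed for repeatable runs
var data = new List<string>(10_000);
for (int i = 0; i < 10_000; i++)
{
    var chars = new char[64];
    for (int j = 0; j < chars.Length; j++)
        chars[j] = (char)('a' + random.Next(26));
    data.Add(new string(chars));
}

// Any top-N query over this data now pays the full comparison cost; a
// PartialSort overload taking an IComparer<T> would receive the
// comparer the same way.
var top10 = data
    .OrderBy(x => x, StringComparer.InvariantCultureIgnoreCase)
    .Take(10)
    .ToList();
```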

viceroypenguin commented 2 years ago

Good call checking with a longer key comparison. Please check the gist now for the updated code and results. You might specifically review the changes I made to the SortedSet() method compared to yours, since your version does not handle the keySelector overload of PartialSort(). It appears that with a longer comparison time, your SortedSet<T> version is by far the quickest of them all. I'll finish making modifications and import this to SuperLinq soon.

viceroypenguin commented 2 years ago

Fixed in SuperLinq (viceroypenguin/SuperLinq@32f0241)

theodorzoulias commented 2 years ago

@viceroypenguin Nice! Btw I don't know if you are allowed to use .NET 6 components in the library. If you are, the PriorityQueue<TElement, TPriority> might be preferable to the SortedSet<T>. Internally it's an array-backed heap, so it's less allocatey than the binary-tree-based SortedSet<T>. This difference comes into play especially when the count is large. I came up with this implementation:

public static IEnumerable<T> PartialSort<T>(
    this IEnumerable<T> source,
    int count,
    IComparer<T> comparer = default,
    OrderByDirection direction = OrderByDirection.Ascending)
{
    ArgumentNullException.ThrowIfNull(source);
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
    comparer ??= Comparer<T>.Default;
    if (direction == OrderByDirection.Ascending)
    {
        var originalComparer = comparer;
        comparer = Comparer<T>.Create((x, y) => originalComparer.Compare(y, x));
    }
    PriorityQueue<bool, (T Item, long Index)> top = new(Comparer<(T Item, long Index)>.Create((x, y) =>
    {
        int result = comparer.Compare(x.Item, y.Item);
        if (result == 0) result = Comparer<long>.Default.Compare(y.Index, x.Index);
        return result;
    }));
    long index = 0;
    foreach (var item in source)
    {
        if (top.Count < count)
            top.Enqueue(default, (item, index++));
        else
            top.EnqueueDequeue(default, (item, index++));
    }
    List<T> topList = new(top.Count);
    while (top.TryDequeue(out _, out var entry)) topList.Add(entry.Item);
    for (int i = topList.Count - 1; i >= 0; i--) yield return topList[i];
}

It's not perfect, because the PriorityQueue<TElement, TPriority> carries a TElement along with the TPriority, which I didn't need in this implementation, so I put a dummy bool there. Also it doesn't offer a way to enumerate its contents in reverse order, so finally I had to extract them into a List<T> before yielding. But overall it should be slightly better than a SortedSet<T>.

The PriorityQueue<TElement, TPriority> allows elements with the same priority, so attaching a long is only needed if you want to make the sort stable. Otherwise you could just use a PriorityQueue<bool, T> top = new(comparer), and make it both simpler and faster.
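
A sketch of that simpler, non-stable variant (assuming .NET 6's PriorityQueue<TElement, TPriority> is available; `PartialSortUnstable` is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;

public static class PriorityQueueSketch
{
    // Returns the `count` smallest items of `source` in ascending order.
    // Equal items may come out in any relative order (not a stable sort).
    public static IEnumerable<T> PartialSortUnstable<T>(
        this IEnumerable<T> source, int count, IComparer<T>? comparer = null)
    {
        ArgumentNullException.ThrowIfNull(source);
        if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
        comparer ??= Comparer<T>.Default;
        // Reverse the comparer so the queue's "minimum" is the largest
        // retained item, which EnqueueDequeue then evicts when full.
        var reversed = Comparer<T>.Create((x, y) => comparer.Compare(y, x));
        var top = new PriorityQueue<bool, T>(reversed);
        foreach (var item in source)
        {
            if (top.Count < count) top.Enqueue(default, item);
            else top.EnqueueDequeue(default, item);
        }
        // Items dequeue largest-first under the reversed comparer;
        // reverse once at the end for ascending output.
        var result = new List<T>(top.Count);
        while (top.TryDequeue(out _, out var value)) result.Add(value);
        result.Reverse();
        foreach (var value in result) yield return value;
    }
}
```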

viceroypenguin commented 2 years ago

Surprisingly, I found that PriorityQueue<,> is slower than the SortedSet<> implementation at all sizes, though within the margin of error. I think having to reverse the elements is likely the cause. Regardless, I'm going to leave the current implementation as is for now. Thanks for the update! :)

viceroypenguin commented 2 years ago

@theodorzoulias FYI: SuperLinq 4.0.0 has been fully released with this improvement.

MisinformedDNA commented 2 years ago

My first benchmark:

public class PartialSortVsNative
{
    private const int N = 1_000_000;
    private readonly List<int> data = new(N);

    public PartialSortVsNative()
    {
        var random = new Random();
        for (int i = 0; i < N; i++)
            data.Add(random.Next());
    }

    [Benchmark(Baseline = true)]
    public List<int> Native() => data.OrderBy(x => x).Take(5).ToList();

    [Benchmark]
    public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, 5).ToList();

    [Benchmark]
    public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, 5).ToList();
}

results in

| Method               |     Mean |    Error |   StdDev | Ratio | RatioSD |
|--------------------- |---------:|---------:|---------:|------:|--------:|
| Native               | 26.52 ms | 0.524 ms | 1.009 ms |  1.00 |    0.00 |
| MoreLinqPartialSort  | 17.13 ms | 0.341 ms | 0.379 ms |  0.64 |    0.04 |
| SuperLinqPartialSort | 19.94 ms | 0.396 ms | 0.472 ms |  0.75 |    0.04 |

MisinformedDNA commented 2 years ago

With added randomness for the take count:

public class PartialSortVsNative
{
    private const int N = 1_000_000;
    private readonly List<int> data = new(N);
    private readonly int TakeCount;

    public PartialSortVsNative()
    {
        var random = new Random();
        for (int i = 0; i < N; i++)
        {
            data.Add(random.Next());
        }
        TakeCount = random.Next(N);
    }

    [Benchmark(Baseline = true)]
    public List<int> Native() => data.OrderBy(x => x).Take(TakeCount).ToList();

    [Benchmark]
    public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, TakeCount).ToList();

    [Benchmark]
    public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, TakeCount).ToList();
}

// Summary

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19042.2006/20H2/October2020Update)
AMD Ryzen 5 PRO 4650U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.400
  [Host]     : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2

| Method               |       Mean |    Error |   StdDev | Ratio | RatioSD |
|--------------------- |-----------:|---------:|---------:|------:|--------:|
| Native               |   177.9 ms |  4.91 ms | 14.48 ms |  1.00 |    0.00 |
| MoreLinqPartialSort  |   839.3 ms | 15.65 ms | 15.37 ms |  4.33 |    0.37 |
| SuperLinqPartialSort | 1,223.1 ms | 24.08 ms | 53.87 ms |  6.94 |    0.59 |

viceroypenguin commented 2 years ago

Couple things:

  1. Nit: the proper way to do setup for a benchmark is a [GlobalSetup] public void Setup() method. That said, I'm not sure there is a practical difference between the two other than explicitness.
  2. Making TakeCount random is not an apples-to-apples comparison, because BenchmarkDotNet will run the setup once for each method, and then run the method thousands of times to generate statistics. This means that TakeCount could be 5 for Native and 50_000 for SuperLinqPartialSort, and vice versa on the next benchmark run.
  3. For similar reasons, it is important to use a fixed seed for Random, so that the same randomly generated values are used for each version of the test. One run's data may be closer to sorted than the next run's, which would affect the variance between the different tests.
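
Applied to the benchmark above, the setup might look like this (a sketch; the fixed TakeCount and seed values are arbitrary placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class PartialSortVsNative
{
    private const int N = 1_000_000;
    private const int TakeCount = 1_000;      // fixed, not random (point 2)
    private readonly List<int> data = new(N);

    [GlobalSetup]                             // explicit setup (point 1)
    public void Setup()
    {
        var random = new Random(42);          // fixed seed (point 3)
        for (int i = 0; i < N; i++)
            data.Add(random.Next());
    }

    [Benchmark(Baseline = true)]
    public List<int> Native() => data.OrderBy(x => x).Take(TakeCount).ToList();

    [Benchmark]
    public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, TakeCount).ToList();

    [Benchmark]
    public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, TakeCount).ToList();
}
```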

Updating your code accordingly, I get the following results:

// Summary

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22622.601)
Intel Core i7-1065G7 CPU 1.30GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100-preview.7.22377.5
  [Host] : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT AVX2

Job=.NET 6.0 Runtime=.NET 6.0

| Method               |     Mean |    Error |   StdDev | Ratio | RatioSD |
|--------------------- |---------:|---------:|---------:|------:|--------:|
| Native               | 22.95 ms | 0.377 ms | 0.504 ms |  1.00 |    0.00 |
| MoreLinqPartialSort  | 16.86 ms | 0.332 ms | 0.310 ms |  0.74 |    0.03 |
| SuperLinqPartialSort | 16.82 ms | 0.330 ms | 0.308 ms |  0.74 |    0.02 |

viceroypenguin commented 2 years ago

PS: You can also look at my old benchmarks here

viceroypenguin commented 2 years ago

@MisinformedDNA I did some more research, and found a place where we were adding unnecessary overhead. Please see viceroypenguin/SuperLinq#67 for the fix, which is currently being released with SuperLinq v4.4.0. SuperLinq should now be as fast as MoreLinq in the int case, and faster in the string case.

theodorzoulias commented 2 years ago

> My first benchmark:

@MisinformedDNA my expectation is that MoreLinq's List<T>-based implementation should be quite performant when the count is small (less than 50 or so), and become progressively slower as the count increases. That's because it calls the List<T>.Insert method, which is an O(n) operation. By contrast, the SortedSet<T>.Add (in viceroypenguin's SortedSet<T>-based implementation) is O(log n).