morelinq / MoreLINQ

Extensions to LINQ to Objects
https://morelinq.github.io/
Apache License 2.0

Add identity method to reduce JIT-ing #880

Closed (viceroypenguin closed this 1 year ago)

viceroypenguin commented 2 years ago

Currently, the C# compiler does not de-duplicate methods, including identity lambdas such as x => x. Each instance of x => x is therefore compiled as a separate method, and each of those methods must be JIT-compiled to native code separately, once for every type it is used with.

Adding an Identity method reduces the JIT work significantly, because a single shared method is compiled per type instead of one method per lambda.

Until .NET 7.0 with tiered PGO, these methods will not get inlined because they are invoked through a delegate. This means each duplicate also takes additional space in memory for its JIT-compiled code.
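A minimal sketch of the idea, assuming a hypothetical static class named IdentitySketch (the helper actually added in MoreLinq/IdFn.cs may be named and shaped differently): one generic method is passed as a method group wherever an x => x lambda would otherwise be written.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdentitySketch
{
    // Hypothetical helper: a single generic method reused in place of every `x => x` lambda.
    public static T Identity<T>(T x) => x;

    public static Dictionary<int, string> ByLength(IEnumerable<string> words) =>
        // The method group replaces `w => w` as the element selector.
        words.ToDictionary(w => w.Length, Identity);

    public static void Main()
    {
        var map = ByLength(new[] { "a", "bb", "ccc" });
        Console.WriteLine(map[2]); // prints "bb"
    }
}
```

Every call site that passes Identity this way refers to the same method, so no extra compiler-generated lambda method is emitted per call site.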

codecov[bot] commented 2 years ago

Codecov Report

Merging #880 (6ecbcbf) into master (b31c7fd) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #880   +/-   ##
=======================================
  Coverage   92.38%   92.39%           
=======================================
  Files         110      111    +1     
  Lines        3441     3443    +2     
  Branches     1020     1021    +1     
=======================================
+ Hits         3179     3181    +2     
  Misses        200      200           
  Partials       62       62           
Impacted Files              Coverage Δ
MoreLinq/Batch.cs           94.33% <100.00%> (ø)
MoreLinq/GroupAdjacent.cs   98.52% <100.00%> (ø)
MoreLinq/IdFn.cs            100.00% <100.00%> (ø)
MoreLinq/OrderedMerge.cs    93.84% <100.00%> (ø)
MoreLinq/Rank.cs            96.15% <100.00%> (ø)
MoreLinq/Split.cs           89.70% <100.00%> (ø)
MoreLinq/Partition.cs       98.27% <0.00%> (+0.03%) :arrow_up:

viceroypenguin commented 2 years ago

> I don't see the value in adding this because it's not improving anything. It's less ergonomic/succinct (or more verbose) than just a lambda (x => x) and it's not removing duplication of any “methods” per the PR subject line. What's more, it doesn't work with anonymous types.

Selected Method:

public static IEnumerable<T> OrderedMerge<T, TKey>(
    this IEnumerable<T> first,
    IEnumerable<T> second,
    Func<T, TKey> keySelector)
{
    return OrderedMerge(first, second, keySelector, Identity<T>, Identity<T>, (a, _) => a, null);
}

Decompile of .OrderedMerge() with lambdas:

[Extension]
public static IEnumerable<T> OrderedMerge<[System.Runtime.CompilerServices.Nullable(2)] T, [System.Runtime.CompilerServices.Nullable(2)] TKey>(IEnumerable<T> first, IEnumerable<T> second, Func<T, TKey> keySelector)
{
    return OrderedMerge(first, second, keySelector, <>c__156<T, TKey>.<>9__156_0 ?? (<>c__156<T, TKey>.<>9__156_0 = new Func<T, T>(<>c__156<T, TKey>.<>9.<OrderedMerge>b__156_0)), <>c__156<T, TKey>.<>9__156_1 ?? (<>c__156<T, TKey>.<>9__156_1 = new Func<T, T>(<>c__156<T, TKey>.<>9.<OrderedMerge>b__156_1)), <>c__156<T, TKey>.<>9__156_2 ?? (<>c__156<T, TKey>.<>9__156_2 = new Func<T, T, T>(<>c__156<T, TKey>.<>9.<OrderedMerge>b__156_2)), null);
}

Decompile of .OrderedMerge() with Identity<T>:

[Extension]
public static IEnumerable<T> OrderedMerge<[System.Runtime.CompilerServices.Nullable(2)] T, [System.Runtime.CompilerServices.Nullable(2)] TKey>(IEnumerable<T> first, IEnumerable<T> second, Func<T, TKey> keySelector)
{
    return OrderedMerge(first, second, keySelector, <OrderedMerge>O__157_0<T, TKey>.<0>__Identity ?? (<OrderedMerge>O__157_0<T, TKey>.<0>__Identity = new Func<T, T>(Identity)), <OrderedMerge>O__157_0<T, TKey>.<0>__Identity ?? (<OrderedMerge>O__157_0<T, TKey>.<0>__Identity = new Func<T, T>(Identity)), <>c__157<T, TKey>.<>9__157_0 ?? (<>c__157<T, TKey>.<>9__157_0 = new Func<T, T, T>(<>c__157<T, TKey>.<>9.<OrderedMerge>b__157_0)), null);
}

In the original version, the two identity lambdas are compiled separately (<>c__156<T, TKey>.<>9.<OrderedMerge>b__156_0 and <>c__156<T, TKey>.<>9.<OrderedMerge>b__156_1). In the new version, only one method is compiled: Identity<T>. In the original, the duplication grows with every additional method that declares its own identity lambda.
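The difference can also be observed at run time through Delegate.Method. The following is a small sketch, assuming a hypothetical DedupDemo class with its own stand-in Identity helper (not the MoreLINQ source):

```csharp
using System;

public static class DedupDemo
{
    // Hypothetical stand-in for the shared identity helper.
    public static T Identity<T>(T x) => x;

    public static bool LambdasShareMethod()
    {
        Func<int, int> a = x => x;
        Func<int, int> b = x => x;
        // Each lambda is emitted as its own compiler-generated method,
        // so the two delegates target different methods.
        return a.Method.Equals(b.Method);
    }

    public static bool MethodGroupsShareMethod()
    {
        Func<int, int> a = Identity;
        Func<int, int> b = Identity;
        // Both delegates target the same Identity<int> instantiation.
        return a.Method.Equals(b.Method);
    }

    public static void Main()
    {
        Console.WriteLine($"lambdas share a method:       {LambdasShareMethod()}");
        Console.WriteLine($"method groups share a method: {MethodGroupsShareMethod()}");
    }
}
```

With the compiler behaviour described above, the first check should report False and the second True, which matches the two decompiled listings.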

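This reduces the JIT work significantly, because a single shared method is compiled per type instead of one method per lambda.

Until .NET 7.0 with tiered PGO, these methods will not get inlined because they are invoked through a delegate. This means each duplicate also takes additional space in memory for its JIT-compiled code.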

Also, it works fine with anonymous types - see the LINQPad attachment here: (image)

atifaziz commented 2 years ago

@viceroypenguin Thanks for the detailed explanation. I understand now the savings you're trying to make with this and it would have been great to have the justification in the initial description. I'll come back to you on this, hopefully before the week is over.

atifaziz commented 2 years ago

> Also, it works fine with anonymous types - see LINQPad attachment here:

My main concern was whether Identity would work as a method group within a query, which is where you'd expect to use it (and not as an argument to the method, as in your LINQPad example), but it works fine:

var map =
    Enumerable.Range(1, 10)
              .Select(x => new { X = x, Y = x * 2 })
              .ToDictionary(x => x.X, Identity);

I was somewhat misled during my initial review by the explicit type annotations like Identity<TSource> and Identity<IEnumerable<TSource>>, where you still specify the generic type parameter, but coming back to it, those are all redundant.
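To illustrate why the explicit annotations are redundant, here is a short sketch, assuming a hypothetical InferenceSketch class with a stand-in Identity helper (not the PR's actual code). Both methods bind to the same Identity<TSource>; the second simply leaves the type argument to inference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InferenceSketch
{
    // Hypothetical stand-in for the PR's identity helper.
    public static T Identity<T>(T x) => x;

    // Explicit type argument, as in the annotations mentioned above.
    public static IEnumerable<TSource> Explicit<TSource>(IEnumerable<TSource> source) =>
        source.Select(Identity<TSource>);

    // Type argument omitted; the compiler infers Identity<TSource> from the delegate type.
    public static IEnumerable<TSource> Inferred<TSource>(IEnumerable<TSource> source) =>
        source.Select(Identity);

    public static void Main()
    {
        var numbers = new[] { 1, 2, 3 };
        Console.WriteLine(Explicit(numbers).SequenceEqual(Inferred(numbers)));
    }
}
```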

viceroypenguin commented 2 years ago

> I was somewhat misled during my initial review by the explicit type annotations like Identity<TSource> and Identity<IEnumerable<TSource>>, where you still specify the generic type parameter, but coming back to it, those are all redundant.

Ah, right. Totally missed that. Amazing how powerful the type system is sometimes. :)