morelinq / MoreLINQ

Extensions to LINQ to Objects
https://morelinq.github.io/
Apache License 2.0
3.7k stars 415 forks source link

Except and Intersect overloads that take delegates #80

Open atifaziz opened 9 years ago

atifaziz commented 9 years ago

Originally reported on Google Code with ID 80

Useful when you have one sequence, and you want to get the except/intersect from another
sequence based on a specific projection (like DistinctBy)

Originally sourced from here: 
http://akshayluther.com/2009/08/14/improving-linq-except-and-intersect/

Additional overloads added where the two sequences have different types added - since
the Func doesn't really care if they're the same or not and therefore you can just
check equality inside.

Sample usage:
var toRemove = existingResults.Except(newResults, (x, y) => x.EntityId == y.EntityId
&& x.EntityType == y.EntityType);

Code: 
 public static IEnumerable<TSource> Except<TSource>(this IEnumerable<TSource> first,
IEnumerable<TSource> second, Func<TSource, TSource, bool> comparer)
        {
            return first.Where(x => second.Count(y => comparer(x, y)) == 0);
        }

        public static IEnumerable<TSource> Intersect<TSource>(this IEnumerable<TSource>
first, IEnumerable<TSource> second, Func<TSource, TSource, bool> comparer)
        {
            return first.Where(x => second.Count(y => comparer(x, y)) == 1);
        }

        public static IEnumerable<TFirstSource> Except<TFirstSource, TSecondSource>(this
IEnumerable<TFirstSource> first, IEnumerable<TSecondSource> second, Func<TFirstSource,
TSecondSource, bool> comparer)
        {
            return first.Where(x => second.Count(y => comparer(x, y)) == 0);
        }

        public static IEnumerable<TFirstSource> Intersect<TFirstSource, TSecondSource>(this
IEnumerable<TFirstSource> first, IEnumerable<TSecondSource> second, Func<TFirstSource,
TSecondSource, bool> comparer)
        {
            return first.Where(x => second.Count(y => comparer(x, y)) == 1);
        }

Reported by agrath on 2013-07-09 22:34:29

JamesFaix commented 8 years ago

This could be improved with something like the HashSet implementation of ExceptBy, if HashSet didn't require an IEqualityComparer. Is there anything like Hashset that takes an Func<T, T, bool> as a comparer parameter?

http://stackoverflow.com/questions/881438/hashset-constructor-with-custom-iequalitycompare-defined-by-lambda

public static class HashSetDelegate
{
    public static HashSet<T> Create<T>(Func<T, T, bool> func)
    {
    return new HashSet<T>(new FuncIEqualityComparerAdapter<T>(func));
    }

    private class FuncIEqualityComparerAdapter<U> : IEqualityComparer<U>
    {
    private Func<U, U, bool> func;
    public FuncIEqualityComparerAdapter(Func<U, U, bool> func)
    {
        this.func = func;
    }

    public bool Equals(U a, U b)
    {
        return func(a, b);
    }

    public int GetHashCode(U obj)
    {
        return 0;
    }  

    }
}

public class HashSetTest
{
    public static void Main()
    {
    HashSet<string> s = HashSetDelegate.Create((string a, string b) => string.Compare(a, b, true) == 0);
    }
}
JamesFaix commented 7 years ago

Here is a pretty simple implementation I came up with for these methods. I've left out argument validation to keep code samples brief. I also ended each method name in "Delegate" to distinguish them from the similar methods in PR #169, but I do not think these are great final names.

public static IEnumerable<TSource1> IntersectByDelegate<TSource1, TSource2>(
    this IEnumerable<TSource1> first,
    IEnumerable<TSource2> second,
    Func<TSource1, TSource2, bool> comparer) {

    var keys = second.ToArray();
    return first.Where(f => keys.Any(s => comparer(f, s)));
}

public static IEnumerable<TSource1> ExceptByDelegate<TSource1, TSource2>(
    this IEnumerable<TSource1> first,
    IEnumerable<TSource2> second,
    Func<TSource1, TSource2, bool> comparer) {

    var keys = second.ToArray();
    return first.Where(f => keys.All(s => comparer(f, s)));
}

One issue I see is that, without adding more delegate parameters, there isn't any "distinctness" being enforced. This would require a more complicated implementation, which may also require another version of DistinctBy as part of it.

public static IEnumerable<TSource1> ExceptByDelegate<TSource1, TSource2>(
    this IEnumerable<TSource1> first,
    IEnumerable<TSource2> second,
    Func<TSource1, TSource2, bool> comparer,
    Func<TSource1, TSource1, bool> distinctnessComparer1,
    Func<TSource2, TSource2, bool> distinctnessComparer2) {

    var keys = second
        .DistinctBy(distinctnessComparer2)
        .ToArray();

    return first
        .Where(f => keys.All(s => comparer(f, s)))
        .DistinctBy(distinctnessComparer1);
}

public static IEnumerable<TSource> DistinctByDelegate(
    this IEnumerable<TSource> sequence,
    Func<TSource, TSource, bool> comparer) {

    var queue = new Queue<TSource>(sequence);
    while (queue.Any()) {
        var item = queue.Pop();
        if (queue.All(x => !comparer(item, x)))
            yield return item;
    }
}

For most use cases, the methods in PR #169 can accomplish what ExceptByDelegate and IntersectByDelegate methods are for, but there are a few benefits here.

So, here are a few opinion questions:

  1. Would ExceptByDelegate and IntersectByDelegate provide enough value to add to the API?
  2. Should there be a fast/non-distinct or a slow/distinct implementation, or both?
  3. If there is a non-distinct implementation, should the names be changed from Except and Intersect to something that sounds less like a set operation?
  4. Should DistinctByDelegate be added to the public API? Should that be a separate issue?