morelinq / MoreLINQ

Extensions to LINQ to Objects
https://morelinq.github.io/
Apache License 2.0

Force enumerable #191

Open dzmitry-lahoda opened 7 years ago

dzmitry-lahoda commented 7 years ago

I once created a ThreadStatic context that is used while building an enumerable, and applied it to entities so that their collections are produced lazily. That context was lost when the enumerable was evaluated on a thread started lower in the stack. I could change the layers beneath, which start new threads to do their work, but that is too much work. It is easier to force evaluation. If the sequence is already a list or an array, no evaluation is needed. I cannot use an empty ForEach, because invoking a delegate per item is too costly. I could use Count(), but then I need to make sure the compiler does not optimize the call away. So I am expecting a method like Force() that returns an IEnumerable which is guaranteed to have been enumerated already.

See related http://stackoverflow.com/questions/314100/something-better-than-toarray-to-force-enumeration-of-linq-output

dzmitry-lahoda commented 7 years ago

Something like the following (UPDATE: this is bad and will not work as expected):

        /// <summary>
        /// Forces the specified enumeration to be evaluated, if it is not yet.
        /// </summary>
        /// <param name="self">The self.</param>
        /// <returns>Same enumeration.</returns>
        [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining | System.Runtime.CompilerServices.MethodImplOptions.NoOptimization)]
        public static IEnumerable<T> Force<T>(this IEnumerable<T> self)
        {
            var count = self.Count();
            return self;
        }

dzmitry-lahoda commented 7 years ago

It seems that taking the count and then enumerating again evaluates the delegates a second time. So here is the next version (UPDATE: it runs in production and supports several use cases):

        /// <summary>
        /// Forces the specified enumeration to be evaluated, if it is not yet. On second pass no evaluation will happen.
        /// </summary>
        /// <typeparam name="T">The item type.</typeparam>
        /// <param name="self">The self.</param>
        /// <returns>Same enumeration if countable or new array.</returns>
        [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining | System.Runtime.CompilerServices.MethodImplOptions.NoOptimization)]
        public static IEnumerable<T> Force<T>(this IEnumerable<T> self)
        {
            if (self is ICollection<T> || self is IReadOnlyCollection<T> || self == null)
            {
                return self;
            }

            return self.ToArray();
        }

JamesFaix commented 7 years ago

I believe this is the same idea as the existing Consume operator.

/// <summary>
/// Completely consumes the given sequence. This method uses immediate execution,
/// and doesn't store any data during execution.
/// </summary>
/// <typeparam name="T">Element type of the sequence</typeparam>
/// <param name="source">Source to consume</param>

public static void Consume<T>(this IEnumerable<T> source)
{
    if (source == null) throw new ArgumentNullException("source");
    foreach (var element in source)
    {
    }
}

dzmitry-lahoda commented 7 years ago

Consume is very different:

We use Force in our code instead of calling ToList or ToArray directly. Force has many pluses and only a few minuses.
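
To make the difference concrete, here is a minimal sketch built only from the two methods quoted in this thread. The class name `ForceVersusConsume`, the `Numbers` source and the `CountSelectorCalls` helper are hypothetical and exist only for illustration. It counts how often a lazy selector runs when the query is consumed and then enumerated, versus forced and then enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ForceVersusConsume
{
    // Consume, as quoted above: enumerates once and stores nothing.
    public static void Consume<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        foreach (var element in source) { }
    }

    // Force, as proposed above: materialises once unless already a collection.
    public static IEnumerable<T> Force<T>(this IEnumerable<T> self)
    {
        if (self is ICollection<T> || self is IReadOnlyCollection<T> || self == null)
            return self;
        return self.ToArray();
    }

    // Hypothetical deferred source: nothing runs until it is enumerated.
    private static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Hypothetical helper: counts selector calls when a lazy query is
    // passed through `prepare` and the result is then enumerated once.
    public static int CountSelectorCalls(Func<IEnumerable<int>, IEnumerable<int>> prepare)
    {
        var calls = 0;
        var query = Numbers().Select(x => { calls++; return x * 2; });
        foreach (var item in prepare(query)) { }
        return calls;
    }
}
```

With this sketch, `ForceVersusConsume.CountSelectorCalls(q => { q.Consume(); return q; })` should report 6 selector calls, because the lazy query is evaluated again after being consumed, while `ForceVersusConsume.CountSelectorCalls(q => q.Force())` should report 3, because the second pass reads the array.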

JamesFaix commented 7 years ago

Ahh, I see it now. Yes, Force is a good idea for an operator.

dzmitry-lahoda commented 7 years ago

Lazy.Force<'T> Extension Method (F#)

Forces the execution of this value and returns its result. Same as System.Lazy.Value.
Mutual exclusion is used to prevent other threads from also computing the value.

dzmitry-lahoda commented 7 years ago

Force() differs in function and behavior from Memoize (#100). Memoize is lazy but caches, with some performance overhead while it runs. Force is eager: it evaluates everything in the current context, which may be a performance overhead if the sequence was not evaluated before. After Force, enumeration runs mostly free of custom code, while Memoize may behave differently if passed to two threads without proper synchronization (until the cache gets mutual exclusion, as in Lazy). Force is a good debugging tool for the Immediate window and scripts, while Memoize is not.

atifaziz commented 7 years ago

@asd-and-Rizzo So to put it simply, Force is an eager version of Memoize or whole pass-through materialisation of the source. It will not only cause side-effects, if present, but more importantly also avoid further re-evaluation if the source is already known to be a materialised type. Correct? Bear in mind, however, a collection (read-only or not) does not imply a materialised source. It means that the count is eagerly available and you can append but it may still be lazily iterated (think remote).
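
As an illustration of that last caveat, the following sketch assumes a hypothetical `LazyCollection<T>` type (not part of MoreLINQ or the BCL) that knows its count up front but produces its items on every enumeration. The interface-based `Force` from the earlier comment returns it untouched, so the producer still runs on each pass.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection: Count is known eagerly, items are produced lazily.
public sealed class LazyCollection<T> : IReadOnlyCollection<T>
{
    private readonly Func<int, T> _produce;

    public LazyCollection(int count, Func<int, T> produce)
    {
        Count = count;
        _produce = produce;
    }

    public int Count { get; }

    // Every enumeration calls the producer again for each index.
    public IEnumerator<T> GetEnumerator()
    {
        for (var i = 0; i < Count; i++) yield return _produce(i);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class LazyCollectionDemo
{
    // The interface-based check proposed earlier in this thread.
    public static IEnumerable<T> Force<T>(this IEnumerable<T> self)
    {
        if (self is ICollection<T> || self is IReadOnlyCollection<T> || self == null)
            return self;
        return self.ToArray();
    }

    // Hypothetical helper: forces the lazy collection, enumerates it twice
    // and returns how many times the producer was invoked.
    public static int CountProducerCalls()
    {
        var calls = 0;
        var lazy = new LazyCollection<int>(3, i => { calls++; return i; });
        var forced = lazy.Force(); // returned as is: it implements IReadOnlyCollection<int>
        foreach (var item in forced) { }
        foreach (var item in forced) { }
        return calls;
    }
}
```

Under these assumptions `LazyCollectionDemo.CountProducerCalls()` should return 6 rather than 3, which is the re-evaluation that Force is meant to prevent.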

dzmitry-lahoda commented 7 years ago

Yep, correct. Having a count while still being lazy is a problem. I need to think about it. I believe that if the collection is not read-only, we may pass it through as is, and that will work with most APIs. If the collection is read-only, we need to call ToArray on it. So we need to check many popular APIs of frameworks and libraries to evaluate how universal Force would be and how many unneeded conversions it would save. As of now, it saves coders from switching to procedural code after ToList.

leandromoh commented 6 years ago

How about this? It resolves the scenario where the collection is not materialised.

        static IEnumerable<T> Force<T>(this IEnumerable<T> source)
        {
            switch (source)
            {
                case T[] _:
                case List<T> _:
                case Stack<T> _:
                case Queue<T> _:
                case HashSet<T> _:
                case null:
                    return source;
                default:
                    return source.ToArray();
            }
        }

dzmitry-lahoda commented 6 years ago

Would it work with Immutable and other (maybe custom) collections? I doubt it is possible to cover all of these, as too many DLLs would be needed from NuGet on .NET Core. It is probably possible to overload on (this Xyz source), but that only helps when such a source is passed directly. And LINQ-like extensions should be somewhat generic. So maybe it will work, maybe not. Both my method and the proposal are fine for me (for all my usages).

leandromoh commented 6 years ago

Would it work with Immutable and other (maybe custom) collections?

Yes, the function will return a materialised sequence and will not evaluate it again. The code does not need to be optimized for all scenarios, but it surely must work in all scenarios.

I think we should optimize for the most common scenario, that is, the collections in the .NET Framework. If someone passes a custom (materialised) collection, it will apply ToArray() once, and even this may be "cheap", since the items are already in memory.

If we are really concerned about optimizing for XYZ, we can create another overload where the user passes a predicate with additional validations.
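
A rough sketch of what such an overload could look like, assuming a hypothetical parameter named `isMaterialised` and a hypothetical class name `ForceWithPredicateSketch` (neither exists in MoreLINQ). The built-in checks are shortened to arrays and lists to keep the example small.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ForceWithPredicateSketch
{
    // Hypothetical overload: the caller supplies an extra check that
    // recognises additional types known to be materialised.
    public static IEnumerable<T> Force<T>(
        this IEnumerable<T> source,
        Func<IEnumerable<T>, bool> isMaterialised)
    {
        if (isMaterialised == null) throw new ArgumentNullException(nameof(isMaterialised));

        switch (source)
        {
            case null:
            case T[] _:
            case List<T> _:
                // Known materialised types (or null) pass through untouched.
                return source;
            default:
                // Otherwise trust the caller's predicate, else copy once.
                return isMaterialised(source) ? source : source.ToArray();
        }
    }
}
```

For example, `linked.Force(s => s is LinkedList<int>)` would hand back the same `LinkedList<int>` instance, whereas a predicate that returns false would fall back to `ToArray()`.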

dzmitry-lahoda commented 6 years ago

In our project (a different one from where I first found Force useful; the first was a server, the current one is a desktop application), people tend to leave several ToArray and ToList[1] calls in the code of a single method. These are eager, i.e. we reallocate arrays several times per method. Force, because it checks for arrays and lists, avoids that issue.


            public TResult[] ToArray()
            {
                var builder = new LargeArrayBuilder<TResult>(initialize: true);

                foreach (TSource item in _source)
                {
                    builder.Add(_selector(item));
                }

                return builder.ToArray();
            }

            public List<TResult> ToList()
            {
                var list = new List<TResult>();

                foreach (TSource item in _source)
                {
                    list.Add(_selector(item));
                }

                return list;
            }

[1] https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,317
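
A small sketch of the reallocation point, using the type-switch prototype from earlier in the thread. The class name `ReallocationDemo` and the `Compare` helper are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReallocationDemo
{
    // The type-switch prototype proposed earlier in this thread.
    public static IEnumerable<T> Force<T>(this IEnumerable<T> source)
    {
        switch (source)
        {
            case T[] _:
            case List<T> _:
            case Stack<T> _:
            case Queue<T> _:
            case HashSet<T> _:
            case null:
                return source;
            default:
                return source.ToArray();
        }
    }

    // Hypothetical helper: reports whether ToArray and Force hand back
    // the original array instance.
    public static (bool SameAfterToArray, bool SameAfterForce) Compare()
    {
        int[] data = { 1, 2, 3 };
        var copy = data.ToArray();  // copies, although data is already an array
        var forced = data.Force();  // recognised as an array, returned as is
        return (ReferenceEquals(data, copy), ReferenceEquals(data, forced));
    }
}
```

Here `ReallocationDemo.Compare()` should yield `(false, true)`: the repeated `ToArray()` allocates a fresh array, while `Force()` returns the existing one.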

dzmitry-lahoda commented 5 years ago

In some modern scenarios, ToSpan may (sometimes) serve a similar purpose to Force, but not always.

leandromoh commented 5 years ago

@atifaziz, I would like to submit a PR for this one (in the vein of my proposed prototype). Can I go ahead?

atifaziz commented 5 years ago

@leandromoh Unfortunately, I have no time to spare for this right now. There are too many PRs open already (and no help) and I am trying to wrap up the next release. This is not a complicated method and I believe people can get by meanwhile with having the implementation embedded in their projects.

dzmitry-lahoda commented 3 years ago

@atifaziz maybe give PR rights to some contributors other than you?