Open atifaziz opened 4 years ago
Based on work so far in PR #753, I feel it's best to rename the extension to SpillHead. The simplest overload will spill the first element of the sequence to the second and beyond, so in the case of one element representing the head, span seems confusing. However, head works well whether you have a single element or several making up the “head” of the sequence.
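To make the "simplest overload" concrete, here is a minimal sketch of what it could look like. The signature, the class name SpillHeadSketch and the choice to yield nothing for an empty source are assumptions made for illustration; they are not taken from PR #753.

```csharp
using System;
using System.Collections.Generic;

public static class SpillHeadSketch
{
    // Hypothetical simplest overload: the first element is the head and is
    // paired with ("spilled to") every element that follows it.
    public static IEnumerable<R> SpillHead<T, R>(
        this IEnumerable<T> source,
        Func<T, T, R> resultSelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
        return Iterator();

        IEnumerable<R> Iterator()
        {
            using (var e = source.GetEnumerator())
            {
                if (!e.MoveNext())
                    yield break;        // assumption: an empty source yields nothing
                var head = e.Current;   // only the head is retained
                while (e.MoveNext())    // the remainder is streamed lazily
                    yield return resultSelector(head, e.Current);
            }
        }
    }
}
```

Under these assumptions, calling SpillHead on "id,name", "1,foo", "2,bar" with (h, r) => h + " | " + r would yield "id,name | 1,foo" and "id,name | 2,bar".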
Is it essentially this, but working in constant space?
public static IEnumerable<R>
    SpillSpan<T, P, R>(
        this IEnumerable<T> source,
        Func<T, bool> predicate,
        Func<T, P> prefixMapping,
        Func<P, T, R> resultSelector)
{
    var (prefix, rest) = source.Span(predicate);
    var mappedPrefix = prefixMapping(prefix);
    return rest.Select(e => resultSelector(mappedPrefix, e));
}
// applied to a predicate `predicate` and an enumerable `source`, returns a tuple where first element is longest prefix (possibly empty) of `source` of elements that satisfy `predicate` and second element is the remainder of the enumerable
// >>> span (< 3) [1,2,3,4,1,2,3,4]
// ([1,2],[3,4,1,2,3,4])
public static (IReadOnlyCollection<T> Prefix, IReadOnlyCollection<T> Rest) Span<T>(this IEnumerable<T> source, Func<T, bool> predicate);
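For reference, the Span declared above could be implemented along the following lines. This is only a sketch: the class name SpanSketch and the use of List<T> as backing storage are assumptions. Because both halves are materialized into lists, memory use grows with the length of the source.

```csharp
using System;
using System.Collections.Generic;

public static class SpanSketch
{
    // Buffering sketch: splits source into the longest prefix whose elements
    // satisfy predicate, and everything after that prefix.
    public static (IReadOnlyCollection<T> Prefix, IReadOnlyCollection<T> Rest)
        Span<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        var prefix = new List<T>();
        var rest = new List<T>();
        var inPrefix = true;

        foreach (var item in source)
        {
            if (inPrefix && predicate(item))
            {
                prefix.Add(item);
            }
            else
            {
                inPrefix = false;   // once the prefix ends, it never resumes
                rest.Add(item);
            }
        }

        return (prefix, rest);
    }
}
```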
The A empty + Func<M, A> seeder + Func<A, M, A> accumulator + Func<A, H> headerSelector logic can be made external and encapsulated into Func<T, P> prefixMapping. chooser handles both predication and mapping, so the mapping part can be moved into prefixMapping too.
Not quite like span/Partition, because here the proposal is to consume in a streaming fashion. So while the head/header is collected and projected to help process the remainder, the remainder of the sequence (assuming they are the data rows) is streamed. If you have billions of rows, this operator will lazily consume only what's needed. It can be combined with other streaming operators to avoid committing the entire source to memory.
I propose to add an extension that takes one or more elements at the head of a sequence and spills a projection to remaining elements of the sequence.
Sometimes you have a sequence where the initial element(s) contain information about the processing of the rest of the sequence. A typical example is a table (think CSV) composed of rows, where the first row is a header and the remaining rows are data. Processing such a sequence should only generate a projection of the data rows.
The signature would be as follows:
The chooser identifies header elements. Data rows commence as soon as it returns (false, _) for a T. The header elements are accumulated via accumulator, and the seeder is used to seed the accumulation with the initial header element. The empty value is used for the headless case. The headerSelector function is used to create a single projection out of the accumulated header elements, which is subsequently paired with or spilled to the remaining elements.
I propose to add overloads for simpler cases.
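The behaviour described above can be sketched as a streaming iterator. The parameter list below is inferred from the names used in this thread (chooser, empty, seeder, accumulator, headerSelector, plus a resultSelector); the exact signature, the generic parameter names and the class name SpillSpanSketch are assumptions, not the author's code. The sketch also assumes that a source with headers but no data rows yields nothing.

```csharp
using System;
using System.Collections.Generic;

public static class SpillSpanSketch
{
    public static IEnumerable<R> SpillSpan<T, M, A, H, R>(
        this IEnumerable<T> source,
        Func<T, (bool, M)> chooser,      // (true, m) marks a header element
        A empty,                         // accumulator value for the headless case
        Func<M, A> seeder,               // seeds accumulation with the first header
        Func<A, M, A> accumulator,       // folds in each further header
        Func<A, H> headerSelector,       // projects the accumulated headers once
        Func<H, T, R> resultSelector)    // pairs the header with each data row
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterator();

        IEnumerable<R> Iterator()
        {
            using (var e = source.GetEnumerator())
            {
                var acc = empty;
                var seeded = false;
                var atDataRow = false;

                // Phase 1: consume header elements only.
                while (e.MoveNext())
                {
                    var (isHeader, m) = chooser(e.Current);
                    if (!isHeader) { atDataRow = true; break; }
                    acc = seeded ? accumulator(acc, m) : seeder(m);
                    seeded = true;
                }

                if (!atDataRow)
                    yield break;         // assumption: no data rows, no output

                // Phase 2: stream the data rows, one at a time.
                var header = headerSelector(acc);
                do
                {
                    yield return resultSelector(header, e.Current);
                }
                while (e.MoveNext());
            }
        }
    }
}
```

Only the accumulated header is kept in memory; data rows are never buffered, so the sketch composes with other lazy operators. In the headless case, headerSelector receives empty unchanged.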
SpillSpan should never throw an exception. If the user wants to ban the headless case, he/she can throw in headerSelector upon receiving the empty value.
Example
Output: