Unflatten function - Githubissues

leandromoh commented 6 years ago

Flatten operator takes a sequence containing arbitrarily-nested sequences and return a flatten one but sometimes we need to do the inverse operation, that is, given a flatten sequence, we need to create branches from the elements. So I propose the UnFlatten operator.

Signature:

static IEnumerable<object> UnFlatten<T, TResult>(this
    IEnumerable<T> source,
    Func<T, bool> rootSelector,
    Func<T, IEnumerable<T>> childrenSelector,
    Func<T, IEnumerable<object>, TResult> resultSelector)

Examples:

1) With Recursive Types

public class Category
{
    public string Name;
    public int Parent;
    public int Id;
    public IEnumerable<Category> SubCategories;
}

static void Print (IEnumerable<Category> source, int deep)
{
    foreach(var i in source )
    {
        Console.WriteLine(new string('-', deep * 2) + i.Name);

        Print(i.SubCategories, deep + 1);
    }
}

var categories = new[]
{
    new Category { Id = 1, Name = "Music", Parent = 0, },
    new Category { Id = 2, Name = "Pop", Parent = 1, },
    new Category { Id = 3, Name = "Rock", Parent = 1, },
    new Category { Id = 4, Name = "Animal", Parent = 0, },
    new Category { Id = 5, Name = "Feline", Parent = 4, },
    new Category { Id = 6, Name = "Bird", Parent = 4, },
    new Category { Id = 7, Name = "Eagle", Parent = 6, },
    new Category { Id = 8, Name = "Cat", Parent = 5, },
    new Category { Id = 9, Name = "Metal", Parent = 3, },
    new Category { Id = 10, Name = "Heavy Metal", Parent = 9, },
    new Category { Id = 11, Name = "Lion", Parent = 5, },
    new Category { Id = 12, Name = "Hard Rock", Parent = 3, },
    new Category { Id = 13, Name = "Thrash Metal", Parent = 9, },
    new Category { Id = -1, Name = "Broken", Parent = 99, },
};

var result = categories.UnFlatten(
node => node.Parent == 0,
node => categories.Where(y => y.Parent == node.Id), 
(parent, children) => {
    parent.SubCategories = children.Cast<Category>();
    return parent;
});

Print(result.Cast<Category>(), 0);

outputs:

Music
--Pop
--Rock
----Metal
------Heavy Metal
------Thrash Metal
----Hard Rock
Animal
--Feline
----Cat
----Lion
--Bird
----Eagle

2) With Non-Recursive Types

static void Main(string[] args)
{
    var fligths = new[]
    {
        new { Departure = "A", Arrival = "B", },
        new { Departure = "D", Arrival = "E", },
        new { Departure = "B", Arrival = "C", },
        new { Departure = "C", Arrival = "D", },
        new { Departure = "C", Arrival = "E", },
        new { Departure = "E", Arrival = "F", },
        new { Departure = "G", Arrival = "H", },
    };

    var result = fligths.UnFlatten(
        x => x.Departure == "A",
        x => fligths.Where(y => y.Departure == x.Arrival),
        (parent, children) => Tuple.Create(parent, children));

    Print(result, 0);
}

static void Print(IEnumerable<object> source, int deep)
{
    foreach (dynamic i in source)
    {
        Console.WriteLine(new string('-', deep * 2) + i.Item1.Departure);

        Print(i.Item2, deep + 1);
    }
}

outputs:

A
--B
----C
------D
--------E
----C
------E

Workaround

None.

Prototype

static IEnumerable<object> UnFlatten<T, TResult>(this
    IEnumerable<T> source,
    Func<T, bool> rootSelector,
    Func<T, IEnumerable<T>> childrenSelector,
    Func<T, IEnumerable<object>, TResult> resultSelector)
{
    foreach (var i in rootSelector == null 
                              ? source
                              : source.Where(rootSelector))
    {
        var children = childrenSelector(i).UnFlatten(null, childrenSelector, resultSelector);
        yield return resultSelector(i, children);
    }
}

I would like to submit a PR for this. What do you think?

atifaziz commented 6 years ago

TraverseDepthFirst can help with this to a large extent. For example, your recursive types example can be written like this:

var example1 =
    from c in categories
    where c.Parent == 0
    select c into root
    from e in
        MoreEnumerable.TraverseDepthFirst(
            (Category: root, Depth: 0),
            p => from c in categories
                 where c.Parent == p.Category.Id
                 select (Category: c, Depth: p.Depth + 1))
    select new string('-', e.Depth * 2) + e.Category.Name;

foreach (var e in example1)
    Console.WriteLine(e);

And the non-recursive one like so:

var example2 =
    from f in fligths
    where f.Departure == "A"
    select f into a
    from e in
        MoreEnumerable.TraverseDepthFirst(
            (Flight: a, Depth: 0),
            p => from f in fligths
                 where f.Departure == p.Flight.Arrival
                 select (Flight: f, Depth: p.Depth + 1))
    select new string('-', e.Depth * 2) + e.Flight.Departure;

foreach (var e in example2)
    Console.WriteLine(e);

So I'd argue that TraverseDepthFirst is a workaround, though I recognise that I'm cheating here somewhat by only looking at the printed output of each example (which is a flat list of lines again).

To go from a flat list back to hierarchy will require more thinking and work. The signature:

static IEnumerable<object> UnFlatten<T, TResult>(this
    IEnumerable<T> source,
    Func<T, bool> rootSelector,
    Func<T, IEnumerable<T>> childrenSelector,
    Func<T, IEnumerable<object>, TResult> resultSelector)

has several problems:

rootSelector is really a predicate so should be called rootPredicate but this is just a minor naming issue.
The childrenSelector predicate says that T should already know about its children but it doesn't. In your examples you re-traverse the source sequence but since it's not available to the function, you rely on lambda closures to provide it as context. The signature should really be Func<T, IEnumerable<T>, IEnumerable<T>>, although this requires a full scan on each call to find the children and so will be inefficient.
Why return a sequence of objects?
Likewise, why is the second argument (tree?) to resultSelector a sequence of objects?

atifaziz commented 6 years ago

I'd also like to add that this method isn't the opposite of Flatten even if it feels so. While Flatten traverses a nested sequence in depth-first order and lays them out flat, what I see being proposed here is dereferencing of a flat list linked via references into a tree. Correct me if I'm misreading.

leandromoh commented 6 years ago

I'd also like to add that this method isn't the opposite of Flatten even if it feels so. While Flatten traverses a nested sequence in depth-first order and lays them out flat, what I see being proposed here is dereferencing of a flat list linked via references into a tree. Correct me if I'm misreading.

Dereferencing of a flat list linked via references into a tree is just a specific usage of this operator. Here is an example that looks more with Unflatten.

var numbers = MoreEnumerable.Sequence(2, 5, 2)
                            .Select(x => (decimal) x);

IEnumerable result = 

numbers.UnFlatten(_ => true,
                    n =>
                        n.ToString().Length == 1 || n.ToString().Split(',')[1].Length < 3
                            ? new[] { n - (n / 2), n + (n / 2) }
                            : Enumerable.Empty<decimal>()
                    ,
                    (parent, children) => children.Prepend(parent));

PrintObject(result, 0);

static void PrintObject(IEnumerable source, int deep)
{
    foreach (object i in source)
    {
        if (i is IEnumerable)
            PrintObject((IEnumerable) i, deep + 1);
        else
            Console.WriteLine(new string('-', deep * 2) + i);
    }
}

outputs

--2
----1
------0,5
--------0,25
----------0,125
----------0,375
--------0,75
----------0,375
----------1,125
------1,5
--------0,75
----------0,375
----------1,125
--------2,25
----------1,125
----------3,375
----3
------1,5
--------0,75
----------0,375
----------1,125
--------2,25
----------1,125
----------3,375
------4,5
--------2,25
----------1,125
----------3,375
--------6,75
----------3,375
----------10,125
--4
----2
------1
--------0,5
----------0,25
------------0,125
------------0,375
----------0,75
------------0,375
------------1,125
--------1,5
----------0,75
------------0,375
------------1,125
----------2,25
------------1,125
------------3,375
------3
--------1,5
----------0,75
------------0,375
------------1,125
----------2,25
------------1,125
------------3,375
--------4,5
----------2,25
------------1,125
------------3,375
----------6,75
------------3,375
------------10,125
----6
------3
--------1,5
----------0,75
------------0,375
------------1,125
----------2,25
------------1,125
------------3,375
--------4,5
----------2,25
------------1,125
------------3,375
----------6,75
------------3,375
------------10,125
------9
--------4,5
----------2,25
------------1,125
------------3,375
----------6,75
------------3,375
------------10,125
--------13,5
----------6,75
------------3,375
------------10,125
----------20,25
------------10,125
------------30,375

leandromoh commented 6 years ago

To go from a flat list back to hierarchy will require more thinking and work. The signature has several problems:

rootSelector is really a predicate so should be called rootPredicate but this is just a minor naming issue.

The childrenSelector predicate says that T should already know about its children but it doesn't. In your examples you re-traverse the source sequence but since it's not available to the function, you rely on lambda closures to provide it as context. The signature should really be Func<T, IEnumerable, IEnumerable>, although this requires a full scan on each call to find the children and so will be inefficient.

Why return a sequence of objects?

Likewise, why is the second argument (tree?) to resultSelector a sequence of objects?

point 1: okay, make sense.

point 2: I see. There are cases where T does not know about its children (like you pointed) and cases where it does (like in my example with numbers, where we generate the children). Anyway, user can materialize source before call Unflatten to avoid many interations if one consider it expensive.

point 3: I tried

        static IEnumerable<TResult> UnFlatten<T, TResult>(this
    IEnumerable<T> source,
    Func<T, bool> rootSelector,
    Func<T, IEnumerable<T>> childrenSelector,
    Func<T, IEnumerable<TResult>, TResult> resultSelector)

but except when the recursivity is inside the type (like Category), TResult will become recursive. In my example with non-recursive types, if I had used a class named Flight instead of an anonymous, the type signature would be: IEnumerable<Tuple<Flight, IEnumerable<Tuple<Flight, IEnumerable<Tuple<Flight, ...>>>>>> and therefore un-writeble. As well Flatten can handle with arbitrarily depth of nested sequences, Unflatten also does. Also, compiler can not infer TResult type.

point 4: this parameter has the same type of the return type of Unflatten, since the operator is recursive.

leandromoh commented 6 years ago

Can I Go ahead with PR considering point 1and 2?

atifaziz commented 6 years ago

I am afraid we have not fleshed this out enough. I still have to come back to you with some comments on your last reply.

Denis535 commented 3 years ago

Here is my Unflatten for 2-level hierarchy.

    public static IEnumerable<(T? Key, T[] Children)> Unflatten<T>(this IEnumerable<T> source, Func<T, bool> predicate) {
        var key = default( T );
        var hasKey = false;
        var children = new List<T>();
        foreach (var item in source) {
            if (!predicate( item )) {
                children.Add( item );
            } else {
                if (hasKey || children.Any()) yield return (key, children.ToArray());
                key = item;
                hasKey = true;
                children.Clear();
            }
        }
        if (hasKey || children.Any()) yield return (key, children.ToArray());
    }

morelinq / MoreLINQ

Unflatten function #477

Signature:

Examples:

1) With Recursive Types

2) With Non-Recursive Types

Workaround

Prototype