Open atifaziz opened 6 years ago
I have some questions:
Count
is not a multiple of columns
. Most MoreLINQ operators are stricter.What would be some use cases for this?
Some environments, like Excel, use 2D arrays for data transfer. While multi-dimensional arrays in .NET have a built-in implementation for iteration as a sequence, there is no method to go the other way.
Why is this operator permissive instead of strict? Most MoreLINQ operators are stricter.
We can debate the pros and cons but as you said, it's most, not all. I've also had second thoughts about operators where we decided to be strict but where the argument validation is done later, during iteration. In hindsight, the behaviour should have been left undefined. We have two options. Make it strict or add an overload where a function that, given a row and a column, provides a filler. If someone wants strict semantics, they can throw from that function. Thoughts?
What would be some use cases for this?
Some environments, like Excel, use 2D arrays for data transfer. While multi-dimensional arrays in .NET have a built-in implementation for iteration as a sequence, there is no method to go the other way.
OK, but why would I be operating on a sequence that is really a matrix? Or put it another way, how do I end up with a matrix represented as a single sequence?
We have two options. Make it strict or add an overload where a function that, given a row and column, provides a filler. If someone wants strict semantics, they can throw from that function. Thoughts?
I think the overload with the filler is a good compromise. I would still leave the default to be a throwing version, though.
I've also had second thoughts about operators where we decided to be strict but where the argument exception is thrown later, during iteration.
This operator is not delayed so this problem does not appear here.
Further thought makes me think this is mixing two separate problems: break a sequence into parts, and then transforming that into a 2D array. We already have operators for the latterfirst, so I guess we could have only the second part:
public static T[,] To2DArray(IEnumerable<IEnumerable<T>> rows, int columns);
This can be paired with Batch
to achieve the same as the originally proposed method.
Thoughts?
OK, but why would I be operating on a sequence that is really a matrix? Or put it another way, how do I end up with a matrix represented as a single sequence?
Cheating… 😊
var data =
from matrix in new[]
{
new object[,]
{
{ new DateTime(2018, 1, 5), 87.66, 88.41, 87.43, 88.19, 87.40, 23_407_100 },
{ new DateTime(2018, 1, 4), 86.59, 87.66, 86.57, 87.11, 86.33, 21_912_000 },
{ new DateTime(2018, 1, 3), 86.06, 86.51, 85.97, 86.35, 85.58, 26_061_400 },
{ new DateTime(2018, 1, 2), 86.13, 86.31, 85.50, 85.95, 85.18, 22_483_800 },
}
}
from row in ((IEnumerable) matrix).Cast<object>().Batch(7)
select row.Fold((date, open, hi, lo, close, _, vol) => new
{
Date = (DateTime) date,
Open = (double) open,
High = (double) hi,
Low = (double) lo,
Close = (double) close,
Volume = (int) vol,
});
This operator is not delayed so this problem does not appear here.
True except I am not sure that eager or lazy evaluation really makes a difference. Argument checking should take place before the main operation of the method. Anything during the operation really qualifies as undefined behaviour or should surface as a custom exception if it's important to distinctly communicate the nature of the error.
Further thought makes me think this is mixing two separate problems: break a sequence into parts, and then transforming that into a 2D array…Thoughts?
That's an interesting way to look at it. So you mean, like this?
public static T[,] ToArray2D<T>(this IEnumerable<IEnumerable<T>> source, int columns)
{
if (columns == 0)
return new T[0, 0];
var rows = source.ToArray();
var result = new T[rows.Length, columns];
var y = 0;
foreach (var row in rows)
{
var x = 0;
foreach (var cell in row)
result[y, x++] = cell;
y++;
}
return result;
}
It does introduce a couple of corner cases one needs to think about when the batch size doesn't match the column count. When batch size is smaller, as in Enumerable.Range(1, 10).Batch(2).ToArray2D(3)
, you get back:
new int[,]
{
{ 1, 2, 0 },
{ 3, 4, 0 },
{ 5, 6, 0 },
{ 7, 8, 0 },
{ 9, 10, 0 },
}
If it's bigger, the implementation fails with an IndexOutOfRangeException
. I guess this is correct if you want to be strict but then the case of short rows should also throw.
I propose to add an extension that allows creation of a two-dimensional array given a sequence and a count of columns.
Prototype
Examples
Example 1 where
columns
= 0will be equivalent to
new int[0, 0]
.Example 2 where
columns
= 2will be equivalent to:
Example 3 where
columns
= 3will be equivalent to: