Open GoogleCodeExporter opened 9 years ago
"overrides" in my original note should be "override". Didn't mean to imply
existing
methods should be replaced, but rather just a new one should be added.
Original comment by CAmmerman@gmail.com
on 16 Feb 2009 at 3:54
I can't see how you get away without the overload taking a strategy - you still
need
to differentiate between Pad, Truncate and Fail. How would you make the above
signature handle truncation and failure?
Original comment by jonathan.skeet
on 16 Feb 2009 at 7:37
My thought was that adding the pad argument to the existing methods would
overcomplicate the signature because it's only necessary for one of the three
strategies. The signature I specified above would be a separate overload from
the
ones already there.
Original comment by CAmmerman@gmail.com
on 16 Feb 2009 at 1:14
When I originally submitted the Zip implementation, I intentionally left a
comment
that referenced a similar discussion regarding zip in Python 3000 as food for
thought. Since the comment has been lost in translation, :) I'll re-iterate it
here:
[Python-3000] have zip() raise exception for sequences of different lengths
http://mail.python.org/pipermail/python-3000/2006-August/003338.html
The discussion is very interesting and worth reading up on before settling here.
The current use of an enumeration (ImbalancedZipStartegy) to express the
required
behavior seems awkward because it's not something you want to be able to vary
externally, which is usually why you would make a parameter or argument. Plus,
it's a
little cumbersome to write and read back in code and not very LINQ-ish in
spirit. The
enumeration could certainly be used internally for sharing the implementation
logic
but the public API should use distinct and clear operator names that imply the
strategy. For example:
- EquiZip:
Sequences must be equal in length otherwise throw error
- Zip:
Truncates to shorter of the two sequences
- ZipLongest:
Uses longer of the two sequences while padding shorter of the two
Original comment by azizatif
on 16 Feb 2009 at 5:19
Original comment by azizatif
on 16 Feb 2009 at 11:05
I agree with Aziz that using the strategy enum seems like it might be more
cumbersome
with little added value for the API consumer... This doesn't mean we wouldn't
want to
expose it or use it internally, just that simpler methods might be warranted as
well.
I think we could conceivably condense all behavior into two methods both named
Zip...
One method would either truncate or throw, if the sequences aren't of the same
length:
void Zip(this IEnumerable<T1> src, IEnumerable<T2> other, Boolean throwIfUneven)
The other would take generators for each sequence in case one should be
shorter, and
if the generator is null, use the "default" for that type:
void Zip(this IEnumerable<T1> src, IEnumerable<T2> other, Func<T1> srcPadGen,
Func<T2> otherPadGen)
Original comment by CAmmerman@gmail.com
on 17 Feb 2009 at 12:35
I removed the comment about Python on the grounds that I'd implemented the
optional
functionality, and referring to Python probably wouldn't actually help our
users :)
Personally I still think the enumeration is the simplest way to go - using a
boolean
for throw or not feels like the kind of thing that reduces readability: when
looking
at the code in 6 months time, you'd be pretty much *required* to check what
that
parameter was.
I can see the benefit in the padding generators in that it allows a non-default
padding, but that's all - and I suspect you'd rarely want to *actually* express
both,
because you'd probably know which is going to be longer.
Don't forget that in the current setup the client doesn't *have* to specify the
enum
at all - it defaults to Truncate.
Original comment by jonathan.skeet
on 17 Feb 2009 at 6:27
Fair point on readability for the boolean argument.
It's anecdotal, but in my personal use I far more often want to pad the short
side
than truncate the long side. I use the generators all the time. I have no
qualms
about making truncate the default behavior, but I think the generator versions
would
also get a good amount of use.
Especially when working through the IEnumerable interface rather than directly
with
List, Array, etc., it's not at all uncommon to not know which side is longer
before
zipping. In fact, needing to know ahead of time I think sort of defeats the
purpose
of the deferred execution.
If we want to support padding generators, we'd pretty much have to require both
to be
provided (at least filled in with a null to indicate default), because the way
generics works would prevent us from specifying one overload for each side.
(They'd
conflict with each other when the element types are the same.)
Original comment by CAmmerman@gmail.com
on 17 Feb 2009 at 2:34
[deleted comment]
> referring to Python probably wouldn't actually help our users :)
I was more thinking of MoreLINQ developers and maintainers until we polish
(docs) for
1.0. :) Meanwhile, is the MoreLINQ audience Mort or Elvis? ;)
> Personally I still think the enumeration is the simplest way to go
We risk going on forever with personal preferences and subjectivity. The right
solution may be simple, but we may not be there yet. :)
The argument is no longer whether we should provide the functionality or not,
but
rather how to expose it and using what signature(s). There's already agreement
that a
Boolean definitely reduces readability. The next thing that should be avoided
is an
explosion of overloads to do and mean everything. As for the
ImbalancedZipStrategy,
I've pointed out before that making the strategy an argument of the operator
via an
enumeration (or even polymorphic type) would be interesting if the problem that
needed solving was that the strategy is a variable of the run-time. I think we
can
also agree that's not the case here. If that's not convincing by itself, then
there
is a very subtle yet major idea behind extension methods and LINQ that the
enumeration works against. If the enumerable source type wants to provide an
optimization for an operator, it can do so by providing a method by the same
name and
signature as the operator in question. If we use an enumeration that comes from
MoreLINQ then the other type has to take in a dependency that won't be looked
upon
lightly. If we take the approach of embodying the strategy in the name
(assuming this
is a compile-time decision), then we have simple signatures with Zip, EquiZip
and
ZipLongest being distinct and clear names. Now, if I have a type called
SuperDuperList<T> that wants to provide an optimization for Zip, EquiZip and
ZipLongest (or any one of the three) then I can do so using simply base and
generic
types. Right now, with the enumeration, I have no choice but to support all
strategies and take a hit on MoreLINQ! Think about it. This is how
Enumerable.Contains works. When using Contains on variant typed as List<T> or
IList<T>, the LINQ extension method is not used! If one wants to force use of
the
LINQ's Contains implementation then one has to hop over AsEnumerable first. So
to
stay objective, we need to see which approach doesn't take us to far away from
how
LINQ operators should be designed, taking built-in ones as guidance.
Original comment by azizatif
on 17 Feb 2009 at 3:39
I don't want to resurrect any painful discussions here, but unless you envisage
more
than the three strategies already specified, I prefer the idea of the strategy
being
inferred by the method name.
- Zip
- PaddedZip
I don't think the truncated version is required because if you lazily evaluate
the
results of Zip and PaddedZip you can just do PaddedZip.Take or Zip.Take to get
the
shorter of the two.
Original comment by jeffma...@gmail.com
on 10 Mar 2009 at 10:05
@jeffmagic: You've got my vote on strategy being part of the method name. ;)
Original comment by azizatif
on 10 Mar 2009 at 11:05
I've taken sub-issue about distinct method names instead of a strategy
enumeration
and created issue #24 out of it.
Original comment by azizatif
on 7 Apr 2009 at 7:38
Great to see some progress on MoreLinq again. I have implemented a patch for
this and pushed it to my Bitbucket fork (it seems I do no longer have commit
access to this repo and I don't know how GoogleCode supports pull requests).
Code is here
https://bitbucket.org/johannesrudolph/morelinq/commits/b607e78b404bd86ff83a7f9ae
85c511397b06a7a
Please pull it in the trunk if it's acceptable.
Original comment by jojo.rudolph@googlemail.com
on 30 Jun 2013 at 9:37
@jojo.rudolph: While I can imagine rare cases (though I have yet to come across
it) where the last value could be handy, it's better to keep the simpler
overload where the default is given instead of the one accepting providers.
When case for providers builds stronger, it can always be added back in a
future version.
Defaults can be implemented as shown below but it could become verbose so
providing defaults can be a bit more palatable, especially in the midst of long
query.
var ns = MoreEnumerable.GenerateByIndex(n => (double?) n * 0.5);
var xy =
ns.Take(5).ZipLongest(ns.Take(10),
(x, y) => new
{ X = x ?? double.NaN,
Y = y ?? double.NaN });
foreach (var e in xy)
Console.WriteLine(e.ToString());
// prints:
// { X = 0, Y = 0 }
// { X = 0.5, Y = 0.5 }
// { X = 1, Y = 1 }
// { X = 1.5, Y = 1.5 }
// { X = 2, Y = 2 }
// { X = NaN, Y = 2.5 }
// { X = NaN, Y = 3 }
// { X = NaN, Y = 3.5 }
// { X = NaN, Y = 4 }
// { X = NaN, Y = 4.5 }
Original comment by azizatif
on 1 Jul 2013 at 9:53
Original issue reported on code.google.com by
CAmmerman@gmail.com
on 16 Feb 2009 at 3:05