patterns-group / code-patterns

Code Patterns: lots of reusable goodies for your .NET project
tribesoftware.org/code-patterns/
BSD 2-Clause "Simplified" License
9 stars 9 forks source link

new CollectionExtensions items: EmptyIfNull, Squash #152

Closed jbatte47 closed 10 years ago

jbatte47 commented 10 years ago

Example usage:

Func<IEnumerable<string>, IEnumerable<int>> extractor = set => set
  .EmptyIfNull().Squash().Select(item => item.Length);

string[] set1 = new[]{ "thing0", null, "thing25", null };
string[] set2 = null;
IEnumerable<int> lengths1 = extractor(set1); // 6,7
IEnumerable<int> lengths2 = extractor(set2); // empty
jlmorgan commented 10 years ago

If by empty, you mean empty set []?

jbatte47 commented 10 years ago

Indeed.

jbatte47 commented 10 years ago

Okay, so Compact is a simple Where clause right now. I never noticed before, but I added some debug print statements to my test so you can visually see that one set of items contains nulls and the other does not. Just at a glance I can see that Where(item => item != null) is changing the order of the output. All the items are accounted for, but they are showing up in a different order. I could add an optional attribute like bool preserveOrder = false to the method, use the faster Where clause if it's left alone, and use a dictionary of indexes from the original set to re-establish the proper order. Problems with this approach are:

  1. It's slower. I'm creating and managing one collection in order to operate on another.
  2. It could still fail, because the dictionary lookup for each item's original index will be dependent on the item type's equatable nature, which might not exist at all, leaving item comparison to fall back to Object.Equals. Blech.

@jlmorgan, @chrislwade: thoughts?

jlmorgan commented 10 years ago

For the sake of the test, you can sort A and B then compare?

jbatte47 commented 10 years ago

Sure, but that's not even in the scope of the test currently, since the concept of sort order isn't being addressed. The check goes: given a set with some nulls, once compacted, the result should have a count that checks out (it should be short by the number of null items) and none of the remaining items are null.

If we start to consider sort order, we would only want to change the sort order before comparison if the operation chose not to preserve sorting. If they chose to preserve it, the comparison would need to leave sort order alone before and after.

The question is, do we care right now that there is a side effect, since it's a small one? My tendency is to create a new ticket to address it at some point, but not worry too much.

jlmorgan commented 10 years ago

Personally, I think the ordering would be a caveat in the method documentation. They could just as easily call Sort after Compact.

chrislwade commented 10 years ago

Interestingly enough Ruby's compact method is implemented as a while loop (artifact of array design in Ruby). I would have thought it would be implemented using inject (C#'s Aggregate) so it could be lazily evaluated. The current implementation preserves order but is slower because it forces serial evaluation of the set.

jbatte47 commented 10 years ago

Yeah I think @jlmorgan's right; since this is actually a side effect of using the Where clause, I'd say that as long as the effect is advertised it doesn't matter. They can Sort the set afterward.

chrislwade commented 10 years ago

Agreed, using Where is the way to go here.