reactivemarbles / DynamicDataVNext


Reworked ChangeSet factory methods, after more research #4

Closed by JakenVeina 5 months ago

JakenVeina commented 6 months ago

Creating this as a PR just for the sake of getting everyone's eyes on it.

So, I did some research and realized that one of the things I was trying to do doesn't actually work.

I figured that the JIT would take a method like...

```csharp
public void Foo<T, TItems>(TItems items)
    where TItems : IEnumerable<T>;
```

...and convert it into more specific implementations, depending on which collection type is passed to it, e.g....

```csharp
public void Foo<T>(IEnumerable<T> items);

public void Foo<T>(List<T> items);

public void Foo<T>(T[] items);
```

Thus, if the caller happens to have a List<T>, the method they call benefits from List<T> optimizations when iterating and calling .Count. As it turns out, generic specialization in the JIT only works this way when TItems is a value type. For reference types like IEnumerable<T>, List<T>, and T[], the JIT doesn't generate separate implementations; it maps them all onto a single shared implementation.
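Not code from this PR, just a sketch of the practical consequence: because the JIT emits one shared body for all reference-type TItems arguments, collection-specific fast paths have to be opted into with explicit runtime type checks rather than relying on specialization (the method name below is illustrative).

```csharp
using System.Collections.Generic;

public static class EnumerationSketch
{
    // Illustrative only: the shared generic body can still take List<T>/T[] fast
    // paths, but only via runtime type checks.
    public static int CountItems<T, TItems>(TItems items)
        where TItems : IEnumerable<T>
    {
        if (items is T[] array)
            return array.Length;

        if (items is ICollection<T> collection)
            return collection.Count;

        var count = 0;
        foreach (var _ in items)
            ++count;

        return count;
    }
}
```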

Additionally, I also need to retract a bit of advice I gave a few months ago, when I ran benchmarks for iterating over a List<T>, which I went ahead and re-created here...

List Enumeration Benchmark Results

| Method | ItemCount | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ItemsAsEnumerable | 10000 | 18,866.35 ns | 373.998 ns | 415.698 ns | 1.00 | 0.00 | - | 40 B | 1.00 |
| ItemsAsReadOnlyListWithForEach | 10000 | 19,061.90 ns | 314.996 ns | 337.042 ns | 1.01 | 0.03 | - | 40 B | 1.00 |
| ItemsAsReadOnlyListWithFor | 10000 | 7,651.48 ns | 147.581 ns | 362.019 ns | 0.42 | 0.02 | - | - | 0.00 |
| ItemsAsReadOnlySpan | 10000 | 3,564.88 ns | 70.045 ns | 111.099 ns | 0.19 | 0.01 | - | - | 0.00 |
| ItemsAsList | 10000 | 6,209.85 ns | 93.557 ns | 87.513 ns | 0.33 | 0.01 | - | - | 0.00 |

At the time, I had noticed that iterating over an IReadOnlyList<T> with a plain for loop instead of a foreach was quite a bit faster. What I have since discovered is that this is only true if the IReadOnlyList<T> is actually a List<T> under the hood. If it's a T[] under the hood, using a for loop is actually significantly worse (roughly 3x slower in the benchmark below).
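For reference, the two access patterns being compared look roughly like this (illustrative methods, not the actual benchmark code):

```csharp
using System.Collections.Generic;

public static class IterationPatterns
{
    // foreach over the interface: one enumerator, same shape regardless of
    // whether the backing collection is a List<T> or a T[].
    public static long SumWithForEach(IReadOnlyList<int> items)
    {
        long sum = 0;
        foreach (var item in items)
            sum += item;
        return sum;
    }

    // Indexed for over the interface: every items[i] and items.Count is an
    // interface call, which pays off when the backing store is a List<T> but
    // comes out markedly slower when it's an array accessed through
    // IReadOnlyList<T>, per the tables above and below.
    public static long SumWithFor(IReadOnlyList<int> items)
    {
        long sum = 0;
        for (var i = 0; i < items.Count; ++i)
            sum += items[i];
        return sum;
    }
}
```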

Array Enumeration Benchmark Results

| Method | ItemCount | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ItemsAsEnumerable | 10000 | 14,417.04 ns | 87.993 ns | 73.478 ns | 1.00 | 0.00 | - | 32 B | 1.00 |
| ItemsAsReadOnlyListWithForEach | 10000 | 14,488.32 ns | 174.939 ns | 163.638 ns | 1.01 | 0.01 | - | 32 B | 1.00 |
| ItemsAsReadOnlyListWithFor | 10000 | 43,024.56 ns | 245.363 ns | 229.513 ns | 2.98 | 0.02 | - | - | 0.00 |
| ItemsAsReadOnlySpan | 10000 | 3,195.86 ns | 30.429 ns | 28.463 ns | 0.22 | 0.00 | - | - | 0.00 |
| ItemsAsArray | 10000 | 3,214.86 ns | 31.132 ns | 27.598 ns | 0.22 | 0.00 | - | - | 0.00 |

I also went ahead and ran one for Dictionary<TKey, TValue>.

Dictionary Enumeration Benchmark Results

| Method | ItemCount | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ItemsAsEnumerable | 10000 | 26,079.1 ns | 509.63 ns | 714.43 ns | 1.00 | 0.00 | - | 48 B | 1.00 |
| ItemsAsReadOnlyDictionaryOverItems | 10000 | 26,097.9 ns | 512.02 ns | 569.11 ns | 1.00 | 0.04 | - | 48 B | 1.00 |
| ItemsAsReadOnlyDictionaryOverKeys | 10000 | 78,424.6 ns | 1,109.81 ns | 983.82 ns | 2.98 | 0.10 | - | 40 B | 0.83 |
| ItemsAsReadOnlyDictionaryOverValues | 10000 | 20,017.6 ns | 360.47 ns | 337.18 ns | 0.76 | 0.03 | - | 40 B | 0.83 |
| ItemsAsDictionaryOverItems | 10000 | 13,011.0 ns | 246.49 ns | 205.83 ns | 0.49 | 0.02 | - | - | 0.00 |
| ItemsAsDictionaryOverKeys | 10000 | 64,904.2 ns | 1,296.30 ns | 1,440.84 ns | 2.48 | 0.09 | - | - | 0.00 |
| ItemsAsDictionaryOverValues | 10000 | 13,194.4 ns | 261.67 ns | 340.25 ns | 0.51 | 0.02 | - | - | 0.00 |
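For readers who haven't written these before, here is a guess at what the three enumeration shapes above look like; the actual benchmark bodies aren't shown here, so these are illustrative only. A per-key lookup in the "OverKeys" shape would explain why it lands roughly 3x slower than enumerating the pairs directly.

```csharp
using System.Collections.Generic;

public static class DictionaryPatterns
{
    // "OverItems": enumerate the KeyValuePairs directly.
    public static int OverItems(Dictionary<int, string> items)
    {
        var total = 0;
        foreach (var pair in items)
            total += pair.Value.Length;
        return total;
    }

    // "OverKeys": one extra hash lookup per key.
    public static int OverKeys(Dictionary<int, string> items)
    {
        var total = 0;
        foreach (var key in items.Keys)
            total += items[key].Length;
        return total;
    }

    // "OverValues": enumerate the values collection directly.
    public static int OverValues(Dictionary<int, string> items)
    {
        var total = 0;
        foreach (var value in items.Values)
            total += value.Length;
        return total;
    }
}
```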

Conclusion

So given all of the above, I think the general advice is as follows:

We COULD add overloads for List<T>, T[], and Dictionary<TKey, TValue> for performance, but I don't think we should. I don't think the performance gains are worth losing the API clarity that we are not mutating these collections, only reading from them. In the event that callers are concerned with performance at this level, they should really be using the ReadOnlySpan<T> overloads.
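As a caller-side sketch of that last point (ConsumeSpan below is a hypothetical stand-in for a span-accepting factory overload, not this library's API), both a List<T> and a T[] can reach a ReadOnlySpan<T> overload without copying:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Stand-in for a span-accepting factory overload; not this library's actual API.
static int ConsumeSpan(ReadOnlySpan<int> items)
    => items.Length;

var list = new List<int> { 1, 2, 3 };
var array = new[] { 4, 5, 6 };

// List<T>'s backing storage can be exposed as a span without copying (.NET 5+),
// and arrays convert to ReadOnlySpan<T> implicitly.
ConsumeSpan(CollectionsMarshal.AsSpan(list));
ConsumeSpan(array);
```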

RolandPheasant commented 6 months ago

I am curious to see how all of this will be consumed. The reason I introduced ChangeAwareCache and ChangeAwareList was to simplify changeset construction, which in turn made it much simpler to provide:

  1. Consistency across the various operators
  2. Optimizations that can be applied across the system
  3. Relief from the tedium of implementing an operator and constructing its change sets in the same code, i.e. the logic becomes very complicated and even worse to maintain
  4. In the case of ChangeAwareList, I'd go as far as to say I would have withdrawn the list stuff without the optimizations it provides (that's mostly another topic, but making list construction optimal is vital if we are to support it in vNext)

Also, nearly all change sets in the real world are constructed internally, where we can optimize as we see fit, whereas for the occasional user-written custom operator the performance impact of how the changeset is constructed would probably be minimal.
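For readers unfamiliar with that pattern, a rough sketch of how it looks against the existing DynamicData ChangeAwareList<T> (the vNext equivalent doesn't exist yet, so treat this as illustrative only):

```csharp
using DynamicData;

var list = new ChangeAwareList<int>();

// Operator logic just mutates the list as if it were a plain collection...
list.Add(1);
list.AddRange(new[] { 2, 3 });
list.RemoveAt(0);

// ...and the change set describing those mutations is produced in one place,
// so changeset construction never leaks into the operator logic itself.
IChangeSet<int> changes = list.CaptureChanges();
```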

JakenVeina commented 6 months ago

Equivalents for ChangeAwareCache and ChangeAwareList were next on my todo list for this project as well. It may be that these convenience methods don't get much use internally, so the question then remains whether they make sense to keep for public use. That question basically amounts to "how useful is it to be able to create ChangeSets without having to populate a collection first?" This seems like a pretty-darn-rare scenario, but I can think of one concrete example: the .NET FileSystemWatcher API, which exposes several separate plain events for changes occurring on the file system (.Changed, .Created, .Deleted, .Renamed). Each of these could be translated directly into an equivalent ChangeSet, without having to consume the extra memory of pumping items into a collection first.
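As a rough sketch of that FileSystemWatcher scenario, using System.Reactive and an illustrative FileChange record rather than any types from this library:

```csharp
using System;
using System.IO;
using System.Reactive.Linq;

public enum FileChangeReason { Created, Changed, Deleted }

public sealed record FileChange(FileChangeReason Reason, string FullPath);

public static class FileSystemChanges
{
    // Each watcher event becomes a change notification directly; nothing is ever
    // pumped into an intermediate collection. (Renamed and watcher disposal are
    // omitted to keep the sketch short.)
    public static IObservable<FileChange> Watch(string path)
    {
        var watcher = new FileSystemWatcher(path) { EnableRaisingEvents = true };

        IObservable<FileChange> FromEvent(
                Action<FileSystemEventHandler> addHandler,
                Action<FileSystemEventHandler> removeHandler,
                FileChangeReason reason)
            => Observable
                .FromEventPattern<FileSystemEventHandler, FileSystemEventArgs>(addHandler, removeHandler)
                .Select(pattern => new FileChange(reason, pattern.EventArgs.FullPath));

        return Observable.Merge(
            FromEvent(h => watcher.Created += h, h => watcher.Created -= h, FileChangeReason.Created),
            FromEvent(h => watcher.Changed += h, h => watcher.Changed -= h, FileChangeReason.Changed),
            FromEvent(h => watcher.Deleted += h, h => watcher.Deleted -= h, FileChangeReason.Deleted));
    }
}
```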

I think it would be a mistake to prevent consumers from creating ChangeSets for themselves if they want to, but that doesn't mean we need to provide convenience methods for it. We DO, I think, definitely need the construction methods on the Change types, as those are part of guaranteeing correctness. Although, now that I think about it, maybe the same argument applies to the ChangeSet types: as it is now, someone could create a ChangeSet of type Clear that contains changes other than just Removes. If we lock down the ChangeSet types the way the Change types are, we could prevent that.
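As an illustrative-only sketch of what that locking-down could look like (none of these names are the library's actual API): if the only way to obtain a clear-style change set is a factory that takes the removed items, it can never contain anything but removals.

```csharp
using System;
using System.Collections.Generic;

public enum ChangeSetKind { Update, Clear }

public sealed class ItemChangeSet<T>
{
    private ItemChangeSet(ChangeSetKind kind, IReadOnlyList<T> removedItems)
    {
        Kind = kind;
        RemovedItems = removedItems;
    }

    public ChangeSetKind Kind { get; }

    public IReadOnlyList<T> RemovedItems { get; }

    // The private constructor plus this factory means a Clear change set can only
    // ever describe removals; invalid mixtures are unrepresentable.
    public static ItemChangeSet<T> Clear(ReadOnlySpan<T> removedItems)
        => new(ChangeSetKind.Clear, removedItems.ToArray());
}
```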

github-actions[bot] commented 5 months ago

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.