thesadrogue / TheSadRogue.Primitives

A collection of primitive data structures for working with a 2-dimensional grid.
MIT License
21 stars 6 forks source link

Optimizations #79

Closed Chris3606 closed 2 years ago

Chris3606 commented 2 years ago

This PR implements various performance optimizations.

Changes

Notes and Benchmarks

Grid View Fill and Clear

The ISettableGridView.Fill method currently in the integration library is convenient, but has a number of performance issues. First, it really just calls ApplyOverlay, which uses a Func to specify the value to set; this adds many function calls which slows the process down. Second, and more importantly, some grid view types like ArrayView<T> and BitArrayView can have their underlying data structures filled or cleared in a much more efficient way than simply iterating over all positions and assigning a value.

The current implementation of ISettableGridView.Fill uses a special case for BitArray to perform the operation faster for BitArrayView; but does nothing for ArrayView. Furthermore, special-cases for various types is not a very flexible system for controlling this. Therefore, this PR removes the extension method, and adds it as a method of ISettableGridView directly, so that implementers can provide their own implementations specific to their type.

A default interface implementation is provided in order to minimize impact to the existing API. Additionally, the "base" classes for implementing your own grid views (SettableGridViewBase, etc) also provide implementations, so if custom classes inherit from these, there will be no additional work. This allows BitArrayView and array view variations to provide a more efficient implementation.

Similarly, a Clear function has been added to the ISettableGridView interface, since clear can sometimes be implemented more efficiently than Fill.

The following performance tests outline the performance increases yielded as a result of this change on types which have the potential to implement fill/clear operations more efficiently:

Method Size Type Mean Error StdDev
Clear 10 ArrayView 7.518 ns 0.1018 ns 0.0952 ns
Fill 10 ArrayView 347.121 ns 4.7932 ns 4.4836 ns
OldFill 10 ArrayView 1,096.287 ns 9.4300 ns 8.3594 ns
ManualFill 10 ArrayView 517.721 ns 8.5281 ns 7.9771 ns
Clear 10 ArrayView2D 7.683 ns 0.1191 ns 0.1114 ns
Fill 10 ArrayView2D 1,422.502 ns 22.7307 ns 27.9153 ns
OldFill 10 ArrayView2D 1,612.172 ns 10.8591 ns 10.1576 ns
ManualFill 10 ArrayView2D 1,639.171 ns 13.0407 ns 10.1813 ns
Clear 10 BitArrayView 5.829 ns 0.0853 ns 0.0756 ns
Fill 10 BitArrayView 5.427 ns 0.0498 ns 0.0416 ns
OldFill 10 BitArrayView 5.517 ns 0.0955 ns 0.0847 ns
ManualFill 10 BitArrayView 603.479 ns 5.5270 ns 4.6153 ns
Clear 100 ArrayView 89.358 ns 0.3181 ns 0.2483 ns
Fill 100 ArrayView 33,291.069 ns 550.4695 ns 487.9768 ns
OldFill 100 ArrayView 101,582.738 ns 675.7490 ns 632.0961 ns
ManualFill 100 ArrayView 50,033.891 ns 390.2617 ns 345.9569 ns
Clear 100 ArrayView2D 87.728 ns 0.4218 ns 0.3522 ns
Fill 100 ArrayView2D 137,495.269 ns 478.7204 ns 399.7533 ns
OldFill 100 ArrayView2D 151,239.834 ns 456.4445 ns 356.3618 ns
ManualFill 100 ArrayView2D 162,698.369 ns 1,339.3460 ns 1,187.2954 ns
Clear 100 BitArrayView 23.143 ns 0.1230 ns 0.1151 ns
Fill 100 BitArrayView 18.798 ns 0.0725 ns 0.0679 ns
OldFill 100 BitArrayView 19.321 ns 0.0733 ns 0.0650 ns
ManualFill 100 BitArrayView 58,734.706 ns 410.5886 ns 384.0648 ns
Clear 200 ArrayView 612.813 ns 5.2026 ns 4.8665 ns
Fill 200 ArrayView 133,094.646 ns 1,018.6014 ns 902.9636 ns
OldFill 200 ArrayView 407,197.750 ns 2,306.1648 ns 2,044.3550 ns
ManualFill 200 ArrayView 199,615.820 ns 1,626.6210 ns 1,358.3022 ns
Clear 200 ArrayView2D 608.558 ns 12.1963 ns 12.5247 ns
Fill 200 ArrayView2D 552,939.521 ns 2,164.6341 ns 1,807.5675 ns
OldFill 200 ArrayView2D 612,126.322 ns 3,703.7907 ns 3,464.5281 ns
ManualFill 200 ArrayView2D 645,408.421 ns 3,129.0164 ns 2,612.8704 ns
Clear 200 BitArrayView 53.136 ns 0.2383 ns 0.1861 ns
Fill 200 BitArrayView 66.284 ns 0.2616 ns 0.2319 ns
OldFill 200 BitArrayView 66.551 ns 0.2705 ns 0.2259 ns
ManualFill 200 BitArrayView 233,529.564 ns 2,152.5208 ns 2,013.4693 ns

Clear and Fill represent this PR's implementation of those functions. OldFill represents the old extension method. ManualFill represents the result of using a set of for loops to just set the new value to each location (the default implementation of the interface method).

As evidence by the above tests, the current Fill and Clear implementations far out-perform the current implementations on these types.

Note that the OldFill function has the same special case for BitArrayView as the old implementation did; this is why the performance gains aren't as evident on this type.

Positions Enumerables

Currently, Rectangle and IGridView both provide a Positions function of some sort to iterate over all positions in them. This is implemented by simply using yield return to create an IEnumerable<Point>. However, since points are small value types, the overhead of enumerables/generators comes into play heavily here, particularly if the body of the loop is relatively fast.

This PR, therefore, implements a custom enumerator for enumerating over the positions in a rectangular area, and uses that enumerator to implement the Positions() functions for both Rectangle and IGridView. The new Positions function will work identically in a foreach loop, but it's a struct and therefore avoids much of the boxing and allocation that has to take place in the case of the IEnumerable implementation.

One result of this change is that you can no longer use System.Linq functions directly on the result of the Positions() function. For this use case, the enumerator which is now returned by that function has a ToEnumerator() function; so where before you might do myGridView.Positions().Where(p => p.X == 10), you can now do myGridView.Positions().ToEnumerable().Where(p => p.X == 10).

Also note that ToEnumerable() is implemented slightly differently (internally) than the previous Positions() function was; the new way is slightly better performance wise.

The following outlines the results of performance tests on this new implementation vs the old one:

Method Size Mean Error StdDev
ManualPositionsIterationNormal 10 209.19 ns 1.035 ns 0.808 ns
ManualPositionsIterationCacheWidthHeight 10 41.34 ns 0.204 ns 0.181 ns
ManualPositionsIterationCountNormal 10 768.99 ns 5.721 ns 5.071 ns
ManualPositionsIterationCountCacheCountAndWidth 10 364.11 ns 1.875 ns 1.565 ns
PositionsIteration 10 162.65 ns 0.772 ns 0.722 ns
OldEnumerablePositionsIterationNormal 10 777.74 ns 7.341 ns 6.866 ns
OldEnumerablePositionsIterationCacheWidthHeight 10 654.60 ns 3.705 ns 3.285 ns
PositionsToEnumerableIteration 10 638.97 ns 2.324 ns 1.815 ns
PositionsToEnumerableIterationShortcutMethod 10 731.70 ns 1.989 ns 1.763 ns
ManualPositionsIterationNormal 100 20,835.91 ns 157.029 ns 146.885 ns
ManualPositionsIterationCacheWidthHeight 100 4,705.53 ns 17.613 ns 15.613 ns
ManualPositionsIterationCountNormal 100 75,506.37 ns 412.176 ns 365.384 ns
ManualPositionsIterationCountCacheCountAndWidth 100 35,312.72 ns 170.125 ns 159.135 ns
PositionsIteration 100 15,450.15 ns 168.212 ns 140.464 ns
OldEnumerablePositionsIterationNormal 100 73,348.51 ns 464.810 ns 434.784 ns
OldEnumerablePositionsIterationCacheWidthHeight 100 63,980.28 ns 207.250 ns 161.807 ns
PositionsToEnumerableIteration 100 61,171.52 ns 175.065 ns 163.756 ns
PositionsToEnumerableIterationShortcutMethod 100 70,518.70 ns 154.078 ns 128.662 ns
ManualPositionsIterationNormal 200 74,758.23 ns 809.573 ns 757.275 ns
ManualPositionsIterationCacheWidthHeight 200 18,763.24 ns 29.602 ns 27.689 ns
ManualPositionsIterationCountNormal 200 301,921.13 ns 1,477.911 ns 1,310.130 ns
ManualPositionsIterationCountCacheCountAndWidth 200 140,903.36 ns 634.363 ns 529.722 ns
PositionsIteration 200 60,929.27 ns 173.272 ns 162.079 ns
OldEnumerablePositionsIterationNormal 200 292,128.97 ns 1,353.992 ns 1,130.645 ns
OldEnumerablePositionsIterationCacheWidthHeight 200 254,856.63 ns 680.767 ns 568.471 ns
PositionsToEnumerableIteration 200 244,469.45 ns 947.353 ns 886.155 ns
PositionsToEnumerableIterationShortcutMethod 200 282,435.41 ns 990.563 ns 827.165 ns

Brief descriptions of and observations about the tests above:

Notably, the PositionsIteration test, eg. the one that uses this PR's custom enumerator, is faster than nearly everything else; including ManualPositionsIterationNormal, which I would argue represents the most common way a user would implement the loops manually. You can, of course, achieve superior performance with manual for loops by caching the width/height; but this is to be expected (no enumerator in C# has 0 overhead). For reference, the new Positions() implementation is about 6-8x faster than the old one in these test cases.

Breaking Changes

This PR minimizes breaking changes, but there are a few limited cases where some minor breaking can occur.

First, if a user has a custom implementation of ISettableGridView, which does not inherit from one of the provided base classes, and they also call Fill on that view without it being casted to ISettableGridView, then they will have to add that function to their implementation. This should be extremely straightforward, and should only apply in a very limited subset of cases.

Second, any use case of Positions() functions which depend on the result being IEnumerable<Point> will require slight modifications to account for this update. These use cases would include anything operating on the Positions() result using System.Linq, and perhaps some custom interfaces. myObj.Positions() must now be written as myObj.Positions().ToEnumerable() in these cases. This change should be straightforward as well, however, and simple foreach usages (arguably the most common) obviously will not need this change.

Although I typically don't like introducing these types of breaking changes (although very minor ones) within a minor version, these changes do provide a significant performance boost as outlined above, so I believe they should be worth it here given the minimal impact to the API.