Color parsing and formatting improvements

dodexahedron commented 8 months ago

I don't even fully remember what brought me to it, but I noticed the regexes being used in Color parsing, and that raised performance concerns.

Color.TryParse does what it says, mostly, though it does miss some edge cases and allows out-of-bounds values, since it uses all signed integers.

I wrote up a quick XUnit test harness to show a simpler and more precise implementation of the rgb portion of it that is several times faster, and executes in constant time, via a combination of simple string manipulation and the C# 12 collection expression feature, which enables additional optimizations by the compiler over older ways of manipulating certain types of collections.

Here's the code. The relevant portion is what's between the stopwatch starts and stops.

First, it tries out the new method, asserting the expected values were properly parsed. Then it does it the old way with the regexes (but with one tweak, to enable it to handle rgba as well).

The new code would replace the two regex blocks at the bottom of the TryParse method.

public class ColorExperiments ( ITestOutputHelper testOut ) {

    [Theory]
    [MemberData ( nameof ( StringsForColorParser ) )]
    public void ColorParsingMethodComparison ( string input, int expectedValues )
    {
        Stopwatch sw = new Stopwatch ( );
        long newRegexTicks = 0;
        for ( int iteration = 0; iteration < 15000; iteration++ ) {
            sw.Start ( );
            if ( input.StartsWith ( "rgb(", StringComparison.InvariantCultureIgnoreCase ) && input.EndsWith ( ')' ) ) {
                string [] split = input.Split ( (char []) [',', '(', ')'], StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries );
                switch ( split ) {
                case ["rgb", var rString, var gString, var bString]:
                {
                    var r = byte.Parse ( rString );
                    var g = byte.Parse ( gString );
                    var b = byte.Parse ( bString );
                    sw.Stop ( );
                    Assert.Equal ( r, expectedValues );
                    Assert.Equal ( g, expectedValues );
                    Assert.Equal ( b, expectedValues );
                }
                    break;
                case ["rgb", var rString, var gString, var bString, var aString]:
                {
                    var r = byte.Parse ( rString );
                    var g = byte.Parse ( gString );
                    var b = byte.Parse ( bString );
                    var a = byte.Parse ( aString );
                    sw.Stop ( );
                    Assert.Equal ( r, expectedValues );
                    Assert.Equal ( g, expectedValues );
                    Assert.Equal ( b, expectedValues );
                    Assert.Equal ( a, expectedValues );
                }
                    break;
                case ["rgba", var rString, var gString, var bString]:
                {
                    var r = byte.Parse ( rString );
                    var g = byte.Parse ( gString );
                    var b = byte.Parse ( bString );
                    sw.Stop ( );
                    Assert.Equal ( r, expectedValues );
                    Assert.Equal ( g, expectedValues );
                    Assert.Equal ( b, expectedValues );
                }
                    break;
                case ["rgba", var rString, var gString, var bString, var aString]:
                {
                    var r = byte.Parse ( rString );
                    var g = byte.Parse ( gString );
                    var b = byte.Parse ( bString );
                    var a = byte.Parse ( aString );
                    sw.Stop ( );
                    Assert.Equal ( r, expectedValues );
                    Assert.Equal ( g, expectedValues );
                    Assert.Equal ( b, expectedValues );
                    Assert.Equal ( a, expectedValues );
                }
                    break;
                default:
                    sw.Stop ( );
                    Assert.Fail ( "Input string not in a valid rgb(#,#,#) format" );
                    break;
                }
                newRegexTicks += sw.ElapsedTicks;
            }
        }
        sw.Reset ( );

        long originalRegexTicks = 0;
        for ( int iteration = 0; iteration < 15000; iteration++ ) {
            sw.Start ( );
            var match = Regex.Match ( input, @"rgba?\((\d+),(\d+),(\d+),(\d+)\)" );
            if ( match.Success ) {
                var r = int.Parse ( match.Groups [ 1 ].Value );
                var g = int.Parse ( match.Groups [ 2 ].Value );
                var b = int.Parse ( match.Groups [ 3 ].Value );
                sw.Stop ( );
                originalRegexTicks += sw.ElapsedTicks;
                Assert.Equal ( r, expectedValues );
                Assert.Equal ( g, expectedValues );
                Assert.Equal ( b, expectedValues );
            }

            match = Regex.Match ( input, @"rgba?\((\d+),(\d+),(\d+),(\d+)\)" );
            if ( match.Success ) {
                var r = int.Parse ( match.Groups [ 1 ].Value );
                var g = int.Parse ( match.Groups [ 2 ].Value );
                var b = int.Parse ( match.Groups [ 3 ].Value );
                var a = int.Parse ( match.Groups [ 4 ].Value );
                sw.Stop ( );
                originalRegexTicks += sw.ElapsedTicks;
                Assert.Equal ( r, expectedValues );
                Assert.Equal ( g, expectedValues );
                Assert.Equal ( b, expectedValues );
                Assert.Equal ( a, expectedValues );
            }

            originalRegexTicks += sw.ElapsedTicks;
        }

        testOut.WriteLine ( $"Original: {originalRegexTicks.ToString ( )} | New: {newRegexTicks.ToString ( )}" );

    }

    /// <summary>
    /// Generates cases from 0 to 255 each, for rgb and rgba formats, each with 3 or 4 parameters, to handle buggy inputs.
    /// </summary>
    public static TheoryData<string, int> StringsForColorParser ( )
    {
        TheoryData<string, int> cases = new TheoryData<string, int> ( );
        for ( int i = 0; i < 256; i++ ) {
            cases.Add ( $"rgb({i:D},{i:D},{i:D})", i );
            cases.Add ( $"rgb({i:D},{i:D},{i:D},{i:D})", i );
            cases.Add ( $"rgba({i:D},{i:D},{i:D})", i );
            cases.Add ( $"rgba({i:D},{i:D},{i:D},{i:D})", i );
        }
        return cases;
    }
}

The new code also implicitly gains bounds restrictions by using bytes, which will limit the input range to [0,255] or the appropriate overflow exception will be thrown by the framework (to be caught in TryParse).

This code is between 2 and 12 times faster on the machine I ran it on than the regex code.

Other portions of the TryParse method can be similarly modified to consolidate things into a single switch statement, for the other formats (such as hex), which would be nice, and which would also implicitly gain some free potential performance from use of collection expressions and the reduced allocations made possible with those language features.

At minimum, some of the hex format code can be simplified and improved like so (this is just in one case, but the method for each is nearly identical):

var colorBytes = Convert.FromHexString ( text [ 1.. ] );
color = new Color (colorBytes[0], colorBytes[1], colorBytes[2]);

That gets it done in one allocation of a byte array, instead of all the individual substring calls, each of which is a new string allocation.

dodexahedron commented 8 months ago

Thanks for that.

Might be worth some benchmarking once we get finished with more stuff, for sure.

dodexahedron commented 8 months ago

Just a wild thought:

So, in theory, configuration of colors (or configuration in general) isn't something that is terribly likely to change between executions of an application.

With that basic assumption in mind, if we wanted to go for maximum startup performance, pre-computation and caching of things could be beneficial, for subsequent executions of the application. This is a common strategy employed by applications when there's an expensive operation that fits that criterion of only really needing to be done once, at startup. The most common example is graphics drivers and 3D games, which often will pre-compile shaders on first launch after a configuration or binary change, to front-load the work to the first launch, benefiting all future launches.

This would be pretty easy to do with colors. We could parse the configuration and cache the computed color values in a binary file that can just be loaded up on future executions. The method for knowing if the cache is stale could be something as simple and quick as a hash of the configuration file. If it matches the last known hash, use the cache. If not, parse and create a new cache.

BUT

It's also important to realize that this is already pretty darn fast. In some quick benchmarking I just did, Parse handles hex strings in RRGGBB and AARRGGBB format extremely quickly.

This is what I ran, which tests 255 different color strings in #AARRGGBB format 1000 times over, timing only the parse activity:

    [Fact]
    public void BenchmarkParse ( )
    {
        long totalTicks = 0;
        Stopwatch sw = new ( );
        List<Color> colors = new ( );
        for ( int i = 0; i < 1000; i++ ) {
            for ( byte b = 0; b < 255; b++ ) {
                sw.Start ( );
                Color c = Color.Parse ( $"#FF1234{b:x2}" );
                sw.Stop ( );
                totalTicks += sw.ElapsedTicks;
                sw.Reset ( );

                colors.Add ( c ); // Just doing this so it doesn't optimize it away. Not timed.
            }
        }
        TimeSpan s = TimeSpan.FromTicks ( totalTicks );
    }

Average time with a debug build was 8 ticks, which is 800 nanoseconds. And note that includes the time to format the string before input, which is unfair. Average time in a release build was 5 ticks (500 nanoseconds).

So, in the case of having 400 to operate on, at startup, we're looking at 320 microseconds in a debug build or 200 microseconds for release, on this machine's 5-year-old CPU. The disk IO activity to read the values will dominate that by an order of magnitude for HDD and will be on-par for SSD, so, in my opinion, further optimization of the parser for hex formats isn't worth the time.

Dealing in Spans really is a major boost to things, and the more I use them, the more I absolutely love what the dotnet team has been doing in the last couple versions. I had them sprinkled here and there in dotnet 7 projects, but really dove in head first with dotnet 8 and....Damn... I'm hooked. 😂

dodexahedron commented 8 months ago

Continuing along that train of thought...

Considering how quick it already is, computing a hash of a config file to determine cache freshness would, by itself, already take just as long, and then the load of the cache file from disk would incur an additional IO penalty, so my assumption is that even a binary cache will be slower every time.

I'm curious how this would perform on a Raspberry Pi 5 that Amazon just delivered today. I imagine it'll be slower, due to lower clock speed and slower memory, but I'd wager the performance difference will be proportional to those factors entirely, since it's a simple byte-matching algorithm.

dodexahedron commented 8 months ago

Here's an even more fair test, using 100000 random numbers generated by the crypto random number generator, and pre-formatting the string before parse, so now it's really only timing the parse:

    [Fact]
    public void BenchmarkParse ( )
    {
        long totalTicks = 0;
        Stopwatch sw = new ( );
        List<Color> colors = new ( );
        RandomNumberGenerator rng = RandomNumberGenerator.Create (  );
        Span<byte> intBytes = stackalloc byte [4];
        for ( int i = 0; i < 100000; i++ ) {
            rng.GetBytes ( intBytes );
            int colorInt = BinaryPrimitives.ReadInt32LittleEndian ( intBytes );
            string hexString = $"#{colorInt:X8}";
            sw.Start ( );
            Color c = Color.Parse ( hexString );
            sw.Stop ( );
            totalTicks += sw.ElapsedTicks;
            sw.Reset ( );

            colors.Add ( c ); // Just doing this so it doesn't optimize it away. Not timed.
        }
        TimeSpan s = TimeSpan.FromTicks ( totalTicks );
    }

Average time in release build was 4 ticks on the same machine as before.

So, 400 input strings, assuming all are random and unique (which is worst case, considering processor caches and whatnot), read at startup, would be 160 microseconds. :)

dodexahedron commented 8 months ago

All this is to say that yes, it was a good thought you had, and I shared similar curiosity.

But this data shows we probably shouldn't bother.

Just as a side note about the whole thing, before I even got started writing new code:

I tried implementations using slightly tighter regular expressions, as well as tweaks to the creation of the regexes, such as making them forward-only (which has a significant impact on that kind of regex) and pre-compiling them.

We are now running so much faster that we can complete one hundred thousand parses in a total time that is almost 100 times LESS than the time the best regex I could come up with was able to parse a single value (though the regex benchmarks speed up after the first use, which cuts the advantage by a factor of about 20 over 100k runs). All thanks to spans, since it's not like the code I wrote is terribly profound. 😅

dodexahedron commented 8 months ago

Oh hey....

Some of the conditional compilation stuff in Responder breaks Release builds, in the unit test project. I didn't track them down when trying these benchmarks, and just defined the DEBUG_IDISPOSABLE symbol to make it go away since I didn't care about that in the context of that benchmark.

But yeah - Definitely be sure code dependent on conditional stuff is handled everywhere.

tig commented 8 months ago

I love the use case of having a theme designer that dynamically updates the app as it's running.

So config is not just a startup scenario.

dodexahedron commented 8 months ago

Noted. And yeah, that's a neat case and certainly relevant for the code we're talking about. It's also relevant to TGD, though TGD is a special case pretty much all around haha.

But, as mentioned, we're talking less than a millisecond on middle-aged hardware on a laptop (which, as a matter of fact, was on battery at the time, as I was outside on my patio then, so it wasn't even performing at its full capability due to corporate power policy).

So I'm still pretty confident we can move on for now, that optimization efforts by me or anyone else are probably better spent elsewhere, and, if we want to revisit it for a future enhancement, there's no reason we can't come back to it later! :)

dodexahedron commented 8 months ago

One thing I wanted to mention as an example of just how simple this particular operation is:

The internet!

Any web page you visit typically has, at minimum, dozens, but most likely anywhere from hundreds to thousands of text declarations of 32-bit colors (even respecting alpha for rendering), but you don't feel that enough to care or notice. Sure, I'll grant it's non-zero, but the point is that so many other parts of the application, be it a web browser or something like Terminal.Gui occupy a much larger proportion of the time to process things than parsing of color values from a few bytes of text to a 4-byte integer that anything beyond the low hanging fruit is in the realm of micro-optimizations. 🤷‍♂️

The biggest win for this code, honestly, was just removing the regexes, since those were heavy. Even plain old string parsing would have been fine with the same basic strategy I used. The use of spans was just opportunistic since it's at least mostly similar to working with strings and therefore was a mostly-free win for performance that I was (well... am, since I'm not done with tests yet!) more than happy to contribute. 😊 And then the other enhancements vis-a-vis interface implementations were just because I'm a nerd haha.

And again I'm not saying there are no further possible gains - just that a couple hundred nanoseconds to parse text to a value that can be used from that point on has basically exhausted its opportunities for order-of-magnitude scale optimizations, and that similar or possibly less effort can be spent elsewhere for a better return on the time investment.

dodexahedron commented 8 months ago

Oops. Didn't drop a mention there to link it over here....

As of a few minutes ago, #3204 is now complete and ready for review/merge at your discretion.

dodexahedron commented 8 months ago

Closing this as it has been merged.

gui-cs / Terminal.Gui

Color parsing and formatting improvements #3170