sklose / NCalc2

expression evaluator for .NET with built-in compiler
MIT License
170 stars 57 forks source link

Remove excessive check for case when case is ignored #84

Closed Bykiev closed 9 months ago

Bykiev commented 9 months ago

Remove excessive check for case when case is ignored

Bykiev commented 9 months ago

@david-brink-talogy, comparing with ToUpper()/ToLower() is not the best choice as it'll be not working with some cultures. For ex.:

Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
const string input = "interesting";

bool comparison = input.ToUpper() == "INTERESTING";

Console.WriteLine("These things are equal: " + comparison);

https://learn.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#stringtoupper-and-stringtolower

Bykiev commented 9 months ago

@david-brink-talogy, I did some more research and found that in switch block comparison occurs with String.Equals(string, string) which is compiled into: String.Equals(string1, string2, StringComparison.Ordinal);,

so we can convert the function name to upper case with ToUpperInvariant() and it could be faster. I'll prepare some tests with BenchmarkDotNet later to check it .

Bykiev commented 9 months ago

I did the benchmarking, here is results:

public class TestComparision
{
    string functionName = "in";

    [Benchmark]
    public void TestWithToUpperInvariant()
    {
        string upperFuncName = functionName.ToUpperInvariant();

        switch (upperFuncName)
        {
            case "ABS": { break; }
            case "ACOS": { break; }
            case "ASIN": { break; }
            case "ATAN": { break; }
            case "CEILING": { break; }
            case "COS": { break; }
            case "EXP": { break; }
            case "FLOOR": { break; }
            case "IEEEREMAINDER": { break; }
            case "LOG": { break; }
            case "LOG10": { break; }
            case "POW": { break; }
            case "ROUND": { break; }
            case "SIGN": { break; }
            case "SIN": { break; }
            case "SQRT": { break; }
            case "TAN": { break; }
            case "TRUNCATE": { break; }
            case "MAX": { break; }
            case "MIN": { break; }
            case "IF": { break; }
            case "IN": { break; }
        }
    }

    [Benchmark(Baseline =true)]
    public void TestWithOrdinalIgnoreCase()
    {
        switch (functionName)
        {
            case string n when n.Equals("abs", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("acos", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("asin", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("atan", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("ceiling", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("cos", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("exp", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("floor", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("ieeeremainder", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("log", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("log10", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("pow", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("round", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("sign", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("sin", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("sqrt", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("tan", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("truncate", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("max", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("min", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("if", StringComparison.OrdinalIgnoreCase): { break; }
            case string n when n.Equals("in", StringComparison.OrdinalIgnoreCase): { break; }
        }
    }
}

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19044.2965/21H2/November2021Update) Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores .NET SDK 8.0.101 [Host] : .NET 6.0.26 (6.0.2623.60508), X64 RyuJIT AVX2 DefaultJob : .NET 6.0.26 (6.0.2623.60508), X64 RyuJIT AVX2

Method Mean Error StdDev Median Ratio RatioSD
TestWithToUpperInvariant 22.96 ns 0.491 ns 1.408 ns 22.46 ns 0.44 0.03
TestWithOrdinalIgnoreCase 52.83 ns 1.014 ns 1.318 ns 52.60 ns 1.00 0.00
Bykiev commented 9 months ago

I still can't understand why does ToUpperInvariant() + Ordinal is faster than OrdinalIgnoreCase. Any ideas?

Upd. Well, it's because of with OrdinalIgnoreCase the case is checked on each iteration and with Ordinal comparer only the first time, with abs (which is first) the results are different:

Method Mean Error StdDev Ratio RatioSD
TestWithToUpperInvariant 21.927 ns 0.1647 ns 0.1286 ns 8.08 0.24
TestWithOrdinalIgnoreCase 2.702 ns 0.0580 ns 0.0754 ns 1.00 0.00

Don't know exactly does it makes any sense to change the existing logic? We can also change the order of switch case to make most common functions on top, but what is the most common is differs...

So, with ToUpperInvariant() the results will be similar and the position of function in switch/case does low impact on overall. With OrdinalIgnoreCase the results will be vary in the begin and in the end, because case checks will be made on each iteration. @sklose, what do you think about it?

I believe, we should change the order of case blocks to alphabetical, for some reason if, in, min, max has wrong order

david-brink-talogy commented 9 months ago

comparing with ToUpper()/ToLower() is not the best choice as it'll be not working with some cultures. For ex.:

You're correct, good callout.

Well, it's because of with OrdinalIgnoreCase the case is checked on each iteration and with Ordinal comparer only the first time, with abs (which is first) the results are different:

Yes, this is what I meant when I wrote "especially for the later functions.". The ncalc functions checked last have to be pattern matched against all the prior patterns first. A switch on static values should perform better since the compiler can more efficiently branch to the correct case. The normalization to invariant uppercase also happens just once, making each string comparison marginally faster.

Don't know exactly does it makes any sense to change the existing logic?

Based on your tests function name resolution is about twice as fast on average for "in". But obviously, we're talking about very small amounts of time here. Would you be willing to benchmark a few function calls as a single test (maybe ceiling, round, and if) to simulate a single expression with multiple functions that are spread out across the switch.

for some reason if, in, min, max has wrong order

These were probably just added last and were appended to the existing switch by the original developer.

Bykiev commented 9 months ago

@david-brink-talogy, here is a test results with ceiling, round, and if functions:

Method Mean Error StdDev Ratio RatioSD
TestWithToUpperInvariant 65.46 ns 0.973 ns 1.265 ns 0.75 0.02
TestWithOrdinalIgnoreCase 87.52 ns 1.337 ns 1.185 ns 1.00 0.00
david-brink-talogy commented 9 months ago

@Bykiev , Thanks. If you feel comfortable making the change it does seem like there is a benefit.

Bykiev commented 9 months ago

@david-brink-talogy, thank you for discussion, I'll create a PR with these changes, also I'll change the order of function names in case block to follow alphabetical order. Thank you!

sklose commented 9 months ago

thanks for the research and benchmarking! Could it make sense to turn this into a dictionary lookup? That way the order of the functions wouldn't matter for performance purposes.

Bykiev commented 9 months ago

thanks for the research and benchmarking! Could it make sense to turn this into a dictionary lookup? That way the order of the functions wouldn't matter for performance purposes.

Sorry, didn't get the idea, can you please share you thoughts?

david-brink-talogy commented 9 months ago

thanks for the research and benchmarking! Could it make sense to turn this into a dictionary lookup? That way the order of the functions wouldn't matter for performance purposes.

@sklose, I think a switch on static values should be very performant. The c# compiler should be able to create a jump table to find the appropriate match rather than needing to test every possible case in sequence.

@Bykiev, I think the dictionary implementation would have to have a callback (Func) defined for the implementation of each function. I did need to make CheckCase static to do this.

        private static readonly Dictionary<string, Func<EvaluationVisitor, Function, object>> StandardFunctions = new Dictionary<string, Func<EvaluationVisitor, Function, object>>() {
            { "ABS", (visitor, function) => {
                    CheckCase("Abs", function.Identifier.Name);

                    if (function.Expressions.Length != 1)
                        throw new ArgumentException("Abs() takes exactly 1 argument");

                    bool useDouble = (visitor._options & EvaluateOptions.UseDoubleForAbsFunction) == EvaluateOptions.UseDoubleForAbsFunction;
                    if (useDouble)
                    {
                        return Math.Abs(Convert.ToDouble(
                                                  visitor.Evaluate(function.Expressions[0]))
                        );
                    }
                    else
                    {
                        return Math.Abs(Convert.ToDecimal(
                                                  visitor.Evaluate(function.Expressions[0]))
                        );
                    }                
                }
            },
            { "ACOS", (visitor, function) => {
                    CheckCase("Acos", function.Identifier.Name);

                    if (function.Expressions.Length != 1)
                        throw new ArgumentException("Acos() takes exactly 1 argument");

                    return Math.Acos(Convert.ToDouble(visitor.Evaluate(function.Expressions[0])));
                }
            },
            // continue...
        };

        // in place of the current switch

            if (!StandardFunctions.TryGetValue(function.Identifier.Name.ToUpper(), out var functionImplementation))
            {
                throw new ArgumentException("Function not found", function.Identifier.Name);
            }

            Result = functionImplementation(this, function);        
Bykiev commented 9 months ago

@david-brink-talogy, I did some benchmarks on my laptop and result is twice worse with static dictionary. Test case:

public class TestComparision
{
    string[] functions = new string[] { "ceiling", "round", "if", "round", "if", "ceiling" };

    private static readonly Dictionary<string, Func<EvaluationVisitor, Function, object?>> StandardFunctions = new() {
            { "ABS", (visitor, function) => {  return null; } },
            { "ACOS", (visitor, function) => { return null; } },
            { "ASIN", (visitor, function) => { return null; } },
            { "ATAN", (visitor, function) => { return null; } },
            { "CEILING", (visitor, function) => { return null; } },
            { "COS", (visitor, function) => { return null; } },
            { "EXP", (visitor, function) => { return null; } },
            { "FLOOR", (visitor, function) => { return null; } },
            { "IEEEREMAINDER", (visitor, function) => { return null; } },
            { "LOG", (visitor, function) => { return null; } },
            { "LOG10", (visitor, function) => { return null; } },
            { "POW", (visitor, function) => { return null; } },
            { "ROUND", (visitor, function) => { return null; } },
            { "SIGN", (visitor, function) => { return null; } },
            { "SIN", (visitor, function) => { return null; } },
            { "SQRT", (visitor, function) => { return null; } },
            { "TAN", (visitor, function) => {  return null; } },
            { "TRUNCATE", (visitor, function) => { return null; } },
            { "MAX", (visitor, function) => { return null; } },
            { "MIN", (visitor, function) => { return null; } },
            { "IF", (visitor, function) => { return null; } },
            { "IN", (visitor, function) => { return null; } }
        };

    [Benchmark]
    public void TestWithToUpperInvariant()
    {
        foreach (var functionName in functions)
        {
            string upperFuncName = functionName.ToUpperInvariant();

            switch (upperFuncName)
            {
                case "ABS": { break; }
                case "ACOS": { break; }
                case "ASIN": { break; }
                case "ATAN": { break; }
                case "CEILING": { break; }
                case "COS": { break; }
                case "EXP": { break; }
                case "FLOOR": { break; }
                case "IEEEREMAINDER": { break; }
                case "LOG": { break; }
                case "LOG10": { break; }
                case "POW": { break; }
                case "ROUND": { break; }
                case "SIGN": { break; }
                case "SIN": { break; }
                case "SQRT": { break; }
                case "TAN": { break; }
                case "TRUNCATE": { break; }
                case "MAX": { break; }
                case "MIN": { break; }
                case "IF": { break; }
                case "IN": { break; }
            }
        }
    }

    [Benchmark(Baseline = true)]
    public void TestWithStaticDic()
    {
        foreach (var functionName in functions)
        {
            if (!StandardFunctions.TryGetValue(functionName.ToUpperInvariant(), out var functionImplementation))
            {
                throw new ArgumentException("Function not found", functionName);
            }
        }
    }
}

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.3930/22H2/2022Update) Intel Core i3-6100U CPU 2.30GHz (Skylake), 1 CPU, 4 logical and 2 physical cores .NET SDK 8.0.200 [Host] : .NET 6.0.27 (6.0.2724.6912), X64 RyuJIT AVX2 DefaultJob : .NET 6.0.27 (6.0.2724.6912), X64 RyuJIT AVX2

Method Mean Error StdDev Ratio
TestWithToUpperInvariant 212.5 ns 3.20 ns 2.84 ns 0.64
TestWithStaticDic 331.1 ns 2.17 ns 1.92 ns 1.00
david-brink-talogy commented 9 months ago

@Bykiev , I'm not entirely surprised.

I used the invariant switch you posted and benchmarked Abs, Log, In, and non-existent function "Foo". I don't think it matters what order the cases are in.

image

Bykiev commented 9 months ago

@david-brink-talogy, it's just a ns, the order doesn't matter with invariant switch, the most time occupies converting function name to upper. Let's merge this PR

david-brink-talogy commented 9 months ago

Let's merge this PR

Agreed, it looks good.