zzzprojects / Eval-Expression.NET

C# Eval Expression | Evaluate, Compile, and Execute C# code and expression at runtime.
https://eval-expression.net/
Other
449 stars 86 forks source link

Caching Memory Leak #129

Closed stonstad closed 2 years ago

stonstad commented 2 years ago

With caching off memory is fixed at 200mb. With caching on I see unchecked memory growth to 20GB+.

My scenario is Eval.Execute(string expression, Dictionary<string, object> values);

To confirm I am not passing in a variety of expressions I create a Hash and add each expression to it. I can confirm 196 different expressions.

If the expression input is bound at 196 variations should memory remain fixed? Is the compilation process inspecting the second parameter which contains Dictionary<string, object> values?

Thank you!

stonstad commented 2 years ago

I can confirm 196 unique expressions but the MemoryCache entries size grows indefinitely (100,000+). There are no cache hits.

stonstad commented 2 years ago

The expressions I am using are all the same (196 total) -- and they have carriage returns, spaces, semi-colons, etc.

JonathanMagnan commented 2 years ago

Hello @stonstad ,

The cache key uses the type of each entry in the dictionary value.

One thing we miss on our side is doing an OrderBy on the type name. I will ask my developer to look and add this for the next release.

However, I'm not sure if that will explain your 20GB or not. I believe your issue might be something else but the missing OrderBy is definitely a problem.

Do you think you could create a runnable project with the issue? It doesn’t need to be your project, just a new solution with the minimum code to reproduce the issue. You can send it in private here: info@zzzprojects.com

Best Regards,

Jon

stonstad commented 2 years ago

Thank you for the response, Jon. I’ll work to get you a reproduction project. One question I have is how the cache key is created. Is the cache key specific to expression passed into the evaluate function, or does it include type information for secondary parameters? If type information is used, what happens if the type is dynamic?

edited

In one example, the passed in types are :

Eval.Execute(string, ConcurrentDictionary<string, object> values)
Eval.Execute(string, new { result = value }, concurrentDictionary<string, object>) 
Eval.Execute(string, new { result = { }}, concurrentDictionary<string, object>)

@JonathanMagnan Updated to include types used.

Thanks, Shaun

stonstad commented 2 years ago

In an attempt to address caching, I changed my use of anonymous type (e..g.) Eval.Execute(expression, new { Result = value }) I am trying a concrete class called ResultType …
to Eval.Execute(expression, new ResultType(value))

This throws an exception Oops! The following type ..ResultType is not yet supported for the UseSmartExecuteParmaterResolution option.

ResultType is

public struct ResultType { public object Result { get; set; } }

These are things I am trying to address the caching behavior. Do you know why a concrete type definition would fail with UseSmartExecuteParameterOption?

JonathanMagnan commented 2 years ago

One question I have is how the cache key is created.

The cache key is a combination of:

So all 3 expressions would give different keys because of the Parameter types part which is different for all of them

stonstad commented 2 years ago

UseSmartTypeResolution is enabled.

The expression is the same but the variables in the second parameter, the concurrent dictionary, obviously do change. Is this why caching is not working? How do we address this behavior?

JonathanMagnan commented 2 years ago

The expression is the same but the variables in the second parameter, the concurrent dictionary, obviously do change. Is this why caching is not working? How do we address this behavior?

If the type is the same, the same key should be used. The value doesn't matter to create the key, only the value type.

JonathanMagnan commented 2 years ago

Do you know why a concrete type definition would fail with UseSmartExecuteParameterOption?

Looking at the code, it seem to works with:

For other type, you will receive the error about that's not supported for the UseSmartExecuteParameterResolution option.

stonstad commented 2 years ago

OK, thanks Jon. I am good using an anonymous type. I was just trying different things in an attempt to get things to work. I am working to provide a reproduction project now.

Thanks, Shaun

stonstad commented 2 years ago

Hey Jon,

After inspecting the cache it becomes clear that Eval(string, ConcurrentDictionary<string, object>) creates a unique cache entry for each dictionary value contained within the second dictionary parameter. Since the dictionary may contain variable input each evaluation results in a cache miss:

The cache entries share the following parameters: Parameter1: The expression is: DocumentOwnerID != null Parameter2: The input is ConcurrentDictionary<string, object>

Cache Entry 1 Z.Expressions.EvalContext;DocumentOwnerID != null;System.Func2[[System.Collections.IDictionary, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]];System.Collections.Concurrent.ConcurrentDictionary2[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]];System.Object;System.Int32;System.Object;System.String;System.Dynamic.ExpandoObject;System.Decimal;System.Decimal;System.String;System.DateTime;System.String;System.String;System.String;System.String;System.String;System.String;System.String;System.String;System.Int16;System.Int16;System.String;System.Int16;System.Int16;System.String;System.String;System.String;System.String;System.Object;System.String;System.String;System.String;System.String;System.Dynamic.ExpandoObject;System.Int16;System.String;System.String;System.Int32;System.Int32

Cache Entry 2 Z.Expressions.EvalContext;DocumentOwnerID != null;System.Func2[[System.Collections.IDictionary, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]];System.Collections.Concurrent.ConcurrentDictionary2[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]];System.Object;System.Int32;System.Object;System.String;System.Object;System.String;System.String;System.String;System.String;System.String;System.String;System.Int32;System.Int32

Cache Entry 3 Z.Expressions.EvalContext;DocumentOwnerID != null;System.Func2[[System.Collections.IDictionary, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]];System.Collections.Concurrent.ConcurrentDictionary2[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]];System.Object;System.Int32;System.Object;System.String;System.Object;System.String;System.String;System.String;System.String;System.Dynamic.ExpandoObject;System.Int16;System.String;System.String;System.Int32;System.Int32

I’m concerned that if this cannot be overcome we won’t be able to use Eval-Expression. The runtime performance with caching off is slow for us (i.e. 5 evaluations per second with expression compilation). The compiled versions of each evaluation is about 100x faster. Could you please let us know what our options are or if you have any thoughts?

Thanks, Shaun

stonstad commented 2 years ago

Associated reproduction code:

removed. See below.

stonstad commented 2 years ago
using Microsoft.Extensions.Caching.Memory;
using System.Collections.Concurrent;
using System.Diagnostics;
using Z.Expressions;

namespace MyApp
{
    class Program
    {
        private static MemoryCache _MemoryCache = new MemoryCache(new MemoryCacheOptions());

        public static void Main(string[] args)
        {
           EvalManager.Cache = _MemoryCache;
            EvalManager.DefaultContext.IncludeMemberFromAllParameters = true;
            EvalManager.DefaultContext.IsCaseSensitive = false;
            EvalManager.DefaultContext.DynamicGetMemberMissingValueFactory = (obj, propertyOrFieldName) => string.Empty;
            EvalManager.DefaultContext.UseSmartExecuteParameterResolution = true;
            EvalManager.DefaultContext.UseCache = true;

            Test();

            Console.WriteLine($"Cache Size: {_MemoryCache.Count}");
        }

        public static void Test()
        {
            Random random = new Random(0);
            ConcurrentDictionary<string, object> values = new ConcurrentDictionary<string, object>();

            // 2500 evaluations using one unique invocation of Eval.Execute("val1", ConcurrentDictionary<string, object>).

            for (int testIteration = 0; testIteration < 5; testIteration++)
            {
                Stopwatch stopwatch = new Stopwatch();
                stopwatch.Start();

                for (int evalationNumber = 0; evalationNumber < 500; evalationNumber++)
                {
                    int dictionarySize = random.Next(1, 1000);
                    values.Clear();
                    for (int j = 1; j <= dictionarySize; j++)
                        values["val" + j.ToString()] = j;

                    string expression = $"val1";
                    Eval.Execute(expression, new { val = evalationNumber }, values);
                }

                stopwatch.Stop();
                Console.WriteLine($"{stopwatch.ElapsedMilliseconds}ms, Cache Size: {_MemoryCache.Count}");
            }
        }
    }
}
stonstad commented 2 years ago

Dictionaries have random key/value storage order. Hopefully this isn't by design in Eval-Expression.

stonstad commented 2 years ago

As an attempted work-around I now do the following --

  1. Construct a sorted dictionary with values in alphanumerical order.
  2. Remove dictionary values (based on key name) which are not contained in the expression.

This addresses a few use cases but it is a poor solution because it requires full traversal of the expression for each variable contained in the dictionary. Building a language parser and caching which variables are contained in an expression would be a next optimization step, but I'm pretty sure there is already a language parser built into your product, right?

Separately, ExpandoObjects present a unique challenge because fields are also a problem for cache misses. They may exist in an arbitrary order because the underlying data structure of ExpandoObject is an unsorted dictionary with non-deterministic sort order. The presence of certain fields and absence of others also results in a cache miss, even if those values are not used in the Eval expression.

Please let me know if my understanding of what is happening here, re: cache misses, is on the right track. Ideally, I'd like the cache entry to be limited by expression and data type of the top level second parameter. However, I suspect optimizations around compilation require ahead of time knowledge of parameter types.

Perhaps a next best approach would be to exclude variables in the cache key which are not present or used in the expression.

Thanks, Shaun

JonathanMagnan commented 2 years ago

Hello @stonstad ,

Cause

Your expression uses a value directly in the dictionary without accessing the dictionary (use members automatically included).

So the only way to retrieve if the expression has already been compiled and re-use the method is to add all dictionary types in the cache key. We do not know in advance which key from the dictionary is used and the only step we could have known this is after the compiler did his job to create the full expression tree (just before we call the final Compile method), and at this time, that's probably too late.

Possible solutions

There is some way that we could fix this which could reduce the chance a new cache is generated such as simply checking if the key is contained in the expression. Might not also work such as some keys like a might be contained in the text without being used but could already offer a memory improvement even if not perfect since way less cache entry would be required.

We could also add an option to never add to the cache the dictionary key/type but then you might get some side effects such as if sometimes "val1" is a string and sometimes "val1" is an int, normally a different compiled version of the method would be required since the type is different.

Is there one of this two solutions (or a new one that you can propose) that you would like to try?

stonstad commented 2 years ago

"simply checking if the key is contained in the expression"

Yes, this could work. It would need to address the issue of hierarchical ExpandoObject/Dictionary children key creation. See below. Some of our workflows specify dictionaries with 100+ variables but only one or two are used in the expression.

"We could also add an option to never add to the cache the dictionary key/type but then you might get some side effects such as if sometimes "val1" is a string and sometimes "val1" is an int, "

In our scenario data exists in a hierarchy. Dictionary index 0, key "patient' might be a composite ExpandoObject with different types.

Collection Index 0 (Key="Patient") Value FirstName John LastName Smith DateOfBirth 1/1/2000

And the expression is Patient.FirstName + " " + Patient.LastName.

We have some data structures that are deeply arbitrary and nested. One of the issues with key creation is that, given an object with 20 fields, any one of them may not be present on a subsequent call to Evaluate. This results in consistent per-invocation cache misses.

peterchaloux commented 2 years ago

Hi @JonathanMagnan thanks for jumping on this yesterday morning. Could we get an update based on Shaun's last comment? Do you have any sense of an ETA?

JonathanMagnan commented 2 years ago

Hello @peterchaloux ,

A new version should be released next Tuesday.

In this version, we will add a new option TmpSmartCacheKey that you will be able to enable and will change how the cache behaves to try to solve this problem.

We called this new option "Tmp" for temporary as I believe we will still work with your company in coming weeks to improve it more and we are still not sure if the final version would require this option or it will be done by default.

stonstad commented 2 years ago

I think it would be helpful if there is a method to return the generated cache key. With this information we can learn when cache hits vs misses occur, and provide better diagnostic information to you.

JonathanMagnan commented 2 years ago

Hello @stonstad ,

In the latest version released yesterday, it's now possible to improve the cache key with the only key that is used in the expression with the following option: TmpSmartCacheKey = true. The current option has the prefix Tmp as the behavior might still risk changing or even be the default behavior in a near future.

It should dramatically reduce the number of caches in your cache.

We will look at your latest request if that's still necessary.

Let me know if at least with that option, everything is now better.

Best Regards,

Jon

stonstad commented 2 years ago

Thank you Jon. This is terrific. We will test it right away and report back.

Thanks, Shaun

stonstad commented 2 years ago

Jon, could you give us some additional detail? Specifically, what defines the new cache key? Does it include the types of variables? If a variable type is dynamic or an expando object, does the type information include this hierarchy, or can it work without a type hierarchy in the key? Thanks.

JonathanMagnan commented 2 years ago

Hello @stonstad ,

When using the TmpSmartCacheKey, the key is generated with:

Expando

Only properties in which the text is part of the expression to evaluate will be included. This is a pure string.Contains logic as we cannot know before the evaluation which properties will be really used or not.

By example, the following code:

EvalManager.DefaultContext.TmpSmartCacheKey = true;

dynamic expando = new ExpandoObject();
expando.X = 1;
expando.Y = "a";

var r2 = Eval.Execute("X + 2", expand);

Will create the following key:

Z.Expressions.EvalContext;X + 2;System.Func`2[[System.Object, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.Object, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]];System.Dynamic.ExpandoObject;System.Int32;DefaultNumberType:None;AliasStaticMembers:27;AliasExtensionMethods:64;AliasTypes:378;UseTypeBeforeDynamic:False;DisableDynamicResolution:False;UseSmartTypeResolution:True;BindingFlags:113;IncludeMemberFromAllParameters:False;AllowAddSubtractOperatorToCollection:False;SafeMode:False;UseCaretForExponent:False;ForceObjectAsDynamic:False;isLinq:False;AliasGlobalExpressionVariables:0;AliasGlobalMethodVariables:0;AliasNames:0;AliasMembers:64

The only part interesting related to expand is this part:

System.Dynamic.ExpandoObject;System.Int32

Expando & Hierarchy

If an expand have some hierarchy such as:

EvalManager.DefaultContext.TmpSmartCacheKey = true;

dynamic expando = new ExpandoObject();
expando.X = 1;
expando.Y = "a";
expando.Z = new ExpandoObject();
expando.Z.W = 5;

var r2 = Eval.Execute("X + 2 + Z.W", expand);

It will contains the type of X, Z and Z.W (again only if found in the expression like in this case).

Dictionary

Only keys in which the text is part of the expression to evaluate will be included. This is a pure string.Contains logic as we cannot know before the evaluation what is a variable and what's not.

By example, the following code:

EvalManager.DefaultContext.TmpSmartCacheKey = true;

var dict = new Dictionary<string, object>();
dict.Add("X", 1);
dict.Add("Y", "a");  

var r2 = Eval.Execute("X + 2", dict);

Will create the following key:

Z.Expressions.EvalContext;X + 2;System.Func`2[[System.Collections.IDictionary, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.Object, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]];System.Collections.Generic.Dictionary`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.Object, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]];System.Int32;DefaultNumberType:None;AliasStaticMembers:27;AliasExtensionMethods:64;AliasTypes:378;UseTypeBeforeDynamic:False;DisableDynamicResolution:False;UseSmartTypeResolution:True;BindingFlags:113;IncludeMemberFromAllParameters:False;AllowAddSubtractOperatorToCollection:False;SafeMode:False;UseCaretForExponent:False;ForceObjectAsDynamic:False;isLinq:False;AliasGlobalExpressionVariables:0;AliasGlobalMethodVariables:0;AliasNames:0;AliasMembers:64

The only part interesting related to expand is this part:

System.Collections.Generic.Dictionary`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.Object, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]];System.Int32
JonathanMagnan commented 2 years ago

Hello @stonstad ,

The v4.0.83 has been released.

I improved by last answer to explain more how it works.

Let me know if you need more information.

stonstad commented 2 years ago

Testing. Thank you!

JonathanMagnan commented 2 years ago

Hello @stonstad ,

Just to let you know that starting from the v4.0.85, the option TmpSmartCacheKey has been renamed by UseShortCacheKey. No additional logic has been added.