pieterderycke / Jace

Jace.NET is a calculation engine for the .NET platform.
MIT License
436 stars 94 forks source link

Possible race condition in Jace.Util.MemoryCache<TKey, TValue>.EnsureCacheStorageAvailable #4

Closed rummelsworth closed 10 years ago

rummelsworth commented 10 years ago

I'm using commit 2c0033aac56dba919a97db8d2a2702b95c9baf1e.

Description

Probably less than 10% of the time that I run Jace.Benchmarks (in a Debug configuration), in the parallel part during either the interpreted or compiled subpart, there is an unhandled NullReferenceException from the orderby lambda within the keysToDelete query composition in the Jace.Util.MemoryCache<TKey, TValue>.EnsureCacheStorageAvailable method.

When this happens, the current p (a KeyValuePair<TKey, CacheItem> structure) has both Key and Value equal to null, like a default zeroed structure had been retrieved rather than any of the previously added KeyValuePair structures (which seem to always be added to the ConcurrentDictionary with non-null members). The call to LastAccessed seems to be the thrower.

To reproduce

To try to reliably see this error, you can make the following source modification: In Jace.Benchmarks.Program, starting around line 50, adjust the number of random generated functions to 10,000 and the number of executions per function to 1,000. After I do this, I can reliably reproduce the above-mentioned error in every run I launch.

Possible solution

On line 157 of MemoryCache.cs, replacing the query source dictionary with dictionary.ToArray() (which uses ConcurrentDictionary.ToArray, not Enumerable.ToArray) seems to totally resolve the problem by thread-safely taking a snapshot of dictionary before the query is composed and executed.

Possible explanation

Based on the relatively little I've reviewed at the .NET 4.5.1 reference source for Enumerable's query operators, the race condition seems like it might be due to (a) the query's source enumerable being dictionary itself rather than a snapshot and (b) one thread's dictionary.TryRemove (MemoryCache.cs:167) occurring between some enumerator object's MoveNext and Current calls on another thread (the enumerator being some one of the several that exist in the operator call-chain upon query execution) when executing ToList on the composed query (MemoryCache.cs:159).

I'm not (or not yet) super-familiar with LINQ or CLR threading, though, so this explanation might be wrong or at least incomplete.

rummelsworth commented 10 years ago

For what it's worth: I'm not sure whether Jace.NET has been tested much using Mono, but I tried the solution I offered above under Mono 3.2.8 and it did not resolve the seeming problem with the race condition. It still occurs under Mono 3.2.8. So perhaps the proposed solution above is invalid? (or it only works under .NET for some reason?)

pieterderycke commented 10 years ago

I will further investigate it. If I would implement fixes, are you willing to test them on mono for me?

pieterderycke commented 10 years ago

@wrummler could you test the fix on Mono?

rummelsworth commented 10 years ago

The good news is that I no longer can reproduce this error with the latest dev commit (7318fc6581dd86fdb90ce01060e40d2ac34eaf37) on Lubuntu 14.04 with Mono 3.2.8. The bad news is that I'm not totally sure my tests are "okay" because I'm still getting the assertion error in #8 (for which I still need to make a ticket at the Mono project... soon).