Closed rummelsworth closed 10 years ago
For what it's worth: I'm not sure whether Jace.NET has been tested much using Mono, but I tried the solution I offered above under Mono 3.2.8 and it did not resolve the seeming problem with the race condition. It still occurs under Mono 3.2.8. So perhaps the proposed solution above is invalid? (or it only works under .NET for some reason?)
I will further investigate it. If I would implement fixes, are you willing to test them on mono for me?
@wrummler could you test the fix on Mono?
The good news is that I no longer can reproduce this error with the latest dev commit (7318fc6581dd86fdb90ce01060e40d2ac34eaf37) on Lubuntu 14.04 with Mono 3.2.8. The bad news is that I'm not totally sure my tests are "okay" because I'm still getting the assertion error in #8 (for which I still need to make a ticket at the Mono project... soon).
I'm using commit 2c0033aac56dba919a97db8d2a2702b95c9baf1e.
Description
Probably less than 10% of the time that I run Jace.Benchmarks (in a Debug configuration), in the parallel part during either the interpreted or compiled subpart, there is an unhandled
NullReferenceException
from theorderby
lambda within thekeysToDelete
query composition in theJace.Util.MemoryCache<TKey, TValue>.EnsureCacheStorageAvailable
method.When this happens, the current
p
(aKeyValuePair<TKey, CacheItem>
structure) has bothKey
andValue
equal tonull
, like a default zeroed structure had been retrieved rather than any of the previously addedKeyValuePair
structures (which seem to always be added to theConcurrentDictionary
with non-null members). The call toLastAccessed
seems to be the thrower.To reproduce
To try to reliably see this error, you can make the following source modification: In Jace.Benchmarks.Program, starting around line 50, adjust the number of random generated functions to 10,000 and the number of executions per function to 1,000. After I do this, I can reliably reproduce the above-mentioned error in every run I launch.
Possible solution
On line 157 of MemoryCache.cs, replacing the query source
dictionary
withdictionary.ToArray()
(which usesConcurrentDictionary.ToArray
, notEnumerable.ToArray
) seems to totally resolve the problem by thread-safely taking a snapshot ofdictionary
before the query is composed and executed.Possible explanation
Based on the relatively little I've reviewed at the .NET 4.5.1 reference source for Enumerable's query operators, the race condition seems like it might be due to (a) the query's source enumerable being
dictionary
itself rather than a snapshot and (b) one thread'sdictionary.TryRemove
(MemoryCache.cs:167) occurring between some enumerator object'sMoveNext
andCurrent
calls on another thread (the enumerator being some one of the several that exist in the operator call-chain upon query execution) when executingToList
on the composed query (MemoryCache.cs:159).I'm not (or not yet) super-familiar with LINQ or CLR threading, though, so this explanation might be wrong or at least incomplete.