Closed by avonwyss 1 year ago
Not sure, we have never seen this issue on our side. If you can provide a standalone repro, we will be able to diagnose it further. Thanks.
I think you should also upgrade to 2.1.0, as it contains some important fixes.
Thank you for the reply. As I noted, the problem occurred only under load, i.e. with a high degree of concurrent reads and writes while the scan was started. We also noticed memory runaway (with an OOM exception after more than 60GB were used) and really poor performance.
Upon further inspection, it seemed that TakeHybridLogCheckpointAsync() was performing poorly when invoked under high load, and that the scan may have been failing if TakeHybridLogCheckpointAsync() was still running in the background. At least the issue seems to be gone now that we only take the checkpoint when SystemState.Phase == Phase.REST, i.e. we avoid checkpointing under load.
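For anyone wanting to try the same workaround, here is a minimal sketch of the gating idea, not our production code. The interface ICheckpointableStore, its members IsAtRest and TakeCheckpointAsync, and the helper CheckpointWhenAtRestAsync are hypothetical names made up for illustration; in a real application IsAtRest would wrap the SystemState.Phase == Phase.REST check and TakeCheckpointAsync would wrap TakeHybridLogCheckpointAsync().

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical abstraction over the store; this is NOT a FASTER type.
public interface ICheckpointableStore
{
    // Assumed to wrap the "SystemState.Phase == Phase.REST" check from the text.
    bool IsAtRest { get; }

    // Assumed to wrap the real checkpoint call; returns true on success.
    Task<bool> TakeCheckpointAsync(CancellationToken token);
}

public static class CheckpointGate
{
    // Polls the store up to maxAttempts times and takes a checkpoint only
    // once it reports being at rest. Returns false (and takes no checkpoint)
    // if the store never came to rest within the allowed attempts.
    public static async Task<bool> CheckpointWhenAtRestAsync(
        ICheckpointableStore store,
        int maxAttempts,
        TimeSpan pollInterval,
        CancellationToken token = default)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            if (store.IsAtRest)
            {
                return await store.TakeCheckpointAsync(token).ConfigureAwait(false);
            }

            // Still busy: back off before checking again.
            await Task.Delay(pollInterval, token).ConfigureAwait(false);
        }

        return false; // skip this round rather than checkpoint while busy
    }
}
```

Note that the check and the checkpoint call are not atomic, so the state can still change between the two; this only makes overlapping a checkpoint with other activity less likely, it does not rule it out.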
@badrishc We have now upgraded to 2.2.0 but still face issues related to scanning and snapshotting.
For one, when the upsert is done as per the code above, memory keeps growing, since something seems to pin memory that is never released, so the GC cannot reclaim it:
All of these are sparse (only zeros) byte[] arrays of 32MB in size.
However, when I remove the upsert call and do the snapshot, odd things start happening. At some point in time I get the following exception when accessing the data:
System.NullReferenceException: Object reference not set to an instance of an object.
at FASTER.core.ClientSession`6.InternalFasterSession.TryLockEphemeralExclusive(RecordInfo& recordInfo)
at FASTER.core.FasterKV`2.InternalUpsert[Input,Output,Context,FasterSession](Key& key, Input& input, Value& value, Output& output, Context& userContext, PendingContext`3& pendingContext, FasterSession fasterSession, FasterExecutionContext`3 sessionCtx, Int64 lsn)
at FASTER.core.FasterKV`2.UpsertAsync[Input,Output,Context](IFasterSession`5 fasterSession, FasterExecutionContext`3 currentCtx, PendingContext`3& pcontext, Key& key, Input& input, Value& value, Context userContext, Int64 serialNo, CancellationToken token)
at FASTER.core.ClientSession`6.UpsertAsync(Key& key, Input& input, Value& desiredValue, Context userContext, Int64 serialNo, CancellationToken token)
at FASTER.core.ClientSession`6.UpsertAsync(Key& key, Value& desiredValue, Context userContext, Int64 serialNo, CancellationToken token)
at Seram.Web.Cache.FasterCache`2.<UpsertAsync>d__25.MoveNext() in C:\Hg\Seram.Web\Seram.Web.Shared\Cache\FasterCache.cs:line 133
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()
at Seram.Area.Indicators.Values.ValueContext.<>c__DisplayClass96_0.<AddValueToCache>g__UpsertContinuation|0() in C:\Hg\Seram.Web\Seram.Area.Indicators\Values\ValueContext.cs:line 437
From there on, the FASTER log seems to be corrupted. Even just querying the count causes an exception:
System.IndexOutOfRangeException: Index was outside the bounds of the array.
at FASTER.core.FasterKV`2.GetEntryCount() in D:\a\1\s\cs\src\core\Index\FASTER\FASTER.cs:line 866
...and the snapshotted data cannot be loaded from disk (also crashes FASTER).
...and we still get AVs:
System.AccessViolationException
at FASTER.core.SpanByteVarLenStruct.GetLength(FASTER.core.SpanByte ByRef)
at FASTER.core.VariableLengthBlittableAllocator`2[[FASTER.core.SpanByte, FASTER.core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=eb19722ac09e9af2],[FASTER.core.SpanByte, FASTER.core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=eb19722ac09e9af2]].GetRecordSize(Int64)
at FASTER.core.VariableLengthBlittableScanIterator`2[[FASTER.core.SpanByte, FASTER.core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=eb19722ac09e9af2],[FASTER.core.SpanByte, FASTER.core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=eb19722ac09e9af2]].GetNext(FASTER.core.RecordInfo ByRef)
at Seram.Web.Cache.FasterCache`2[[Seram.Area.Indicators.Values.ValueCacheKey, Seram.Area.Indicators, Version=2.2804.8409.20143, Culture=neutral, PublicKeyToken=null],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].Compact(Seram.Web.Cache.KeyPredicate`1<Seram.Area.Indicators.Values.ValueCacheKey>, Boolean)
at Seram.Area.Indicators.CacheMaintenance+<CacheCleanupInternal>d__2.MoveNext()
Using FASTER 2.0.22 (we cannot update to 2.1.0 right now due to NuGet dependencies).
We have a custom compaction method which scans the log as follows:
This is running concurrently with other operations on the log in an ASP.NET MVC application.
On the server under load we get sporadic (but pretty frequent) AccessViolationException in the iterator's GetNext:
Are we somehow causing this by inadequate memory-related code or is this a problem of FASTER?