While investigating failures of Netherite in customer code (see here), I noticed a stack trace in which FASTER threw `OutOfMemoryException`s during shutdown. This is surprising because at that point all outstanding memory operations were simply being cancelled, so I was not expecting any OOMs to be thrown.
```
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.IO.BinaryReader.ReadBytes(Int32 count)
at DurableTask.Netherite.Faster.FasterKV.Value.Serializer.Deserialize(Value& obj) in //src/DurableTask.Netherite/StorageLayer/Faster/FasterKV.cs:line 1594
at FASTER.core.GenericAllocator`2.Deserialize(Byte* raw, Int64 ptr, Int64 untilptr, Record`2[] src, Stream stream)
at FASTER.core.GenericAllocator`2.AsyncReadPageWithObjectsCallback[TContext](UInt32 errorCode, UInt32 numBytes, Object context)
at DurableTask.Netherite.Faster.AzureStorageDevice.CancelAllRequests() in //src/DurableTask.Netherite/StorageLayer/Faster/AzureBlobs/AzureStorageDevice.cs:line 246
at System.Threading.CancellationToken.<>c.b__12_0(Object obj)
at System.Threading.CancellationTokenSource.Invoke(Delegate d, Object state, CancellationTokenSource source)
at System.Threading.CancellationTokenSource.CallbackNode.<>c.b__9_0(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.CancellationTokenSource.CallbackNode.ExecuteCallback()
at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException)
```
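Taking a closer look at `AsyncReadPageWithObjectsCallback`, I can see that `errorCode` is essentially ignored other than being logged. Why is it OK for this code to read and deserialize the results when this callback is a cancellation, i.e. when the read was never completed? If the read never completed, the buffers presumably hold stale or uninitialized bytes, and deserializing them could produce a bogus length for `BinaryReader.ReadBytes`, which would be consistent with the OOM in the stack trace above.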
```csharp
private void AsyncReadPageWithObjectsCallback<TContext>(uint errorCode, uint numBytes, object context)
{
    if (errorCode != 0)
    {
        logger?.LogError($"AsyncReadPageWithObjectsCallback error: {errorCode}");
    }
    PageAsyncReadResult<TContext> result = (PageAsyncReadResult<TContext>)context;
    Record<Key, Value>[] src;

    // We are reading into a frame
    if (result.frame != null)
    {
        var frame = (GenericFrame<Key, Value>)result.frame;
        src = frame.GetPage(result.page % frame.frameSize);
    }
    else
        src = values[result.page % BufferSize];

    // Deserialize all objects until untilptr
    if (result.resumePtr < result.untilPtr)
    {
        MemoryStream ms = new(result.freeBuffer2.buffer);
        ms.Seek(result.freeBuffer2.offset, SeekOrigin.Begin);
        Deserialize(result.freeBuffer1.GetValidPointer(), result.resumePtr, result.untilPtr, src, ms);
        ms.Dispose();
        result.freeBuffer2.Return();
        result.freeBuffer2 = null;
        result.resumePtr = result.untilPtr;
    }

    // If we have processed entire page, return
    if (result.untilPtr >= result.maxPtr)
    {
        result.Free();
        // Call the "real" page read callback
        result.callback(errorCode, numBytes, context);
        return;
    }
    // ... (remainder of the method omitted)
}
```
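To make the question concrete, here is a minimal, self-contained sketch of the behavior I would have expected. It is a simplified model, not FASTER's code: `OnPageRead`, `Encode`, and the int32 length-prefixed record format are hypothetical stand-ins, and `onComplete` plays the role of `result.callback`. The only point is that when `errorCode` is nonzero the buffer is never handed to the deserializer; the error is forwarded to the outer callback instead. In the real `GenericAllocator`, the page buffers would presumably still need to be released on that path too.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: encodes records as [int32 length][bytes] pairs,
// a stand-in for a serialized object-log page.
byte[] Encode(params byte[][] records)
{
    var ms = new MemoryStream();
    using (var writer = new BinaryWriter(ms))
    {
        foreach (var record in records)
        {
            writer.Write(record.Length); // int32 length prefix
            writer.Write(record);        // record payload
        }
    }
    // MemoryStream.ToArray still works after the writer has closed the stream.
    return ms.ToArray();
}

// Hypothetical model of a page-read completion callback (not FASTER's API).
// errorCode != 0 means the read failed or was cancelled; onComplete stands in
// for the "real" page read callback.
void OnPageRead(uint errorCode, byte[] buffer, Action<uint, List<byte[]>> onComplete)
{
    if (errorCode != 0)
    {
        // The read never completed, so the buffer may hold stale or uninitialized
        // bytes. Skip deserialization entirely and just propagate the error.
        onComplete(errorCode, null);
        return;
    }

    var records = new List<byte[]>();
    using var reader = new BinaryReader(new MemoryStream(buffer));
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        int length = reader.ReadInt32();       // length prefix taken from the buffer
        records.Add(reader.ReadBytes(length)); // ReadBytes sizes its array from length
    }
    onComplete(0, records);
}

// Usage: a completed read is decoded normally...
OnPageRead(0, Encode(new byte[] { 1, 2, 3 }), (err, recs) =>
    Console.WriteLine($"completed: error={err}, records={recs.Count}"));

// ...while a cancelled read over garbage bytes is never deserialized.
OnPageRead(1, new byte[] { 0x7F, 0x7F, 0x7F, 0x7F }, (err, recs) =>
    Console.WriteLine($"cancelled: error={err}, records={(recs == null ? "none" : recs.Count.ToString())}"));
```

Without the guard, the second call would read the length prefix 0x7F7F7F7F (about 2 GB) and pass it to `ReadBytes`, which is the kind of allocation that could plausibly end in an `OutOfMemoryException`.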