microsoft / FASTER

Fast persistent recoverable log and key-value store + cache, in C# and C++.
https://aka.ms/FASTER
MIT License
6.25k stars 562 forks source link

Log tail iteration does not end when log completed #872

Open simonhaines opened 10 months ago

simonhaines commented 10 months ago

The documentation seems to suggest (in the Log Completion section), that an iterator can check for log completion with the Ended property.

However when a log is completed, it seems that the iterator's WaitAsync method does not return during tail iteration in order for this property to be checked.

In this example code, the 'Scan complete' message is never sent to the console, and the scanning iterator waits forever after the log is completed.

using System.Diagnostics;
using FASTER.core;

using var settings = new FasterLogSettings("./Test", deleteDirOnDispose: true)
{
    AutoRefreshSafeTailAddress = true
};
using var log = new FasterLog(settings);

byte[] buffer = new byte[2048];
Random.Shared.NextBytes(buffer);

await Task.WhenAll(EnqueueThread(), ScanThread());
return;

async Task EnqueueThread()
{
    for (int count = 0; count < 5; ++count)
    {
        await log.EnqueueAsync(buffer);
        await Task.Delay(1000);
    }

    log.CompleteLog();
    Console.WriteLine("Enqueue complete");
}

async Task ScanThread()
{
    using var iterator = log.Scan(log.BeginAddress, long.MaxValue, scanUncommitted: true);
    while (true)
    {
        byte[] result;
        while (!iterator.GetNext(out result, out _, out _))
        {
            if (iterator.Ended)
            {
                Console.WriteLine("Scan complete");
                return;
            }

            await iterator.WaitAsync();
        }

        Console.WriteLine("Received buffer");
        Debug.Assert(result.SequenceEqual(buffer));
    }
}

Is the correct usage in this case that the iterator's WaitAsync method requires a CancellationToken passed in that is cancelled separately at the time the log is completed? The documentation and sample don't suggest this.

Another minor issue: the documentation also mentions the void RefreshUncommitted(bool spinWait = false) method, which is no longer part of the API, but instead seems to be configured in the FasterLogSettings (as in the code sample above).

badrishc commented 8 months ago

@tli2 - any idea about this?

tli2 commented 7 months ago

There appears to be some epoch shenanigans going on here, seemingly introduced by #871

Basically, the last commit had a double protect due to CompleteLog calling CommitInternal under protection. I looked a bit at other usages and CommitInternal is intended to be called without protection but this was not checked vigorously.

I can patch this quickly, but at some point we should probably discuss #871 as this is breaking something on the research branch too.