microsoft / FASTER

Fast persistent recoverable log and key-value store + cache, in C# and C++.
https://aka.ms/FASTER
MIT License
6.32k stars 567 forks source link

Issues around IScanIteratorFunctions #829

Closed Tornhoof closed 1 year ago

Tornhoof commented 1 year ago

PR #817 added the functionality of PushIterators. I apologize in advance for the lengthy text, but I think I ran into a blocker, at least from my understanding, which prevents me from using the current version of FASTER.

I tried to migrate my Wrapper around Faster Cache, which used the pull iterator to iterate through the key/value pairs. The wrapper type has a custom enumerator struct which basically looks like the example below, this is a typical .NET style Enumerator implementation, nothing special. Now with the current build of Faster, the exception Cannot run pull Iterate() in the mutable region when locking is enabled; is thrown. So far so good, let's convert the code at the end to to a Push Iterator. Let's look at the example in the basics doc, https://github.com/microsoft/FASTER/blob/main/docs/_docs/20-fasterkv-basics.md:

struct MyScanFunctions : IScanIteratorFunctions<MyKey, MyValue>
{
    // ... Implementation ...
}

MyScanFunctions myScanFunctions = new();
using var iter = session.Iterate(ref myScanFunctions);

The above example does not work, the Iterate method, taking the ref to the IteratorFunctions returns bool: https://github.com/microsoft/FASTER/blob/6a4c5a3bdec9a30e93dec2dd4a1fc1a874c8776f/cs/src/core/Index/FASTER/FASTERIterator.cs#L35-L37

Now, the enumeration method in the iterator is this code: https://github.com/microsoft/FASTER/blob/6a4c5a3bdec9a30e93dec2dd4a1fc1a874c8776f/cs/src/core/Index/FASTER/FASTERIterator.cs#L35-L53

If you look closely at the code, this is basically more or less a .NET style Enumerator, but since the for loop is already integrated into that piece of code above, it is not possible to wrap the Iterate method in anything like a typical .NET enumerator as seen below, which is in your logic of pull/push iterators a pull iterator.

How to solve it? We need access to the internal type FasterKVIterator<Key, Value, Input, Output, Context, Functions>, e.g. via another Iterate method IterateWithScanFunction, with a signature similar to this:

 public IFasterScanIterator<Key, Value> IterateWithScanFunction<Input, Output, Context, Functions>(Functions functions,  ref TScanFunctions scanFunctions, long untilAddress = -1)
            where Functions : IFunctions<Key, Value, Input, Output, Context>
           where TScanFunctions : IScanIteratorFunctions<Key, Value>

which then returns the Iterator, with isPull: false, then we can basically just copy the iteration loop and wrap it into a .NET style Enumerator. From a developer perspective this is far from ideal as internal logic then needs to be copied around into the dev's implementation, but developers using FASTER should be already aware that breaking changes are common enough.

Without such a method, conversion to a .NET style Enumerator requires either some Queueing mechanism (e.g. Channels) or blocking via a signalling mechanism, e.g. a Semaphore to allow the "current" value to be read from the outer enumerator, the first kinda defeats the idea of having iterators, because it just will push all the values into a temp list and then the outer enumerator consumes those values. The second one, via blocking is equally bad.

The examples for the push iterator, https://github.com/microsoft/FASTER/blob/main/cs/samples/StoreVarLenTypes/SpanByteSample.cs and https://github.com/microsoft/FASTER/blob/main/cs/samples/ReadAddress/VersionedReadApp.cs aren't really doing anything useful with iterated values, they basically just iterate the data and compare the count, they are more tests than an example on how to use Iterators.

Enumerator Example for the old pull iterator:

// please ignore DisposableKeyValuePair and IMemoryCacheKeyValue, they are just wrappes for the types to the stored in the cache.
public struct Enumerator : IEnumerator<DisposableKeyValuePair>, IEnumerator<IMemoryCacheKeyValue>
{
    private readonly FasterCache m_Cache;
    private readonly ClientSession m_Session;
    private IFasterScanIterator<SpanByte, SpanByte> m_Iterator;

    public Enumerator(FasterCache cache)
    {
        m_Cache = cache;
        m_Session = cache.GetSession();
        m_Iterator = m_Session.Iterate();
    }

    public bool MoveNext()
    {
        return m_Iterator.GetNext(out _);
    }

    public void Reset()
    {
        m_Iterator.Dispose();
        m_Iterator = m_Session.Iterate();
    }

    IMemoryCacheKeyValue IEnumerator<IMemoryCacheKeyValue>.Current => Current;

    public DisposableKeyValuePair Current
    {
        get
        {
            ref var key = ref m_Iterator.GetKey();
            ref var value = ref m_Iterator.GetValue();
            return new DisposableKeyValuePair(ref key, ref value);
        }
    }

    object IEnumerator.Current => Current;

    public void Dispose()
    {
        m_Iterator.Dispose();
        m_Cache.ReleaseSession(m_Session);
    }
}
TedHartMS commented 1 year ago

Thanks for the detailed explanation. There is a pending PR that removes this restriction on pull iterators.