microsoft / FASTER

Fast persistent recoverable log and key-value store + cache, in C# and C++.
https://aka.ms/FASTER
MIT License

Discussion: IDistributedCache - provision and questions #902

Open mgravell opened 5 months ago

mgravell commented 5 months ago

For context, I'm on the devdiv / aspnet team and am currently probably the main owner of the caching story.

IDistributedCache is (at least in .NET 8) the main caching backend abstraction, with local-memory, Redis, SQLite, SQL Server, Cosmos, etc. backends. I propose that FASTER should have one too, and I have prototyped it here. The good news is: it seems to work great, and the performance is 🔥

IDistributedCache is getting some love in .NET 9, so my eyes are on this area.

I have some thoughts from this, though:

  1. should MSFT provide an official implementation? if so, where should it live? I'm happy for you to take mine
  2. request for eyeballs; am I doing anything wrong/incomplete? in particular, things like "session length", and also: since I implement expiration, do I need to explicitly remove something, or is returning false from the functions layer enough to cause that to happen?
  3. the serialization layer feels "old school"; there may be a lot of room for improvement there, to improve perf and reduce allocations and memory-copies; happy to have that discussion at length, but increasingly serializers are supporting IBufferWriter<byte> and ReadOnlySequence<byte> implementations; I'd love to have a conversation about that
  4. also, stateless serializers rather than serializer factories
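To make points 3 and 4 concrete, a buffer-oriented, stateless serializer contract might look something like the following. This is a hypothetical sketch, not an existing FASTER API; the interface name and members are illustrative only:

```csharp
using System.Buffers;

// Hypothetical sketch of a stateless, buffer-oriented serializer contract;
// none of these names exist in FASTER today.
public interface IBufferSerializer<T>
{
    // Write directly into pooled buffers: no intermediate byte[] or Stream,
    // and no per-instance state, so one serializer can be shared freely.
    void Serialize(in T value, IBufferWriter<byte> writer);

    // Read from a potentially non-contiguous payload without linearizing first.
    T Deserialize(in ReadOnlySequence<byte> payload);
}
```

The key design point is that both directions work against pooled, possibly discontiguous memory, which is what avoids the allocations and copies a Stream-based layer forces.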

Let me know if there is interest in this from the FASTER side; if not, I can try to do my best from the "other side"

jodydonetti commented 5 months ago

FWIW the current impl is already working well!


nb: I just needed to make a minor change, for which I made a PR.

badrishc commented 5 months ago

Thanks @mgravell - this is a great start! With a starting point like this, it should be possible to look closely and optimize it much further. I already see opportunities for improvement of at least 2X if not more. Garnet skyrocketing is keeping us a bit busy right now, but we will check it out in the next week or two, and perhaps integrate into our NuGets (ofc, only if you think that makes sense). cc @TedHartMS

Tornhoof commented 5 months ago

I'll compare it tomorrow against my own impl. If I remember correctly, the preferred approach for session handling is to cache and reuse the sessions instead of creating a new one every time.

badrishc commented 5 months ago

Yes, that is one important optimization. The other is to use VarLenBlittableAllocator if the keys and values are blittable -- instead of paying the cost of object serializer at all.
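For reference, a minimal SpanByte-based store setup (a sketch modeled on the FASTER samples; the file path and sizes are illustrative) avoids the object serializer entirely:

```csharp
using FASTER.core;

// SpanByte keys/values keep data blittable: no object log, no serializer cost.
var logDevice = Devices.CreateLogDevice("faster-cache.log", deleteOnClose: true);
var store = new FasterKV<SpanByte, SpanByte>(
    size: 1L << 20, // hash table size, in 64-byte buckets
    logSettings: new LogSettings { LogDevice = logDevice });

// Sessions are not free to create; make one per logical thread and reuse it
// (e.g. via a pool) rather than newing one up per operation.
using var session = store.For(new SpanByteFunctions<Empty>())
                         .NewSession<SpanByteFunctions<Empty>>();
```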

Tornhoof commented 5 months ago

My own impl is not for a DistributedCache, but for a larger-than-memory local cache, using blittable keys and values (I write fixed-size keys and variable-length binary-serializable blobs only). I don't have sliding expiration, just a size check, which cuts off segments after a certain size. So I have only one log device and not two. My ClientSession type alias looks like:


using ClientSession = ClientSession<SpanByte, SpanByte, SpanByte, SpanByteAndMemory, Empty, SpanByteFunctions<Empty>>;

That being said, these are the two important optimizations, as badrish already wrote, cached sessions and blittable keys/values.

Cache sessions

I used a ConcurrentBag, I'm aware that ConcurrentQueue and/or ObjectPool might be better, depending on the congestion of the data structure, but none of my benchmarks ever showed any conclusive difference.

Other Tidbits

  • I recommend adding an option to the options type to do DeleteOnClose (arg for CreateLogDevice); a lot of usages probably do not need to persist the cache on restart of the application.
  • My LogSettings are more complex and have extra ReadCacheSettings; according to https://microsoft.github.io/FASTER/docs/fasterkv-tuning/ it's useful for larger-than-memory caches. I followed that page for the other log settings.
  • My impl implements IEnumerable<...>, mostly for testing (the IDistributedCache interface does not include any enumeration support afaik).

Notes about SpanByte

If you go the SpanByte route (which is the blittable route) and want to prevent memory copies, the complexity rises quite fast, as you need to do the async/await dance without async methods. This is required as you need to use fixed to create SpanBytes from your keys/values. This means you need to split into sync and async paths via ValueTask.IsCompletedSuccessfully.

In the end such a split method looks like this (for ReadAsync); it is easily 3 or 4 times as long and complex as doing it with the plain async/await dance, and a lot more error-prone. But Marc is probably more aware of async/await shenanigans than most of us, so if anyone would be safe doing that stuff, it is him.

This is only the ReadAsync implementation and related methods/types; I can't post the full implementation without asking legal (and not getting an answer in a timely fashion anyway).

```csharp
public ValueTask<TValue?> ReadAsync<TKey, TValue>(TKey key, CancellationToken cancellationToken = default)
    where TKey : unmanaged where TValue : struct, IByteWritable<TValue>
{
    var session = GetSession();
    return ReadAsyncInternal<TKey, TValue>(key, session, true, cancellationToken);
}

private unsafe ValueTask<TValue?> ReadAsyncInternal<TKey, TValue>(TKey key, ClientSession session, bool releaseSession, CancellationToken cancellationToken = default)
    where TKey : unmanaged where TValue : struct, IByteWritable<TValue>
{
    try
    {
        Span<byte> keySpan = MemoryMarshal.AsBytes(new Span<TKey>(ref key));
        fixed (byte* _ = keySpan) // the span only needs to be fixed until ReadAsync returns; for async completions it is copied internally
        {
            SpanByte keySpanByte = SpanByte.FromFixedSpan(keySpan);
            var task = session.ReadAsync(ref keySpanByte, token: cancellationToken);
            if (task.IsCompletedSuccessfully)
            {
                var (status, memory) = task.Result.Complete();
                var result = HandleReadStatus<TValue>(status, releaseSession, session, memory);
                return new ValueTask<TValue?>(result);
            }
            return CompleteReadAsync<TValue>(task, releaseSession, session);
        }
    }
    catch
    {
        session.Dispose();
        throw;
    }
}

private TValue? HandleReadStatus<TValue>(Status status, bool releaseSession, ClientSession session, SpanByteAndMemory memory)
    where TValue : struct, IByteWritable<TValue>
{
    TValue? result = default;
    if (status.Found) // if found, we deserialize it
    {
        result = Deserialize<TValue>(memory);
    }
    else if (!status.NotFound) // everything except not-found is an error
    {
        ThrowInvalidOperation(status);
    }
    // for single ops we release the session here; for multi ops we wait until they are all finished
    if (releaseSession)
    {
        ReleaseSession(session);
    }
    return result;
}

private async ValueTask<TValue?> CompleteReadAsync<TValue>(ValueTask<FasterKV<SpanByte, SpanByte>.ReadAsyncResult<SpanByte, SpanByteAndMemory, Empty>> valueTask, bool releaseSession, ClientSession session)
    where TValue : struct, IByteWritable<TValue>
{
    try
    {
        var readAsyncResult = await valueTask.ConfigureAwait(false);
        var (status, memory) = readAsyncResult.Complete();
        return HandleReadStatus<TValue>(status, releaseSession, session, memory);
    }
    catch
    {
        session.Dispose();
        throw;
    }
}

private static TValue Deserialize<TValue>(SpanByteAndMemory spanByteAndMemory)
    where TValue : struct, IByteWritable<TValue>
{
    IMemoryOwner<byte> memory;
    int length;
    if (spanByteAndMemory.IsSpanByte)
    {
        // ReSharper disable once PossiblyImpureMethodCallOnReadonlyVariable
        (memory, length) = spanByteAndMemory.SpanByte.ToMemoryOwner(MemoryPool<byte>.Shared);
    }
    else
    {
        memory = spanByteAndMemory.Memory;
        length = spanByteAndMemory.Length;
    }
    return TValue.Deserialize(memory, length);
}

public interface IByteWritable<TSelf> where TSelf : struct
{
    /// <summary>
    /// Deserialize the content from memory; the memory is now owned by this instance, dispose when finished
    /// </summary>
    public static abstract TSelf Deserialize(IMemoryOwner<byte> memory, int length);

    /// <summary>
    /// Make sure TSelf is a readonly struct; output is the written data. If the data does not fit into the existing output,
    /// buffer is populated and needs to be disposed by the caller
    /// </summary>
    public static abstract void Serialize(in TSelf self, ref Span<byte> output, out IMemoryOwner<byte>? buffer, out int written);
}
```

My tests indicated that in-memory operations completed synchronously and disk-based operations asynchronously, so your tests need to be run with enough data, otherwise you might not hit the async code path.
mgravell commented 5 months ago

Lots to digest when at PC! Happy to have ignited some discussion.

Delete on close: agreed, useful option, but probably not the default; the other key scenario here is "I want the cache to survive web-server restarts, but it doesn't need to be centralised"

Sessions: great tips - will try to look

Blittable: ah, makes sense; I'll see what I can do - I was naively+blindly copying from the cache sample :)

Optimized async fast path: I am very much familiar (I do more than a little async IO inner-loop stuff - sockets, protocol handlers, etc).

Blittable spans - will look, although I wonder if there's a Memory<T> trick we're missing there. I also need to contend with a small extra chunk of metadata for the header, so if ReadOnlySequence<byte> could be processed as blittable (which it is, semantically, albeit non-contiguous), I could prepend metadata very cheaply (no buffer copy for the main payload). At worst, I can do a lease+linearize, if FASTER will accept an array segment as a span. Note also that in .NET 9 I hope to add a secondary distributed cache backend API that takes ReadOnlySequence<byte> as the "set" payload type, to carry this "no right-sized single-use arrays" metaphor end-to-end, and IBufferWriter<byte> for the output payload mechanism.

Things that I think might be missing:

In serialization: it sounds like it isn't critical for this scenario (thanks blittable), but I do genuinely think that you could provide a much more direct (and efficient) additional serialization API for scenarios that do need it. I'd be very happy to offer input and suggestions there (I'm very deep in general .NET serialization and caching territory - I just don't know much about the internals of FASTER)



mgravell commented 5 months ago

log options: done

SpanByte - I have an unsuccessful branch here; I'm using SpanByte keys and values using UTF8 and a buffer (stack or leased, depending on size); write says it works, but read says "not found" - would love any clues if anyone has a moment

specifically, see the BasicUsage test; debug output shows (currently, broken state)

edit: it looks like the problem is that value is the right-size but all-zero in ConcurrentReader, where I'm checking the expiry

Tornhoof commented 5 months ago

@mgravell Debugging through your code, in CacheFunctions the payload in the expiry check is empty. I don't know anything about CacheFunctions sorry, I used Empty everywhere.

Ah, I missed your edit.

mgravell commented 5 months ago

I fixed it by switching to Memory<byte> instead of SpanByte; that's enough for me to make some progress; however... it wasn't the kind of progress I'd hoped for; we now get multiple allocations on the "get" path (I expect 10k here, for the final byte[] - the rest of the 36KB is... not ideal):

| Method          | KeyLength | PayloadLength | Mean        | Error       | StdDev      | Median      | Gen0   | Gen1   | Allocated |
|---------------- |---------- |-------------- |------------:|------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get      | 128       | 10240         |  2,786.6 ns |    55.11 ns |   130.98 ns |  2,763.4 ns | 2.1858 | 0.0992 |   36625 B |
| FASTER_Set      | 128       | 10240         |  2,132.4 ns |    36.65 ns |    34.28 ns |  2,136.4 ns | 0.5951 | 0.0153 |   10000 B |
| FASTER_GetAsync | 128       | 10240         |  2,860.2 ns |    57.15 ns |    98.58 ns |  2,877.0 ns | 2.1896 | 0.1984 |   36721 B |
| FASTER_SetAsync | 128       | 10240         |  2,240.5 ns |    43.76 ns |   119.06 ns |  2,187.1 ns | 0.5951 | 0.0191 |   10000 B |

I think to get this truly blazing with low allocs is going to take some input from the FASTER team - which is really the key goal here; see if we can get that momentum :)


update: got it down a bit using session reuse, but I have no clue what I'm doing with sessions, so this may not be valid; current state:

| Method          | KeyLength | PayloadLength | Mean        | Error       | StdDev      | Gen0   | Gen1   | Allocated |
|---------------- |---------- |-------------- |------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get      | 128       | 10240         |    974.2 ns |    16.52 ns |    15.45 ns | 1.5955 | 0.0496 |   26696 B |
| FASTER_Set      | 128       | 10240         |    569.3 ns |     5.22 ns |     4.63 ns | 0.0038 |      - |      72 B |
| FASTER_GetAsync | 128       | 10240         |  1,051.4 ns |    19.92 ns |    18.64 ns | 1.5984 | 0.0992 |   26792 B |
| FASTER_SetAsync | 128       | 10240         |    780.6 ns |     1.57 ns |     1.31 ns | 0.0038 |      - |      72 B |
| SQLite_Get      | 128       | 10240         | 12,137.0 ns |   200.33 ns |   187.39 ns | 0.6866 |      - |   11616 B |
| SQLite_Set      | 128       | 10240         | 76,377.2 ns | 1,482.33 ns | 1,522.25 ns |      - |      - |    1360 B |
| SQLite_GetAsync | 128       | 10240         | 22,139.6 ns |   297.40 ns |   278.19 ns | 0.7019 |      - |   11912 B |
| SQLite_SetAsync | 128       | 10240         | 75,067.2 ns | 1,035.19 ns |   864.43 ns |      - |      - |    1280 B |
| Rocks_Get       | 128       | 10240         |    123.8 ns |     0.79 ns |     0.70 ns | 0.0091 |      - |     152 B |
| Rocks_Set       | 128       | 10240         | 12,619.8 ns |    68.27 ns |    53.30 ns |      - |      - |     224 B |
| Rocks_GetAsync  | 128       | 10240         |    763.4 ns |    18.70 ns |    55.15 ns | 0.0257 |      - |     440 B |
| Rocks_SetAsync  | 128       | 10240         | 21,776.1 ns |   228.28 ns |   213.54 ns | 0.0305 |      - |     576 B |

(note: Rocks is cheating on the "get")

Tornhoof commented 5 months ago

but I have no clue what I'm doing with sessions

You didn't push your code, at least I couldn't find it, but if it looks similar to this, then you should be good; I missed that part in the details section of my earlier post.

private readonly ConcurrentBag<ClientSession> _sessions = new();

private ClientSession GetSession()
{
    if (_sessions.TryTake(out var result))
    {
        return result;
    }

    var session = _store.For(_simpleFunctions).NewSession<SpanByteFunctions<Empty>>();
    return session;
}

private void ReleaseSession(ClientSession session)
{
    _sessions.Add(session);
}

I dispose a session after exceptions, because I simply don't know what state it is in.

mgravell commented 5 months ago

I had basically the same - hadn't pushed. I think I've figured out the spanbyte life cycle too, locally - again, not pushed; hopefully tomorrow



mgravell commented 5 months ago

Here we go; definitely has legs!

Current benchmarks (edited immediately below in the same comment - this table is now out of date):

| Method                | KeyLength | PayloadLength | Mean        | Error       | StdDev      | Gen0   | Gen1   | Allocated |
|---------------------- |---------- |-------------- |------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get            | 128       | 10240         |    552.3 ns |    21.16 ns |    18.76 ns | 0.6123 |      - |   10264 B |
| FASTER_Set            | 128       | 10240         |    826.7 ns |    15.10 ns |    10.91 ns | 0.6123 |      - |   10264 B |
| FASTER_GetBuffer      | 128       | 10240         |    315.5 ns |     5.65 ns |     4.09 ns |      - |      - |         - |
| FASTER_SetBuffer      | 128       | 10240         |    480.6 ns |     7.08 ns |     4.68 ns |      - |      - |         - |
| FASTER_GetAsync       | 128       | 10240         |    625.5 ns |    16.34 ns |    14.48 ns | 0.6189 |      - |   10360 B |
| FASTER_GetAsyncBuffer | 128       | 10240         |    367.6 ns |     6.24 ns |     2.77 ns | 0.0014 |      - |      24 B |
| FASTER_SetAsync       | 128       | 10240         |    903.8 ns |    15.46 ns |    11.18 ns | 0.6123 |      - |   10264 B |
| FASTER_SetAsyncBuffer | 128       | 10240         |    585.0 ns |    10.59 ns |     3.78 ns |      - |      - |         - |
| SQLite_Get            | 128       | 10240         | 12,717.9 ns |   205.08 ns |    73.13 ns | 0.6866 |      - |   11616 B |
| SQLite_Set            | 128       | 10240         | 75,747.7 ns | 1,300.16 ns |   773.70 ns | 0.6104 |      - |   11576 B |
| SQLite_GetAsync       | 128       | 10240         | 21,364.4 ns |   403.96 ns |   179.36 ns | 0.7019 |      - |   11912 B |
| SQLite_SetAsync       | 128       | 10240         | 76,450.3 ns | 1,478.02 ns | 1,153.94 ns | 0.6104 |      - |   11496 B |
| Rocks_Get             | 128       | 10240         |    843.3 ns |    17.26 ns |    15.30 ns | 0.6218 |      - |   10416 B |
| Rocks_Set             | 128       | 10240         | 14,182.6 ns |   247.34 ns |   178.84 ns | 0.6104 |      - |   10416 B |
| Rocks_GetAsync        | 128       | 10240         |  1,876.0 ns |    88.41 ns |    82.69 ns | 0.7935 | 0.0267 |   10704 B |
| Rocks_SetAsync        | 128       | 10240         | 24,844.1 ns | 1,422.47 ns | 1,330.58 ns | 0.6409 |      - |   10768 B |

There are still outstanding things I could use help with from people who know FASTER internals.

But the numbers look nice, especially for the .NET 9 scenario (FASTER_GetAsyncBuffer). I can also possibly improve the "pinning" on the async path; right now I use a simple using (memory.Pin()) - however, we could do a fixed pin with a speculative "does this finish synchronously", and only take the non-stack pin if we need to transition to full async

/cc @jodydonetti @badrishc

update: now with the deferred pinning on the async get paths:

| Method                | KeyLength | PayloadLength | Mean        | Error       | StdDev      | Gen0   | Gen1   | Allocated |
|---------------------- |---------- |-------------- |------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get            | 128       | 10240         |    642.5 ns |    37.46 ns |    33.20 ns | 0.6123 |      - |   10264 B |
| FASTER_Set            | 128       | 10240         |  1,036.3 ns |    22.11 ns |    19.60 ns | 0.6123 |      - |   10264 B |
| FASTER_GetBuffer      | 128       | 10240         |    388.2 ns |    11.98 ns |    10.62 ns | 0.0019 |      - |      32 B |
| FASTER_SetBuffer      | 128       | 10240         |    601.8 ns |    16.66 ns |    15.59 ns |      - |      - |         - |
| FASTER_GetAsync       | 128       | 10240         |    684.4 ns |    41.90 ns |    34.99 ns | 0.6189 |      - |   10360 B |
| FASTER_GetAsyncBuffer | 128       | 10240         |    393.3 ns |     7.84 ns |     6.95 ns | 0.0033 |      - |      56 B |
| FASTER_SetAsync       | 128       | 10240         |  1,056.5 ns |    34.24 ns |    32.03 ns | 0.6123 |      - |   10264 B |
| FASTER_SetAsyncBuffer | 128       | 10240         |    663.0 ns |    16.03 ns |    14.21 ns |      - |      - |         - |
| SQLite_Get            | 128       | 10240         | 13,180.0 ns |   277.07 ns |   259.17 ns | 0.6866 |      - |   11616 B |
| SQLite_Set            | 128       | 10240         | 74,546.1 ns | 2,079.11 ns | 1,944.80 ns | 0.6104 |      - |   11576 B |
| SQLite_GetAsync       | 128       | 10240         | 27,643.9 ns |   429.91 ns |   190.88 ns | 0.7019 |      - |   11912 B |
| SQLite_SetAsync       | 128       | 10240         | 81,441.6 ns | 1,887.54 ns | 1,673.26 ns | 0.6104 |      - |   11496 B |
| Rocks_Get             | 128       | 10240         |    940.4 ns |    36.74 ns |    34.37 ns | 0.6218 |      - |   10416 B |
| Rocks_Set             | 128       | 10240         | 14,473.9 ns |   282.96 ns |   100.91 ns | 0.6104 |      - |   10416 B |
| Rocks_GetAsync        | 128       | 10240         |  1,887.8 ns |   108.86 ns |   101.82 ns | 0.7973 | 0.0267 |   10704 B |
| Rocks_SetAsync        | 128       | 10240         | 25,086.6 ns |   964.64 ns |   902.33 ns | 0.6409 |      - |   10768 B |
jodydonetti commented 5 months ago

This is really great, this evening I'll try to get the latest version from the repo and see if it works fine with FusionCache in the Simulator (I guess it will be flawless, like it was already before).

Will update, thanks!

mgravell commented 5 months ago

@jodydonetti I've also thrown it onto NuGet, mostly to stop squatters, but: it is up-to-date

Tornhoof commented 5 months ago

persistence - not sure anything is writing to disk currently; we should fix that

My dumb test approach was simple: write a lot of data, look for the file on disk, then truncate the log and wait for it to disappear.
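For actual durability beyond log flushes, FASTER needs an explicit checkpoint, since flushing the hybrid log alone does not persist the hash index. A minimal sketch using the FASTER checkpointing API (assuming the store was constructed with CheckpointSettings pointing at a checkpoint directory):

```csharp
using FASTER.core;

// Persist current state: checkpoint both the hybrid log and the index.
// Requires CheckpointSettings to have been configured on the store.
await store.TakeFullCheckpointAsync(CheckpointType.Snapshot);

// On restart, recover the latest checkpoint before serving requests.
store.Recover();
```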

Tornhoof commented 5 months ago

however, we could do a fixed pin with a speculative "does this finish synchronously", and only take the non-stack pin if we need to transition to full async

I ported my ValueTask.IsCompletedSuccessfully flow to your repo (only for GetAsync) and it's about 30-40ns faster than the original code, so less than 10%. That is, as far as I know, the most synchronous it will ever be for in-memory, so any transition code will probably be slower.

https://github.com/Tornhoof/FASTERCache/commit/136dbd160193503bdeb20f8819a7a09f2398f552

mgravell commented 5 months ago

@Tornhoof I got the increase to 20% with full pin defer and async transition ;p

jodydonetti commented 5 months ago

@jodydonetti I've also thrown it onto NuGet, mostly to stop squatters, but: it is up-to-date

Updated (v0.1.21), switched to using the new builder and tested again: everything seems to work well, even when mixing auto-recovery, soft/hard timeouts and other features!

PaulusParssinen commented 5 months ago
  1. the serialization layer feels "old school"; there may be a lot of room for improvement there, to improve perf and reduce allocations and memory-copies; happy to have that discussion at length, but increasingly serializers are supporting IBufferWriter<byte> and ReadOnlySequence<byte> implementations; I'd love to have a conversation about that
  2. also, stateless serializers rather than serializer factories

+1 for redesigning the serialization layer. I was about to do that in Garnet (and Tsavorite) but luckily noticed this issue before doing anything. Anything but streams, and I'd be very happy to advocate for copying the design over to Tsavorite and Garnet 😄

mgravell commented 5 months ago

Related to that "anything but streams" piece: https://github.com/dotnet/runtime/issues/100434 - i.e. "and here's how you fallback for legacy serializers"