martinothamar / Mediator

A high performance implementation of Mediator pattern in .NET using source generators.
MIT License
2.16k stars 71 forks source link

Parallel notifications, correct ordering for switch statement cases #145

Closed martinothamar closed 7 months ago

martinothamar commented 8 months ago

Solves #115 and #138

Still unsure if netstandard2.0 is the way to go.. that might chage

TODO

Benchmarks

image

martinothamar commented 8 months ago

Note to future self: Putting more of the code into Mediator/Mediator.Abstractions is good, easier to read and iterate on. Source generation is mostly important to avoid reflection, mapping opaque messages to concrete messages and resolving the correct handlers. I.e. once the concrete message is unwrapped and the underlying handlers are instantiated and ready, the benefits of source generation/compilation analysis are fully exhausted I think

TimothyMakkison commented 8 months ago

Impressive changes, haven't done a major deep dive yet.

It might be possible to create a more memory efficient awaiter for known lengths. Task[] will allocate 24 + remaining length * 8 bytes, this means that in the common case that 2 notifications are asynchronous we'll have to allocate 40 bytes to store 2 references to Task values.

Instead Mediator could have methods for common array lengths where instead of having a nullable array we have task variables, one for each possibly asynchronous task. This way the array is inlined inside of the generated async state machine saving the 24 bytes for the array.

I think this will always use less memory than the array equivialent, up to 4 handlers. Thereafter unlikely setups may use slightly more memory.


async ValueTask Publish2(INotificationHandler[] handlers, TNotification notification, CancellationToken cancellationToken)
{
    Task task0 =  handler[0].Handle(notification, cancellationToken).AsTask();
    Task task1 =  handler[1].Handle(notification, cancellationToken).AsTask();

    List<Exception>? exceptions = null;

    try
    {
        await task0;
    }
    catch (Exception ex)
    {
        exceptions ??= new List<Exception>(1);
        exceptions.Add(ex);
    }
    // await each one etc
}
martinothamar commented 7 months ago

Took me a while to be able to benchmark this properly... But yes it makes a difference, atleast for the synchronous case

Top: before Bottom: after

image

Now I just have to cleanup a bit before I can push... made some significant config changes to make it easier to benchmark and test different things based on compile time config

TimothyMakkison commented 7 months ago

Thanks for sharing the benchmarks!

Looks like it saved a small amount of memory for the async case, although I can't tell why it saves 32 (35) bytes and not the expected 24. I'll look at the lowered code in the morning, benchmarkdotnet might be measuring memory usage incorrectly.

The synchronous case getting a speed boost is unexpected. Perhaps it's due to loop unrolling.

martinothamar commented 7 months ago

Yeah when it's manually unrolled like that I'm guessing it can elide bounds checks as well, I kind of doubt it is able to do so for enumerators even if they are plain structs (not boxed, no virtual calls)

EDIT: nvm that was not the custom enumerator, codegen looks identical. I guess it just saves the control flow of the loop? branches and jumps, even if they are predictable

Updated the PR desc with a full run through benchmarks as well

martinothamar commented 7 months ago

Great points as always! Reduced the allocations in the asynchronous case. I also made sync optimization for the foreach publisher, just by "branching inwards". After extending NotificationHandlers<TNotification> to implement IEnumerable<> some inlining decisions changed and 2x'ed latency for some cases, so with this latest commit I'm a bit more careful about extracting the non-common branches into separate functions, seems to do the trick :smile: image

Also included sample and docs. I'm pretty happy with this now

martinothamar commented 7 months ago

True it's pretty naive, I guess I'm betting there aren't many inheritance hierarchies out there, atleast there isn't in my code :P

We can improve it in subsequent PRs, just want to get a preview out now so that I can test it some more