nats-io / nats.net

Full Async C# / .NET client for NATS
https://nats-io.github.io/nats.net
Apache License 2.0

JetStream publish performance #450

Closed. safayatborhan closed this issue 4 months ago.

safayatborhan commented 6 months ago

Observed behavior

I have a dotnet 6 application running on nats.net. If I migrate to nats.net.v2, performance degrades drastically. Here is the code I am running with the nats.net.v2 library:

```csharp
using System.Diagnostics;
using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

var natsOptions = NatsOpts.Default with { Url = "localhost" };
await using var nats = new NatsConnection(natsOptions);
var js = new NatsJSContext(nats);

var config = new StreamConfig(name: "EVENTS", subjects: new[] { "events.>" });
config.Storage = StreamConfigStorage.File;
var stream = await js.CreateStreamAsync(config);

// 1000 concurrent tasks, each publishing 12 messages per iteration and
// awaiting every ack before sending the next message.
var tasks = new List<Task>();
for (int i = 0; i < 1000; i++)
{
    var task = Task.Run(async () =>
    {
        while (true)
        {
            var sw = Stopwatch.StartNew();
            for (var j = 0; j < 2; j++)
            {
                await js.PublishAsync<object>(subject: "events.page_loaded", data: null);
                await js.PublishAsync<object>(subject: "events.mouse_clicked", data: null);
                await js.PublishAsync<object>(subject: "events.mouse_clicked", data: null);
                await js.PublishAsync<object>(subject: "events.page_loaded", data: null);
                await js.PublishAsync<object>(subject: "events.mouse_clicked", data: null);
                await js.PublishAsync<object>(subject: "events.input_focused", data: null);
            }
            Console.WriteLine($"Total time taken: {sw.Elapsed.TotalSeconds}");
        }
    });
    tasks.Add(task);
}

await Task.WhenAll(tasks);
```

Sample output: (screenshot omitted)

Here is the same code running with the nats.net (v1) library:

```csharp
using System.Diagnostics;
using NATS.Client;
using NATS.Client.JetStream;

Options opts = ConnectionFactory.GetDefaultOptions("localhost");
ConnectionFactory connectionFactory = new ConnectionFactory();
var conn = connectionFactory.CreateConnection(opts);
IJetStream jetStream = conn.CreateJetStreamContext();
IJetStreamManagement jetStreamManagement = conn.CreateJetStreamManagementContext();

jetStreamManagement.AddStream(StreamConfiguration.Builder()
    .WithName("EVENTS")
    .WithStorageType(StorageType.File)
    .WithSubjects("events.>")
    .Build());

// Same shape as the v2 test: 1000 tasks, 12 synchronous publishes per iteration.
var tasks = new List<Task>();
for (int i = 0; i < 1000; i++)
{
    var task = Task.Run(() =>
    {
        while (true)
        {
            var sw = Stopwatch.StartNew();
            for (var j = 0; j < 2; j++)
            {
                jetStream.Publish(subject: "events.page_loaded", data: null);
                jetStream.Publish(subject: "events.mouse_clicked", data: null);
                jetStream.Publish(subject: "events.mouse_clicked", data: null);
                jetStream.Publish(subject: "events.page_loaded", data: null);
                jetStream.Publish(subject: "events.mouse_clicked", data: null);
                jetStream.Publish(subject: "events.input_focused", data: null);
            }

            Console.WriteLine($"Total time taken: {sw.Elapsed.TotalSeconds}");
        }
    });
    tasks.Add(task);
}

await Task.WhenAll(tasks);
```

Sample output: (screenshot omitted)

Expected behavior

The latest library should be at least as fast as the previous one.

Server and client version

Nats.net 2.1.2

Host environment

No response

Steps to reproduce

Repo link (https://github.com/safayatborhan/Memory.Test), if you want to reproduce.

mtmk commented 6 months ago

Not sure if this is a good test; v1 doesn't actually seem to be publishing as fast:

```
nats stream ls
╭───────────────────────────────────────────────────────────────────────────────────╮
│                                      Streams                                      │
├──────────┬─────────────┬─────────────────────┬───────────┬─────────┬──────────────┤
│ Name     │ Description │ Created             │ Messages  │ Size    │ Last Message │
├──────────┼─────────────┼─────────────────────┼───────────┼─────────┼──────────────┤
│ EVENTSv1 │             │ 2024-03-22 12:13:22 │ 928,673   │ 45 MiB  │ 0s           │
│ EVENTSv2 │             │ 2024-03-22 12:13:22 │ 2,250,102 │ 108 MiB │ 0s           │
╰──────────┴─────────────┴─────────────────────┴───────────┴─────────┴──────────────╯
```
mtmk commented 6 months ago

> I have a dotnet 6 application running on nats.net. If I migrate to nats.net.v2, performance degrades drastically. [...]

@safayatborhan, where are you seeing the performance in your real application? Could you elaborate on your use case a little more?

In your test application, v1 Publish() looks faster, but the overall throughput doesn't show that, so I'm a little confused about that tbh.

(Thank you for the example repo btw 💯 a lot easier for me to validate the issues)

safayatborhan commented 6 months ago

> @safayatborhan, where are you seeing the performance in your real application? Could you elaborate on your use case a little more?
>
> In your test application, v1 Publish() looks faster, but the overall throughput doesn't show that, so I'm a little confused about that tbh.

Hi @mtmk, I can see that overall throughput is better for V2. In the example, we are awaiting each message being processed; that's why the measured time per iteration is higher for V2. Sorry for the confusion.

In the actual scenario, we are trying to publish around 10,000 messages/sec, each around 1 KB in size. With the legacy NATS streaming client (STAN), publishing a message took around 1.77E-05 s. Since STAN is end of life, we migrated to JetStream, and here is the per-message time for the v1 and v2 JetStream clients:

V1: approximately 0.0000208 s
V2: approximately 0.0020981 s
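For illustration, one way to push toward that rate with the v2 client is to issue a batch of publishes first and then await the acks together, rather than awaiting each publish before sending the next. This is only a rough sketch, not code from the repro; the batch size, payload, and subject are placeholder choices:

```csharp
// Rough sketch only: queue a batch of publishes, then await the acks together.
// Batch size, payload, and subject are arbitrary placeholders.
using System.Diagnostics;
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection(NatsOpts.Default with { Url = "localhost" });
var js = new NatsJSContext(nats);

var payload = new byte[1024]; // roughly the 1 KB message size described above
const int batchSize = 100;

while (true)
{
    var sw = Stopwatch.StartNew();

    // Queue the whole batch; each call starts the publish without waiting for its ack.
    var acks = new List<Task>(batchSize);
    for (var n = 0; n < batchSize; n++)
    {
        acks.Add(js.PublishAsync("events.page_loaded", payload).AsTask());
    }

    // Await all acks once the batch has been written.
    await Task.WhenAll(acks);

    Console.WriteLine($"{batchSize} messages acked in {sw.Elapsed.TotalSeconds:F4}s");
}
```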

darkwatchuk commented 6 months ago

I see the same magnitude of difference too. Also, starting the v1 app seems to be instant, whereas the v2 app takes several seconds to start.

Edit: I believe I was testing this in the VS IDE; whilst V1 is faster there, V2 is faster outside the IDE.

safayatborhan commented 6 months ago

Hi @mtmk ,

This time I am testing by awaiting each message to be processed. The latest code has been pushed to the repo. Look at the difference now:

(screenshot omitted)

After running for around 5-7 minutes: (screenshot omitted)

mtmk commented 6 months ago

I'm afraid the metric above doesn't work for me as a meaningful comparison. I'm seeing very different results on different machines with different numbers of concurrent tasks, but when tuned (for example, the number of tasks), overall I'm hardly seeing any difference, unfortunately. I assume you're interested in throughput, and I suggest having a look at this issue:

Having said that, I think there are improvements we can make in the request-reply pattern (which I will start investigating soon, #453), but I don't think it would make a material difference to how many messages you can send per second.

Edit: I can reproduce results similar to the above when running on a single-core cloud machine, but when running on my desktop machine with multiple cores, the results are very different:

dotnet run -c release:

```
V1 Total time taken: 0.4129713    V2 Total time taken: 0.1883956
V1 Total time taken: 0.4114895    V2 Total time taken: 0.1772481
V1 Total time taken: 0.4122838    V2 Total time taken: 0.1846692
```

After running a few seconds:

```
│ EVENTSV1 │             │ 2024-03-23 23:05:38 │ 588,580   │ 604 MiB │ 1.86s        │
│ EVENTSV2 │             │ 2024-03-23 23:05:38 │ 1,302,142 │ 1.3 GiB │ 1.38s        │
```

When compiled AOT:

```
                                  V2 Total time taken: 0.1448319
                                  V2 Total time taken: 0.1446241
                                  V2 Total time taken: 0.1365826
```

(this is running code from your repo https://github.com/safayatborhan/Memory.Test with no changes)

caleblloyd commented 5 months ago

@safayatborhan I also cannot recreate the latency that you are seeing. I forked your Memory.Test repo to caleblloyd/Nats.Net.V2.ConcurrencyTests and changed a few things. I tested adding latency between the program and the nats-server, and V2 ran much better than V1.

What OS and hardware are you getting those results on?

to11mtm commented 5 months ago

> What OS and hardware are you getting those results on?

Also @safayatborhan, could you please confirm which version of the .NET runtime your .NET 6 tests were run against, especially if it is not 6.0.6 or newer? (Just to be safe; this was something AkkaDotNet encountered, and it was fixed in 6.0.6 and newer.)

Agreed that hardware could also make a huge difference here (and may still point to opportunities). I certainly have guesses as to what could happen with a low core count, etc., but I am working on being better about my tangents. :)
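For reference, a quick way to confirm which runtime the test process is actually executing on (a minimal sketch, nothing more):

```csharp
using System.Runtime.InteropServices;

// Prints the runtime the current process is executing on,
// e.g. ".NET 6.0.x"; useful to confirm it is 6.0.6 or newer.
Console.WriteLine(RuntimeInformation.FrameworkDescription);
Console.WriteLine(Environment.Version);
```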

safayatborhan commented 5 months ago

@caleblloyd and @to11mtm, thanks for your good inputs. I can still confirm it's slower on my machine. Here is the configuration:

JetStream server: nats-server-v2.10.10-windows-386
Hardware: 11th Gen Core i7 (3 GHz), 32 GB RAM

It is surprising to me that you are getting different results.

caleblloyd commented 5 months ago

windows-386 is a 32-bit build; can you try running the windows-amd64 nats-server instead? It is the 64-bit build and works with Intel CPUs.

safayatborhan commented 5 months ago

@caleblloyd I am getting a similar result after changing the server to amd64 (nats-server-v2.10.14-windows-amd64). After running for a few minutes:

(screenshot omitted)

mtmk commented 4 months ago

@safayatborhan did you make any progress? it'd be good to get on the same page on this 😅

darkwatchuk commented 4 months ago

If it's any help, there are significant differences between running the tests inside the Visual Studio IDE vs outside, even on Release x64 code.

Inside the IDE, V1 seems to always win; outside of the IDE, V2 wins.

E.g. on my 24-core desktop running Windows 11:

10 Tasks

Release - Inside of IDE

V1 Avg: 0.014, Min: 0.003, Max: 0.058, Max Threads: 18
V2 Avg: 0.031, Min: 0.002, Max: 0.072, Max Threads: 18

Release - Outside of IDE

V1 Avg: 0.006, Min: 0.001, Max: 0.067, Max Threads: 19
V2 Avg: 0.002, Min: 0.001, Max: 0.016, Max Threads: 14

50 Tasks

Release - Inside of IDE

V1 Avg: 0.074, Min: 0.004, Max: 0.184, Max Threads: 37
V2 Avg: 0.192, Min: 0.100, Max: 0.233, Max Threads: 26

Release - Outside of IDE

V1 Avg: 0.024, Min: 0.002, Max: 0.114, Max Threads: 24
V2 Avg: 0.006, Min: 0.002, Max: 0.013, Max Threads: 24

mtmk commented 4 months ago

Thanks @darkwatchuk, it defo helps 💯 So the figures are ms per message? Lower is better?

I know I'm jumping the gun and maybe going on a tangent (sorry @to11mtm 😅), but I'm also not convinced the method of measuring performance suggested above is producing helpful or practical results, unfortunately. What are your thoughts, and if you agree, how should we measure it?
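One option, as a rough sketch rather than a definitive harness: count acked messages across all publishing tasks and report messages per second over a fixed window, instead of timing small batches per task. The task count, payload size, and interval below are arbitrary choices:

```csharp
// Rough sketch of a throughput-oriented measurement: several tasks publish as
// fast as they can, a shared counter tracks acked messages, and a reporter
// prints messages/second once per interval.
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection(NatsOpts.Default with { Url = "localhost" });
var js = new NatsJSContext(nats);

long acked = 0;
var payload = new byte[1024];

// Publishers: each task awaits its own acks but the measurement is global.
var publishers = Enumerable.Range(0, 50).Select(_ => Task.Run(async () =>
{
    while (true)
    {
        await js.PublishAsync("events.page_loaded", payload);
        Interlocked.Increment(ref acked);
    }
})).ToList();

// Reporter: sample the counter every second and print the delta.
var reporter = Task.Run(async () =>
{
    long previous = 0;
    while (true)
    {
        await Task.Delay(TimeSpan.FromSeconds(1));
        var current = Interlocked.Read(ref acked);
        Console.WriteLine($"{current - previous:N0} msgs/sec");
        previous = current;
    }
});

await Task.WhenAll(publishers.Append(reporter));
```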

darkwatchuk commented 4 months ago

Yes, lower is better. This was running the code from the provided repo. As @safayatborhan is running Windows, it could be that he's running the tests from Visual Studio and accidentally starting them under the debugger, even with a release build. Just guessing, but from what I can see that produces drastically different results and can easily give the wrong impression. Clearly the VS debugging overhead is higher for V2 than for V1.
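As a quick sanity check, the repro could print a warning at startup when a debugger is attached; a minimal sketch (the message text is arbitrary):

```csharp
using System.Diagnostics;

// Warn at startup when the benchmark is running with a debugger attached,
// since that skews the numbers considerably.
if (Debugger.IsAttached)
{
    Console.WriteLine("WARNING: debugger attached; results will not be representative.");
}
```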

safayatborhan commented 4 months ago

@darkwatchuk Thanks for your valuable insights. You are right: I was running those tests under the IDE. I am also getting similar results to yours.

mtmk commented 4 months ago

Thanks so much, @darkwatchuk, for getting to the bottom of this. The investigation highlighted a performance improvement we can make for request-reply; it probably won't make a difference in this particular scenario, but it was still useful to surface. I will close this issue now. Thanks.