Closed: anth-git closed this issue 4 months ago
Thanks for the report. I am seeing similar results. It looks like nats bench sends requests without waiting for each response before sending the next, whereas our .NET code awaits each publish in turn. When I batch the publish tasks I see similar figures, e.g.:
EDIT ⚠️ Don't use this with long-running large batches: it creates a lot of GC pressure and can cause operation-cancelled exceptions. See also #523
const int batch = 10_000;
// msgCount and data are assumed to be defined earlier (100,000 messages, 2048-byte payload, matching the bench runs below)
for (int i = 0; i < msgCount / batch; ++i)
{
    var tasks = new List<Task<PubAckResponse>>();
    for (int j = 0; j < batch; j++)
    {
        tasks.Add(js.PublishAsync<byte[]>(subject: "test.subject", data).AsTask());
    }
    foreach (var task in tasks)
        await task;
}
//
// Produced 100000 messages in 1417 ms; 71k msg/s ~ 138 MB/sec
//
// nats bench bar --js --pub 1 --size 2048 --msgs 100000
// Pub stats: 74,439 msgs/sec ~ 145.39 MB/sec
//
Edit: and if we batch all of it we get the same result:
Produced 100000 messages in 1343 ms; 74k msg/s ~ 145 MB/sec
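The "batch all of it" variant just starts every publish up front and awaits them together. A minimal sketch, with the publish call abstracted behind a delegate (a stand-in for js.PublishAsync) so it runs without a NATS server:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WholeRunBatch
{
    // Fires off every publish immediately, then waits for all ACKs at once.
    public static async Task<int> RunAsync(Func<int, Task> publish, int msgCount)
    {
        var tasks = new List<Task>(msgCount);
        for (int i = 0; i < msgCount; i++)
            tasks.Add(publish(i));     // start every PUB without awaiting
        await Task.WhenAll(tasks);     // then collect all the ACKs together
        return tasks.Count;
    }
}
```

Note the caveat in the edit above still applies: holding the whole run's worth of in-flight tasks is what creates the GC pressure.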
Yes, I figured it must have something to do with batching. I found the --pubbatch flag in nats bench, and when I set it to 1, performance deteriorated significantly to results similar to the .NET client's:
./nats bench bar --js --pub 1 --size 2048 --msgs 100000 --pubbatch 1
Pub stats: 16,539 msgs/sec ~ 32.30 MB/sec
Anyway, shouldn't batching be implemented in the client, similar to the Kafka client (batch.size, linger.ms)?
Anyway, shouldn't batching be implemented in the client, similar to the Kafka client (batch.size, linger.ms)?
We should be able to implement that, but I'm not sure what the API would look like in terms of collecting ACKs.
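One possible shape for such an API (purely a sketch; PubAckResponse here is a local stand-in for the real NATS.Client.JetStream type, and the publish delegate is a placeholder so the sketch runs without a server): publish the whole batch concurrently and hand all the ACKs back in order, so callers can inspect per-message errors afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the real JetStream ACK type.
public record PubAckResponse(ulong Seq, string? Error);

public static class BatchPublisher
{
    // Publishes all messages concurrently; returns ACKs in submission order.
    public static async Task<IReadOnlyList<PubAckResponse>> PublishBatchAsync(
        Func<byte[], Task<PubAckResponse>> publishAsync,
        IEnumerable<byte[]> messages)
    {
        var tasks = new List<Task<PubAckResponse>>();
        foreach (var msg in messages)
            tasks.Add(publishAsync(msg));   // start each PUB without awaiting
        return await Task.WhenAll(tasks);   // ACKs come back in order
    }
}
```

A Kafka-style linger.ms would additionally need a timer flushing partial batches, which this sketch omits.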
FWIW, in the past when I've had to do something similar, it's been something like: [buffer aggregating to a byte or count limit] -> [stage taking what got buffered and setting up a lookup from each item to a TCS or poolable IValueTaskSource-type thing] -> [writing the stuff] -> [setting the TCS or IVTS].
When I do it in Akka Streams (think something between channels and enumerables, but with an awesome DSL), it's usually done as two stages (BatchWeighted and SelectAsync). Of course we can't take Akka Streams as a dependency. ;)
Those -can- be represented as channel stages, which can make things a -little- easier. However, I would worry about how well it would play with Unity/AOT-type scenarios once netstandard merges (the best simple practice for such a pipeline is to Task.Run each stage, and I'm not sure that works in Unity).
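The staged pipeline described above can be sketched with System.Threading.Channels, no external dependencies. This is only an illustration of the shape, with all names invented and the actual network write replaced by a counter: callers enqueue items paired with a TaskCompletionSource, a consumer drains up to a batch-size limit, "writes" the batch, then completes each TCS so the caller's await resumes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class PipelineSketch
{
    public static async Task<int> RunAsync(int itemCount, int batchSize)
    {
        var channel = Channel.CreateUnbounded<(int Item, TaskCompletionSource<int> Tcs)>();
        int batchesWritten = 0;

        // Consumer stage: buffer-aggregate to a count limit, "write", then ack.
        var consumer = Task.Run(async () =>
        {
            var batch = new List<(int Item, TaskCompletionSource<int> Tcs)>();
            while (await channel.Reader.WaitToReadAsync())
            {
                while (batch.Count < batchSize && channel.Reader.TryRead(out var entry))
                    batch.Add(entry);

                batchesWritten++;          // stand-in for the actual socket write
                foreach (var (item, tcs) in batch)
                    tcs.SetResult(item);   // resume each awaiting caller
                batch.Clear();
            }
        });

        // Producer side: enqueue items and await their individual completions.
        var pending = new List<Task<int>>();
        for (int i = 0; i < itemCount; i++)
        {
            var tcs = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
            await channel.Writer.WriteAsync((i, tcs));
            pending.Add(tcs.Task);
        }
        channel.Writer.Complete();
        await Task.WhenAll(pending);
        await consumer;
        return batchesWritten;
    }
}
```

RunContinuationsAsynchronously matters here: without it, completing a TCS can run the caller's continuation inline on the consumer stage, which is exactly the kind of coupling the Task.Run-per-stage practice tries to avoid.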
I'll admit I've not looked at how JS publish works, but based on what I know about the rest of the pipeline, this may give some ideas to at least think about.
(have gists of a few cases, although they are more of a SQL/read type thing...)
@to11mtm so JS publish is basically a publish with a reply-to subject (inbox), which is what collects the ACKs. The current implementation achieves this by creating a temporary subscription on the mux inbox. We can make that more efficient by creating a single subscription and processing 'batches' concurrently (gated by semaphores, to avoid flooding the server with PUBs waiting for ACKs). Flow-wise we need a task collecting ACKs, a loop on the current task publishing messages, and another task checking the ACKs for errors (or maybe we only need to track errors so the application can act on them).
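The semaphore-gated part of that flow can be sketched as below. This is not the client's implementation, just an illustration of capping the number of PUBs in flight awaiting ACKs: the publish delegate stands in for the real js.PublishAsync (publish + wait for ACK), and errors are counted so the application can act on them afterwards.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledPublisher
{
    public static async Task<int> PublishAllAsync(
        Func<int, Task> publish, int msgCount, int maxInFlight)
    {
        using var inFlight = new SemaphoreSlim(maxInFlight);
        int errors = 0;
        var tasks = new Task[msgCount];

        for (int i = 0; i < msgCount; i++)
        {
            await inFlight.WaitAsync();            // block when the window is full
            int msg = i;
            tasks[i] = Task.Run(async () =>
            {
                try { await publish(msg); }        // PUB + wait for its ACK
                catch { Interlocked.Increment(ref errors); }
                finally { inFlight.Release(); }    // free a slot in the window
            });
        }

        await Task.WhenAll(tasks);                 // drain the remaining ACKs
        return errors;                             // surface errors to the caller
    }
}
```

Compared with the fixed 10k batches earlier in the thread, a sliding window like this keeps the pipe full continuously instead of stalling at each batch boundary, and bounds memory at maxInFlight outstanding tasks.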
@anth-git fyi we have a PR to help with batching:
Observed behavior
Is the .NET client 10x slower than the native one, or am I doing something wrong?
And the code I'm using (.NET 8):
Expected behavior
It should have similar performance
Server and client version
nats-server: v2.10.10 nats-cli: v0.1.1 NATS.Client.Core: v2.0.3
Host environment
No response
Steps to reproduce
No response