409 Batch Completed & Pending-Messages/Bytes for Pull Consumers

Jarema commented 1 year ago

Overview

When MaxBytes is used in Pull Request, when the Pull Request finishes because it hits MaxBytes, clients do not know how many messages and bytes are pending.

To address that, a new status has been added and will land with 2.10 release: 409 Batch Completed

With added information about not sent bytes and messages requested by Pull Request:

Nats-Pending-Bytes - number of bytes that were still pending for that Pull Request Nats-Pending-Messages - number of messages that were still pending for that Pull Request

Those two headers were also added to 408 Request Timeout statuses and are available in 2.9.x.

This allows clients, especially for JS Simplification, to calculate future Pull Request sizing accurately.

Relevant ADR will be issued.

Clients and Tools

[ ] Go @piotrpio
[x] Java @scottf https://github.com/nats-io/nats.java/pull/870
[ ] JavaScript @aricart
[x] .Net @scottf https://github.com/nats-io/nats.net/pull/755
[ ] C @levb
[ ] Python @wallyqs
[ ] Ruby @wallyqs
[x] Rust @Jarema @caspervonb

Other Tasks

[ ] docs.nats.io updated @bruth
[ ] Update ADR to Implemented
[ ] Update client features spreadsheet

Client authors please update with your progress. If you open issues in your own repositories as a result of this request, please link them to this one by pasting the issue URL in a comment or main issue description.

scottf commented 1 year ago

Is it possible to use a code other than 409 since this is a success result? Maybe a 200?
I thought we already solved the bytes problem, since we figured out the exact formula that the server uses. I added it to the pending ADR: https://github.com/nats-io/nats-architecture-and-design/blob/js-simplification/adr/ADR-32.md#message-size-calculation
Did you mean to say "Those two headers are already part of the 408 Request Timeout statuses." I'm certain they are already in 2.9.x

aricart commented 1 year ago

not sure we should do a diff code in this stage. For 3.x I would really like to have different codes...

Jarema commented 1 year ago

Problem with Pull Request Statuses is that very few are "error" or "success". They're statuses. Though we can discuss it.
It was, but the link you're referring is using those headers mentioned here. I wanted to make those changes visible by having an issue in the repo.
yes, fixed :)

scottf commented 1 year ago

Just to be clear about behavior. Given this:

{"batch":10,"max_bytes":20000,"expires":5000000000,"idle_heartbeat":2000000000}

If you get 10 messages that total less than 20000 bytes you get 409 Batch Completed
If you get less than 10 messages because the message would put you over 20000, you get 409 Message Size Exceeds MaxBytes
If 10 messages add up to exactly 20000 bytes you get no status
It the pull expires you get a 408

Items 2, 3 and 4 are not affected by this new status.

In looking at my code I don't need this new message. I have already handled the situation by :

noting whether or not a pull request has max bytes (tracking-pending-bytes)
always zeroing the pending bytes when pending message count went to zero.
conditionally zeroing the message count if pending-bytes went to zero when tracking-pending-bytes.

For servers before 2.10 it has to be done this way. For 2.10 and later I could have different logic to not zero out and only rely on the status, but why bother when the above already works. But I do have to have code to specifically ignore this status, because it is a 409 and this makes the 3rd different behavior for 409s, which require checking the text to know which.

In case you are wondering, I treat these as errors:

409 Consumer Deleted
409 Consumer is push based

These as warnings:

409 Message Size Exceeds MaxBytes
409 Exceeded MaxWaiting
409 Exceeded MaxRequestBatch
409 Exceeded MaxRequestExpires
409 Exceeded MaxRequestMaxBytes

And ignore this

409 Batch Completed

scottf commented 1 year ago

@Jarema I'm still thinking about the case you said that we are being bytes centric and just needed to work it out.

TLDR; I see that I need the Batch Complete

Considering simplification and the pull request config at 12 message and 12000 bytes. Do the first pull, the server starts tracking at 12/12000 We get 2 messages consisting of 3100 bytes. This triggers the 2nd pull (3100 > 25% of max bytes), so we pull another 9 message and 9000 bytes.

So my tracking is now at 12 - 2 + 9 = 19 messages and 12000 - 3100 + 9000 - 17900 bytes. The server tracking has 2 pulls, one with 10/8900 and the 2nd with 9/9000

Let's say then 18 more messages come in, 9 to each pull, each 900 bytes. (Granted this will have triggered a 3rd maybe 4th pull, but ignore that for now...)

Let's say then that the 2nd pull completes first with 9 messages of 8100 bytes. There were 900 bytes leftover, I won't zero out bytes because I still have more than 1 message, but if I don't track that 900, my byte count will be off by that much, which will affect the re-pull.

scottf commented 1 year ago

So I guess the only question is, what do we do pre 2.10? I guess it will have to be "close enough for government work" :)

scottf commented 1 year ago

Not a problem with simplification, but some situations that are always going to be a problem with [users] making multiple raw pulls.

They issue some pulls with heartbeats and some with out, or different heartbeat times. It's impossible to know which pull and when to not expect heartbeats and when the frequency changed.
They issue some pulls with max bytes and some without.

One possible way to address this is by requiring all subsequent raw pulls to make sense. For instance once a pull turns on heartbeats, all subsequent pulls must have the same heartbeat. Once a pull uses max bytes all subsequent pulls have max bytes.

Thinking about ways to handle this... Just erroring if a new pull is bad? Enforcing it at the api level and only every allowing 1 exact type of pull request?

nats-io / nats-architecture-and-design