nats-io / nats-architecture-and-design

Architecture and Design Docs

Get Message Enhancement (Direct API) #115

Closed aricart closed 1 year ago

aricart commented 1 year ago

Overview

Retrieve messages from a stream in a more efficient way than the get message API ($JS.API.STREAM.MSG.GET.<stream>)

allow_direct

Direct Get API

When the allow_direct property is enabled, clients can look up messages using the direct JetStream API, $JS.API.DIRECT.GET.<stream>, with the same payload as the get message API:

  Seq     uint64 `json:"seq,omitempty"`
  LastFor string `json:"last_by_subj,omitempty"`
  NextFor string `json:"next_by_subj,omitempty"`
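A minimal sketch of how a client might build the request subject and payload from the fields above. The type name directGetRequest and the helper functions are hypothetical and only for illustration; the JSON field names and the subject come from this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// directGetRequest mirrors the three payload fields listed above.
// The type name is hypothetical, not taken from any client library.
type directGetRequest struct {
	Seq     uint64 `json:"seq,omitempty"`
	LastFor string `json:"last_by_subj,omitempty"`
	NextFor string `json:"next_by_subj,omitempty"`
}

// directGetSubject returns the direct get API subject for a stream.
func directGetSubject(stream string) string {
	return "$JS.API.DIRECT.GET." + stream
}

// encodeRequest marshals the request to its JSON payload.
// It returns an empty string if marshalling fails.
func encodeRequest(r directGetRequest) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(directGetSubject("stream"))
	fmt.Println(encodeRequest(directGetRequest{Seq: 2, NextFor: "s1"}))
}
```

Because every field uses omitempty, a zero-valued request marshals to {}, which is the empty request case listed in the example unit test.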

Headers constants

When getting a message directly, NATS-specific headers identifying the stream, subject, sequence, and timestamp are added to the response.

Client Behavior

References

Clients and Tools

Other Tasks

Client authors please update with your progress. If you open issues in your own repositories as a result of this request, please link them to this one by pasting the issue URL in a comment or main issue description.

Example unit test

  1. Create a stream 'stream' with subjects 's1' and 's2', allow_direct set to true
  2. Publish 6 messages, 3 to each subject, alternating subjects. So publish to s1, s2, s1, s2, s1, s2

Then test and validate the headers for the expected subject and sequence. I also checked that the stream header and a timestamp were present.

| Request | Expected Subject | Expected Sequence |
|---|---|---|
| seq = 1 | s1 | 1 |
| last_by_subj = s1 | s1 | 5 |
| last_by_subj = s2 | s2 | 6 |
| seq <= 0, next_by_subj = s1 | s1 | 1 |
| seq <= 0, next_by_subj = s2 | s2 | 2 |
| seq = 1, next_by_subj = s1 | s1 | 1 |
| seq = 1, next_by_subj = s2 | s2 | 2 |
| seq = 2, next_by_subj = s1 | s1 | 3 |
| seq = 2, next_by_subj = s2 | s2 | 2 |
| seq = 5, next_by_subj = s1 | s1 | 5 |
| seq = 5, next_by_subj = s2 | s2 | 6 |

| Regular Request | Expected Status |
|---|---|
| {} | bad request [10003] |
| {"seq":0} | bad request [10003] |
| {"seq":9} | no message found [10037] |
| {"last_by_subj":"not-a-subject"} | no message found [10037] |
| {"next_by_subj":"not-a-subject"} | no message found [10037] |
| {"seq":9,"next_by_subj":"s1"} | no message found [10037] |
| {"seq":1,"next_by_subj":"not-a-subject"} | no message found [10037] |

| Direct Request | Expected Status |
|---|---|
| {} | {408, 'Empty Request'} |
| {"seq":0} | {408, 'Empty Request'} |
| {"seq":9} | {404, 'Message Not Found'} |
| DIRECT.GET.stream.not-a-subject | {404, 'Message Not Found'} |
| {"next_by_subj":"not-a-subject"} | {404, 'Message Not Found'} |
| {"seq":9,"next_by_subj":"s1"} | {404, 'Message Not Found'} |
| {"seq":1,"next_by_subj":"not-a-subject"} | {404, 'Message Not Found'} |

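
The lookup rules implied by the expected results above can be sketched as a small in-memory model. This is not the server's implementation: the types, the lookup function, and the exact-match subject comparison (no wildcards) are assumptions made for illustration, and the error values only echo the direct request statuses listed above.

```go
package main

import (
	"errors"
	"fmt"
)

// storedMsg is a hypothetical stand-in for a message held in the stream.
type storedMsg struct {
	Seq     uint64
	Subject string
}

// request holds the same three fields as the direct get payload.
type request struct {
	Seq     uint64
	LastFor string
	NextFor string
}

var (
	errEmptyRequest = errors.New("408 Empty Request")
	errNotFound     = errors.New("404 Message Not Found")
)

// lookup models the behaviour in the tables above:
// last_by_subj returns the newest message on the subject,
// next_by_subj returns the first message at or after seq on the subject,
// and a bare seq returns exactly that sequence.
func lookup(msgs []storedMsg, r request) (storedMsg, error) {
	switch {
	case r.LastFor != "":
		for i := len(msgs) - 1; i >= 0; i-- {
			if msgs[i].Subject == r.LastFor {
				return msgs[i], nil
			}
		}
	case r.NextFor != "":
		for _, m := range msgs {
			if m.Seq >= r.Seq && m.Subject == r.NextFor {
				return m, nil
			}
		}
	case r.Seq > 0:
		for _, m := range msgs {
			if m.Seq == r.Seq {
				return m, nil
			}
		}
	default:
		return storedMsg{}, errEmptyRequest
	}
	return storedMsg{}, errNotFound
}

// testStream builds the six messages from the example: s1, s2, s1, s2, s1, s2.
func testStream() []storedMsg {
	var msgs []storedMsg
	for seq := uint64(1); seq <= 6; seq++ {
		subj := "s1"
		if seq%2 == 0 {
			subj = "s2"
		}
		msgs = append(msgs, storedMsg{Seq: seq, Subject: subj})
	}
	return msgs
}

func main() {
	msgs := testStream()
	m, err := lookup(msgs, request{Seq: 2, NextFor: "s1"})
	fmt.Println(m.Subject, m.Seq, err)
	_, err = lookup(msgs, request{Seq: 9})
	fmt.Println(err)
}
```

A model like this can serve as the oracle in a client test: run each request against a real server and compare the subject and sequence headers with what lookup returns.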
derekcollison commented 1 year ago

The value is not json encoded either, nor does it do any API pass through or accounting. So much more efficient.

aricart commented 1 year ago

Feature landed in the server!

ripienaar commented 1 year ago

While I opted not to add direct get as a feature in the CLI for getting messages, I did add --allow-direct as an option when adding streams, but since it's so niche it's not something being prompted for.

It will report such a stream in s info though.

derekcollison commented 1 year ago

OK for now, but once settled this will be the way for KV etc. and possibly individual message retrieval directly from a stream.

aricart commented 1 year ago

Note that subject has changed to $JS.API.DIRECT.GET.${stream}

aricart commented 1 year ago

Note that there's a next_by_subj option

kozlovic commented 1 year ago

@derekcollison @aricart From a client library perspective, should we return a "raw" NATS message with the incoming headers, or should we make it a "JS" message where users can access Sequence, Time, etc..?

If the latter, are the headers copied over as-is (including all the Nats-xxx ones)? I am guessing that the original message could have its own headers, so we can't simply strip the whole headers section...

scottf commented 1 year ago

Are we re-using the existing schema response, stream_msg_get_response, the same as the current get messages? That message object could be expanded to have a reply-to field which is where we pull the js meta data, or just have a js-meta data field.

derekcollison commented 1 year ago

These are delivered on INBOX (or reply) and are friendly to Request Muxing. They have headers that denote stream, subject, sequence and timestamp.

These are raw messages, not JSON encoded.

derekcollison commented 1 year ago

These also participate in a queue group so all members, optionally including mirrors, can participate.

kozlovic commented 1 year ago

@derekcollison But regarding my question: should the library return them as-is (that is a raw NATS message) or as a JetStream message? For instance in Go the normal GetMsg() returns a RawStreamMsg that looks like this:

type RawStreamMsg struct {
    Subject  string
    Sequence uint64
    Header   Header
    Data     []byte
    Time     time.Time
}

As I was asking: if we return it as a RawStreamMsg, do we need to suppress the Nats-xxx headers that were returned by the "direct get" API response?
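One way to picture the mapping being discussed is the sketch below, which fills a RawStreamMsg from the headers of a direct get response and leaves the headers untouched. The header names (Nats-Subject, Nats-Sequence, Nats-Time-Stamp) and the RFC 3339 timestamp format are assumptions based on my understanding of the Go client and should be checked against the server; the local Header type and the toRawStreamMsg helper are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Assumed header names; verify against the server before relying on them.
const (
	hdrSubject   = "Nats-Subject"
	hdrSequence  = "Nats-Sequence"
	hdrTimeStamp = "Nats-Time-Stamp"
)

// Header is a local stand-in for the client's header map type.
type Header map[string][]string

// RawStreamMsg matches the struct quoted above.
type RawStreamMsg struct {
	Subject  string
	Sequence uint64
	Header   Header
	Data     []byte
	Time     time.Time
}

// first returns the first value stored under key, or "" if there is none.
func first(h Header, key string) string {
	if v := h[key]; len(v) > 0 {
		return v[0]
	}
	return ""
}

// toRawStreamMsg builds a RawStreamMsg from a direct get response.
// Headers are passed through as-is, since the stored message may carry its own.
func toRawStreamMsg(h Header, data []byte) (*RawStreamMsg, error) {
	subj := first(h, hdrSubject)
	if subj == "" {
		return nil, fmt.Errorf("missing %s header", hdrSubject)
	}
	seq, err := strconv.ParseUint(first(h, hdrSequence), 10, 64)
	if err != nil {
		return nil, fmt.Errorf("invalid %s header: %w", hdrSequence, err)
	}
	tm, err := time.Parse(time.RFC3339Nano, first(h, hdrTimeStamp))
	if err != nil {
		return nil, fmt.Errorf("invalid %s header: %w", hdrTimeStamp, err)
	}
	return &RawStreamMsg{Subject: subj, Sequence: seq, Header: h, Data: data, Time: tm}, nil
}

func main() {
	h := Header{
		hdrSubject:   {"s1"},
		hdrSequence:  {"5"},
		hdrTimeStamp: {"2022-09-01T12:00:00.123456789Z"},
	}
	m, err := toRawStreamMsg(h, []byte("hello"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.Subject, m.Sequence)
}
```

Keeping the headers intact sidesteps the question of which ones to strip; a library that prefers to hide the Nats-xxx headers could delete those keys after extracting the values.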

derekcollison commented 1 year ago

I think JetStream message. Apologies for the confusion. But these can not be ack'd and are not part of a consumer of course.

aricart commented 1 year ago

Likely they shouldn't be a JetStream message as the API shouldn't introduce ambiguity on how it was obtained. If the client provides a different "type" of msg, it should do so if it can. It is possible that in the current Go client these are the same as the Go client doesn't differentiate.

derekcollison commented 1 year ago

But it is a JetStream message, you got it out of a stream. I think client libs should present a consistent view upwards regardless of low level mechanism to retrieve, e.g. consumer vs stream get vs direct stream get.

kozlovic commented 1 year ago

> But it is a JetStream message, you got it out of a stream

I would agree here. Of course, it can't be ack'ed (since it was not received through a subscription), but it is still a JetStream message.

aricart commented 1 year ago

In the case of Go, external functions like ack, etc. are not bound to the type. In other languages they are, so typing it as a "JetStreamMsg" would also imply functions that are not applicable, thus the comment that languages should do what is appropriate.

aricart commented 1 year ago

When doing a last_by_subj lookup, you can now simply append the message subject to the request subject and send no payload. This allows things like KV to be clamped for permissions on GET when using the Direct API. See https://github.com/nats-io/nats.deno/pull/341
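As a rough sketch of the subject form described in this comment, the helper below builds the request subject for a last-by-subject lookup. The function name lastBySubjectRequest is hypothetical; the subject layout follows the $JS.API.DIRECT.GET.<stream> prefix and the DIRECT.GET.stream.not-a-subject example from the issue description.

```go
package main

import "fmt"

// lastBySubjectRequest returns the request subject and payload for a
// last-by-subject direct get. The message subject is appended to the API
// subject, and the payload is left empty (nil).
func lastBySubjectRequest(stream, subject string) (string, []byte) {
	return "$JS.API.DIRECT.GET." + stream + "." + subject, nil
}

func main() {
	subj, payload := lastBySubjectRequest("stream", "s1")
	fmt.Println(subj, len(payload))
}
```

Because the message subject is part of the request subject, ordinary publish permissions on $JS.API.DIRECT.GET.<stream>.<subject> can restrict which messages a client may fetch, which is not possible when the subject is hidden inside a JSON payload.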

aricart commented 1 year ago

Note that the original description of this issue omitted the mirror_direct configuration property on the stream, which should also be available in order to enable a mirror to use the direct get API.

derekcollison commented 1 year ago

Right now it gets set to true if the origin stream has AllowDirect, just FYI.

kozlovic commented 1 year ago

@aricart We should update the description to say that the server does NOT set allow_direct to true when max_msgs_per_subject is > 0; that server change has been reverted here: https://github.com/nats-io/nats-server/pull/3441

aricart commented 1 year ago

Yes. I marked that on @bruth's release worksheet.

aricart commented 1 year ago

Removed the initial comment describing the original server behaviour, in which allow_direct was set to true by the server by default.

bruth commented 1 year ago

Stream docs were updated to document the AllowDirect and MirrorDirect options as of https://github.com/nats-io/nats.docs/pull/489.

bruth commented 1 year ago

Per ADR-31, this is implemented.