nats-io / nats-architecture-and-design

Architecture and Design Docs

Headers on kv entries #278

Open cjohansen opened 1 week ago

cjohansen commented 1 week ago

Paraphrase

The question here is two-fold.

  1. Could (or should) the KV API be expanded to allow putting headers, as well as the value, with a key?
  2. Is it safe for a custom implementation to set headers, or is there a technical reason why the header space on the message should be reserved for internal future use?

Original

I have a question, hope this is the right place to ask.

Is it OK to set custom headers on kv entries? I'm working on a Clojure client (https://github.com/cjohansen/clj-nats/), and one of my goals is for it to work seamlessly with Clojure's immutable data structures. For streams I have implemented the client such that if you write a message with a Clojure data structure in the message body, the client adds a content-type header to the message. When the message is read back, it can be automatically parsed as Clojure data when the relevant content-type is set on the message.

I've experimented with the same feature for kv entries, and it works, but I wanted to make sure that doing this is safe and will continue to work. It allows the client to work like so:

(kv/put conn "kv-bucket" "key1" {:name "Mr Bojangle"})

(kv/get conn "kv-bucket" "key1")
;;=> {:name "Mr Bojangle"}

So: Is it OK to add custom headers to kv entry messages?
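For concreteness, a rough equivalent in Go with the nats.go client looks like the sketch below. It assumes the usual $KV.&lt;bucket&gt;.&lt;key&gt; subject and KV_&lt;bucket&gt; stream layout, and treats the content-type header as just my own convention, not an official API: the client publishes directly to the backing subject to attach the header, then reads the raw message back from the backing stream, since the KV Put/Get API does not expose headers.

package main

import (
    "fmt"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        panic(err)
    }
    defer nc.Drain()

    js, err := nc.JetStream()
    if err != nil {
        panic(err)
    }

    // Ensure the bucket exists; a KV bucket "kv-bucket" is backed by the
    // stream "KV_kv-bucket" and keys map to subjects "$KV.kv-bucket.<key>".
    if _, err := js.CreateKeyValue(&nats.KeyValueConfig{Bucket: "kv-bucket"}); err != nil {
        panic(err)
    }

    // Publish to the backing subject directly so a header can be attached;
    // the KV Put API itself does not expose headers.
    msg := nats.NewMsg("$KV.kv-bucket.key1")
    msg.Header.Set("Content-Type", "application/edn") // hypothetical content-type convention
    msg.Data = []byte(`{:name "Mr Bojangle"}`)
    if _, err := js.PublishMsg(msg); err != nil {
        panic(err)
    }

    // KeyValueEntry does not surface headers either, so read the raw last
    // message for the key's subject from the backing stream to get them back.
    raw, err := js.GetLastMsg("KV_kv-bucket", "$KV.kv-bucket.key1")
    if err != nil {
        panic(err)
    }
    fmt.Println(raw.Header.Get("Content-Type"), string(raw.Data))
}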

ripienaar commented 1 week ago

We do not support headers in KV. Initially we expected there was a chance we'd use alternative backends, so we didn't extend things too much past what's generally supported.

Regarding types and KV, we are working on this. It's a WIP and we hope to have more formal write-ups in the next few weeks after the devs meet. Our current thinking is below:

First we will store structs/maps/hashes in multiple keys so:

{
  "name": "John",
  "surname":"Smith".
  "age": 20
}

Would become something like:

kv put USERS 123.name '"John"'
kv put USERS 123.surname '"Smith"'
kv put USERS 123.age 20

In other words, we would JSON-encode the values so that the string John is stored as "John" and the number 20 as 20. This way languages can work backward from there and map the values to native types.

We chose JSON encoding because, well, we already support it everywhere, and the few types it does support are therefore universal: number, string, bool, map.
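A rough sketch of that write path in Go, to make the encoding concrete (putUser is a hypothetical helper, nothing final; it assumes the USERS bucket already exists):

package kvsketch

import (
    "encoding/json"

    "github.com/nats-io/nats.go"
)

// putUser flattens one record into per-field keys ("123.name", "123.age", ...),
// JSON-encoding each value so the string John is stored as "John" and the
// number 20 as 20.
func putUser(kv nats.KeyValue, id string, fields map[string]any) error {
    for name, value := range fields {
        encoded, err := json.Marshal(value)
        if err != nil {
            return err
        }
        if _, err := kv.Put(id+"."+name, encoded); err != nil {
            return err
        }
    }
    return nil
}

Calling putUser(kv, "123", map[string]any{"name": "John", "surname": "Smith", "age": 20}) would produce the same three writes as the kv put commands above.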

This way we avoid schemas, and languages with more specific types can do something like:

type User struct {
  Name string `nats:"name"`
  Surname string `nats:"surname"`
  Age uint `nats:"age"`
}

Here is a Go struct marked up with NATS key names. The field of interest is Age, which is an unsigned int: on unmarshal from the KV values we would validate that the data is unsigned and cast it to the native type.
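Continuing the sketch above (same package and imports, and the User struct as declared), the read path could look something like this. getUser is a hypothetical helper that maps the fields by hand rather than via the nats struct tags; unmarshalling into a *uint already rejects negative or fractional values, which gives the validation step described here.

// getUser reads the per-field keys back and maps them onto native Go types.
func getUser(kv nats.KeyValue, id string) (User, error) {
    var u User
    for field, target := range map[string]any{
        "name":    &u.Name,
        "surname": &u.Surname,
        "age":     &u.Age,
    } {
        entry, err := kv.Get(id + "." + field)
        if err != nil {
            return User{}, err
        }
        // Unmarshalling into *uint fails on e.g. -20, giving the unsigned check.
        if err := json.Unmarshal(entry.Value(), target); err != nil {
            return User{}, err
        }
    }
    return u, nil
}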

The reason behind storing a multi key map as individual keys is so that on large maps - thousands of struct keys - we can do partial loads and partial writes.

As things stand today that would be hard, but we have already done some work in the upcoming 2.11 to facilitate it, specifically around batched requests and multi-subject requests; see https://github.com/nats-io/nats-architecture-and-design/blob/main/adr/ADR-31.md#batched-requests

These facilities will let us realise huge maps, partial reads and writes, and types. Relatedly, we are adding a few new native capabilities around counters and such.

So it might be worth waiting a bit for these efforts to take shape before you design something similar on your side, so we can be interoperable. I am curious whether you think the design above would be feasible for your use case and language?

c00kiemon5ter commented 1 week ago

First we will store structs/maps/hashes in multiple keys

This reminds me of gron (and --ungron).

I am guessing unpacking the payload into multiple keys would be done in one transaction. Along with that comes a small performance hit.

What happens with arrays/lists? gron creates "keys" with the index for each item. Will there be any special handling there?
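To make the gron-style handling of lists concrete (illustrative only, not something NATS has committed to), array items could get their index as a path segment, along these lines:

package kvsketch

import (
    "encoding/json"
    "fmt"
)

// flatten walks nested maps and arrays, producing gron-style dotted paths,
// with array indices as path segments (e.g. "123.tags.0"). Leaf values are
// JSON-encoded as in the scheme described above.
func flatten(prefix string, v any, out map[string][]byte) {
    switch val := v.(type) {
    case map[string]any:
        for k, child := range val {
            flatten(prefix+"."+k, child, out)
        }
    case []any:
        for i, child := range val {
            flatten(fmt.Sprintf("%s.%d", prefix, i), child, out)
        }
    default:
        encoded, _ := json.Marshal(val)
        out[prefix] = encoded
    }
}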

The reason behind storing a multi key map as individual keys is so that on large maps - thousands of struct keys - we can do partial loads and partial writes.

This is interesting and could prove to be a valuable approach. Again, I am guessing that one could still send a complex struct to be stored, which would be unpacked and stored in separate keys. So, updating a "document/resource" is supported, but updating a sub-resource on its own is also supported.

Which reminds me of JSON-patch (with its issues).

With JSON-patch though, one can control what is being updated through the supported operations on (sub)resources. One could impose rules about which parts of a document can be updated on their own and which have to be updated together (etc.) thanks to this intermediate layer: the JSON-patch language and its handlers.

Here, this type of policy is offloaded to the apps. The apps need to know these rules. If multiple apps are updating the same records, one would have to introduce a new layer that exposes specific capabilities to these apps (to avoid replicating the policies across apps). I don't think this is bad, I am just noting it (also for myself) to think about further.

ripienaar commented 1 week ago

The writes will have to happen client side - so the client has to do the gron-like packing and then send it to the server one key at a time. Unfortunately we do not support transactions or batch writes yet, so it's not ideal, but it's a start.

JSON-patch is pretty cool - but since the data in this case is essentially a KV key per struct value, we can't think of the storage as a single document that we can patch against. In Go I have a POC that tracks which keys get updated, so that when you persist a struct back to the bucket it only sends what was changed, and does so using a CAS method - but tbh it's probably too difficult to use, and the failure scenarios are very bad in the absence of transactions, so I won't be doing that.

All the update-together behaviours that I would want have quite bad failure modes: if you start failing updates after 5/10 writes, you can't rewind the ones that passed, and now you have a half-write. So I don't want to over-complicate that until we get some kind of transactional write feature. This really is the crux of the problem, in my mind, with this being client side.
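For reference, the per-key CAS write I mean is along these lines (updateAge is a hypothetical helper using the existing nats.go KV API, where Update is a compare-and-set against the entry's revision; a multi-field version can still fail part-way through, which is exactly the half-write problem):

package kvsketch

import (
    "encoding/json"

    "github.com/nats-io/nats.go"
)

// updateAge rewrites a single field only if its revision has not moved since
// it was read; kv.Update is a compare-and-set on that revision.
func updateAge(kv nats.KeyValue, id string, age uint) error {
    key := id + ".age"
    entry, err := kv.Get(key)
    if err != nil {
        return err
    }
    encoded, err := json.Marshal(age)
    if err != nil {
        return err
    }
    _, err = kv.Update(key, encoded, entry.Revision())
    return err
}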

It could be that we land what we have and then, with what we learnt, work on batch writes on the server and revisit a better model for the writing side of things - then things like patch behaviours make sense. Right now I just don't think the dev experience would be great in the failure path.

c00kiemon5ter commented 1 week ago

Ah, sorry for my misunderstanding. I agree that without transaction semantics this will be problematic.

cjohansen commented 1 week ago

We do not support headers in KV. Initially we expected there was a chance we'd use alternative backends, so we didn't extend things too much past what's generally supported.

I guess you're talking about the client libraries? The observable implementation is just a stream where keys are subjects and messages are the last revision on each subject. I really like this, because it keeps KV close to the rest of NATS, and leaves the user with lots of room for extensions. For instance I already have a working version of client-side encode/decode of custom types based on message headers.

This leads me to ask: what's the definition of the KV? If it used a completely different backend I no longer see how it fits into NATS?


Regarding your suggested design, it seems to address a different issue: I don't have big values, and I'm not looking to type my data, I want to store custom data formats and use headers to know what's in the value. The suggested design also comes with some hairy issues as you pointed out, and I would like to add one more: It is no longer possible to get a consistent view of the whole value. Essentially the key now points to a mutable object in NATS, not a versioned value.

The reason behind storing a multi key map as individual keys is so that on large maps - thousands of struct keys - we can do partial loads and partial writes.

If you have this problem, I don't think a KV store is the right solution. At least I would not use one. I value atomic writes and reads.

In any case, it sounds like your suggested design would expand on the idea of a KV store as subjects and messages? Allowing the user to add custom headers would make the design highly extensible in userspace, at no cost to the NATS server (well, it would probably tie it closer to the underlying messaging primitives) and with no added complexity in the design.

ripienaar commented 1 week ago

We do not support headers in KV. Initially we expected there was a chance we'd use alternative backends, so we didn't extend things too much past what's generally supported.

I guess you're talking about the client libraries? The observable implementation is just a stream where keys are subjects and messages are the last revision on each subject. I really like this, because it keeps KV close to the rest of NATS, and leaves the user with lots of room for extensions. For instance I already have a working version of client-side encode/decode of custom types based on message headers.

This leads me to ask: what's the definition of the KV? If it used a completely different backend I no longer see how it fits into NATS?

Our KV is an abstraction around a stream; the purpose is to provide a small, more approachable and more widely known API than streams.

This means that in every way the KV API is about removing features from streams, framing existing features in different terms and making many assumptions about the structure of a "KV Stream" to create a KV like abstraction.

So yes, while it is just a stream, if it was EXACTLY a stream we wouldn't bother with KV - we would just have streams. People like KV because of its small surface area and relatively small feature set.

So the choice not to support headers is about that: it's about what we should support towards that goal, not about what we could support.

Regarding your suggested design, it seems to address a different issue: I don't have big values, and I'm not looking to type my data, I want to store custom data formats and use headers to know what's in the value. The suggested design also comes with some hairy issues as you pointed out, and I would like to add one more: It is no longer possible to get a consistent view of the whole value. Essentially the key now points to a mutable object in NATS, not a versioned value.

You may not have such values, but we are basing these needs on real-world use cases from large paying customers. Multi-thousand-key structs are unfortunately a reality for us.

The changes for batch reads are point in time atomic even when paged. We designed them that way for this purpose to address the concern you raise.

The reason behind storing a multi key map as individual keys is so that on large maps - thousands of struct keys - we can do partial loads and partial writes.

If you have this problem, I don't think a KV store is the right solution. At least I would not use one. I value atomic writes and reads.

As mentioned, we have point-in-time atomic multi-key reads in the new direct get APIs.

In any case, it sounds like your suggested design would expand on the idea of a KV store as subjects and messages? Allowing the user to add custom headers would make the design highly extensible in userspace, at no cost to the NATS server (well, it would probably tie it closer to the underlying messaging primitives) and with no added complexity in the design.

No, not at all. It would simply mean we represent the data in a bucket as if it had higher-order conventions and type hints. That's all; we still would not surface the concept of messages and subjects, just like we do not now for KV.

cjohansen commented 1 week ago

Our KV is an abstraction around a stream; the purpose is to provide a small, more approachable and more widely known API than streams.

This means that in every way the KV API is about removing features from streams, framing existing features in different terms and making many assumptions about the structure of a "KV Stream" to create a KV like abstraction.

I understand the goal of hiding stream API details from consumers of KV stores. I think the big value you deliver here is not necessarily in hiding the fact that KVs are streams, but in providing KV semantics and tools that make KV a first-class usage pattern.

I don't think that offering more stream functionality as an extension point for advanced users or client library authors interferes with first-class KV semantics. Making headers accessible means KV entries can have metadata, which I think could be very useful for many use cases. A KV entry that is just a key and a value of bytes requires out-of-band information to use; a KV entry with a key, bytes and metadata can be fully understood by any client.

The changes for batch reads are point in time atomic even when paged. We designed them that way for this purpose to address the concern you raise.

Oh good!

No, not at all. It would simply mean we represent the data in a bucket as if it had higher-order conventions and type hints. That's all; we still would not surface the concept of messages and subjects, just like we do not now for KV.

I didn't really mean that you would surface these details, just that those are still very much the building blocks (as opposed to some "other backend").