nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

feature: multi subject publishing, data uploaded once #1306

Open robinbraemer opened 4 years ago

robinbraemer commented 4 years ago

Feature Requests

I think it would improve usability, efficiency, and latency if we could publish a payload to multiple subjects in one request and have the NATS server deliver the message to all targeted subjects, instead of uploading the data once per subject.

Use Case:

obvious

Proposed Change:

No changes but additions.

Who Benefits From The Change(s)?

Alternative Approaches

aricart commented 4 years ago

Wondering why the client couldn't subscribe to the correct subject to receive the data it needed. Typically subject-based applications don't replicate data into different subjects. An example of why this would be useful would be helpful.

nuharaf commented 4 years ago

Say I have a common subject structure, like: notif. If I want to send a message to a particular client, I publish to notif.001. If I want to publish to all clients, I can design the clients to subscribe to, say, a "broadcast" subject. But if I want to publish to notif.001 and notif.003 but not to notif.002, I have to publish twice.

Alternative: ask client 001 and client 003 to subscribe to a subject that is relevant only to them, such as notif001003. But this grouping needs to be known beforehand, rather than letting the publisher decide dynamically at runtime.
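A minimal sketch of what "publish twice" looks like from the publisher's side today. The `publish` argument is a stand-in for whatever publish call the client library offers; `publish_to_many` and `record` are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable


def publish_to_many(publish: Callable[[str, bytes], None],
                    subjects: Iterable[str],
                    payload: bytes) -> int:
    """Publish the same payload once per subject and return the publish count."""
    count = 0
    for subject in subjects:
        publish(subject, payload)  # the payload is sent again on every call
        count += 1
    return count


# Stand-in for a real client's publish call: it only records what was sent.
sent = []


def record(subject: str, payload: bytes) -> None:
    sent.append((subject, payload))


n = publish_to_many(record, ["notif.001", "notif.003"], b"hello")
```

The publisher chooses the target set at runtime, but the payload crosses the wire once per subject, which is the cost this issue asks to avoid.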

robinbraemer commented 4 years ago

Say I have a common subject structure, like: notif. If I want to send a message to a particular client, I publish to notif.001. If I want to publish to all clients, I can design the clients to subscribe to, say, a "broadcast" subject. But if I want to publish to notif.001 and notif.003 but not to notif.002, I have to publish twice.

Alternative: ask client 001 and client 003 to subscribe to a subject that is relevant only to them, such as notif001003. But this grouping needs to be known beforehand, rather than letting the publisher decide dynamically at runtime.

That's the exact use case I face. @aricart

ripienaar commented 4 years ago

How big are your messages? I very often publish to 50k or more subjects and it’s fine. Takes very little time or resources. My pattern is very similar to yours.

What also sometimes makes sense is to send messages to all nodes and let each node decide whether to ignore the message. I don’t know your use case, but for me this works very well, as my system lets me attach a filter.

robinbraemer commented 4 years ago

My messages are very small, about 1k, and in my system the nodes also ignore messages whose target is not this node.
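A rough sketch of the receiver-side filtering described here: the message is published once to a shared subject and each node drops it unless it is listed as a target. The envelope layout (`targets`, `body`) and the function names are assumptions for illustration, not something either system in this thread is known to use.

```python
import json


def make_envelope(targets, body):
    """Wrap a message body together with the list of node IDs it is meant for."""
    return json.dumps({"targets": sorted(targets), "body": body}).encode()


def should_handle(node_id, raw):
    """Return True if this node is among the targets named in the envelope."""
    envelope = json.loads(raw)
    return node_id in envelope["targets"]


raw = make_envelope({"001", "003"}, "restart")
handled_by = [n for n in ("001", "002", "003") if should_handle(n, raw)]
```

The trade-off is that every node still receives and parses every message, so this shifts cost from the publisher to the subscribers.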

ripienaar commented 4 years ago

So unless you are on a high-latency link or sending hundreds of thousands of messages, it will be fine. Are you experiencing issues or just suggesting an improvement? In my system I connect a publisher connection and receivers, and publishing takes less than a second for 50 000 messages.

I agree it’s a good feature to add, but realistically we won’t be changing the protocol for quite a while to enable this.

robinbraemer commented 4 years ago

I just suggested an improvement/feature. :)

derekcollison commented 4 years ago

Is the membership for notifications totally dynamic (could be any N from a set of M), or is there a pattern?

jkralik commented 3 years ago

We need a similar feature for our cloud: some way to group the subjects to which a message will be sent. Something like: [image]

Via the NATS API, we want to define the subjects to which messages will be distributed (add/remove destinations via the NATS client). By default, the destination is the same as the published subject.

@derekcollison What do you think?

derekcollison commented 3 years ago

We have account subject mappings which allow traffic shaping that could be adjusted to allow sending to multiple subjects.

mappings = {
    foo: [ { dest: bar, weight: 40% }, { destination: baz, weight: 20% } ]
}

So in the above we send to foo which will be sent to bar 40% and baz 20%. Currently these need to not add up to > 100%. It can be less to introduce loss for chaos monkey style testing.

We could allow something like this.

mappings = {
    foo: [ { dest: bar, weight: 100% }, { destination: baz, weight: 100% } ]
}

or more simply..

mappings = {
    foo: [ bar, baz]
}

Mappings can be changed with a server reload or JWT updates, but they are probably not meant to change a lot. Would this possibly work?
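A toy model of the two behaviours described in this comment, to make the difference concrete. It models only the semantics as stated above (one destination chosen by weight, with the unassigned share dropped, versus every destination receiving the message); it is not how the server implements mappings, and the function names are hypothetical.

```python
import random


def weighted_destination(mapping, r):
    """Pick one destination for a message.

    mapping: list of (destination, weight_percent) pairs summing to <= 100.
    r: a number in [0, 100), e.g. random.random() * 100.
    Returns the chosen destination, or None if r falls in the unassigned share.
    """
    upper = 0.0
    for dest, weight in mapping:
        upper += weight
        if r < upper:
            return dest
    return None  # the share not covered by any weight is dropped


def fan_out(destinations):
    """Proposed behaviour: every listed destination receives the message."""
    return list(destinations)


mapping = [("bar", 40), ("baz", 20)]
one_pick = weighted_destination(mapping, random.random() * 100)
```

With weights of 40% and 20%, the remaining 40% of messages map to nothing, which is the intentional loss mentioned for chaos-style testing.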

ondrejtomcik commented 3 years ago

Hello @derekcollison

I would like to explain a bit more what we are doing with @jkralik and what challenges we're currently facing. We are building https://github.com/plgd-dev/cloud, an open-source IoT system using NATS as a messaging system. Our subjects are organized as follows:

Let's assume we have a deployment with a few users, each of whom has many devices (10000). If a user is interested in websocket/grpc stream notifications from all of their devices, we check in our authorization service which devices belong to this user and start 10000 subscriptions (as they are organized per deviceID). Do you see an issue with this approach, such as wasted resources? Is it worth improving?

We think it would make sense to also organize subjects per user ID. But there is another overhead linked to the n+1 publish. If one device is shared with 1000 other users, the service would have to publish data to events.devices.{deviceID} as well as to events.users.{userId-1..1000}.devices.{deviceId}. Do you agree? Is it even good to organize subjects in such a way?
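To make the n+1 cost concrete, here is a small sketch that lists every subject one event would have to be published to under the per-user layout described above. The device and user IDs are made up for illustration.

```python
def publish_subjects(device_id, user_ids):
    """List every subject a single device event must be published to."""
    subjects = [f"events.devices.{device_id}"]  # the one entity-centric subject
    subjects += [f"events.users.{u}.devices.{device_id}" for u in user_ids]
    return subjects


# One device shared with 1000 users (hypothetical IDs).
subjects = publish_subjects("dev42", [f"user{i}" for i in range(1, 1001)])
publish_count = len(subjects)
```

For a device shared with 1000 users this comes to 1001 publishes per event, compared with a single publish when subscribers follow the per-device subject.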

What could solve this n+1 publish overhead is automated mapping of subjects in the NATS server based on the subscription. That is, the subscriber has its JWT token, which we have from the grpc/websocket connection (northbound client subscription). Based on a value from the token (e.g. the sub claim), the NATS server could create a mapping from events.devices.{deviceId-1..10000}.> to events.users.{userId}.>. If the subscriber stops its subscription, the mapping could be removed.

But still, the question is whether this optimization makes sense and whether our thinking is going in the right direction.

Thank you

(We need to keep the events.{deviceID} subjects for southbound systems doing data mining, which are not aware of users.)

derekcollison commented 3 years ago

There is a bit of an art form to designing subject ontologies. You should, imo, design the system to publish events once, but I would need to dig in a bit more to offer any guidance.

FYI NATS supports Websockets directly, so no need for websocket/grpc, just use NATS ;)

derekcollison commented 3 years ago

You can control permissions to which users can access which subjects etc, you could possibly link these to a cross list of subjects that a user has access to.

Meaning I would focus on the interest graph that is represented by each user when they log in, the set of subscriptions as you mention above.

ondrejtomcik commented 3 years ago

@derekcollison sure, NATS provides nice features, but the internal messaging design (the contract and subject organization) should in no case affect the evolution of the public API, which is driven by business use cases. So for me this tight coupling between the northbound and southbound interfaces is not really acceptable.

Agreed that the system should be designed so that each event exists only once in the system, in one subject. That follows the approach of modeling events around the entity they represent. Is this the right approach? Or should you model subjects around their subscribers?

How expensive is a subscription, @derekcollison? If you have one client, how much does it matter whether it subscribes to 1 or 1000 subjects?

ondrejtomcik commented 3 years ago

You can control permissions to which users can access which subjects etc, you could possibly link these to a cross list of subjects that a user has access to.

But which user has access to which device is dynamic. This can change during a subscription. So we need to either inform our publishing service to start publishing the data (the same event) to another user topic, or reconfigure the NATS mapping dynamically.

For example, I am subscribed to notifications from all of my devices, but another user has just shared his own device with me. This information is published to another topic that our publisher listens on, and the publisher then starts publishing data from this new device to my subject. Or NATS is reconfigured, dynamically, to also route it to a different subject.

But imho, it shouldn’t be organized around users at all.
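One way to keep events on per-device subjects while access changes is to treat each user's subscriptions as a set and apply only the difference when a device is shared or revoked. The sketch below assumes the events.devices.{deviceId} layout mentioned earlier in the thread; the function name and device IDs are hypothetical.

```python
def subscription_changes(current_devices, allowed_devices):
    """Compute which per-device subjects to subscribe to and which to drop."""
    to_subscribe = sorted(
        f"events.devices.{d}" for d in allowed_devices - current_devices
    )
    to_unsubscribe = sorted(
        f"events.devices.{d}" for d in current_devices - allowed_devices
    )
    return to_subscribe, to_unsubscribe


current = {"dev1", "dev2"}          # devices the session is subscribed to now
allowed = {"dev1", "dev2", "dev9"}  # dev9 was just shared by another user
add, remove = subscription_changes(current, allowed)
```

Here the publisher is unaffected by the share; only the subscriber's session adds one subscription.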

derekcollison commented 3 years ago

Subscriptions are lightweight, so not an issue from that standpoint.

derekcollison commented 3 years ago

I would need quite a bit more information to form a complete opinion and suggestion on how to architect.

hspaay commented 1 year ago

@derekcollison as discussed on slack, I'd just like to add my use-case here. You proposed a solution above on Sep 29, 2021 that would be very useful by allowing mapping to multiple subjects. In this case static server config would be all that is needed.

The use case is that messages from IoT devices are grouped, and users subscribe to the group instead of to individual devices. Users don't know ahead of time which devices are in their group. Groups and memberships are managed by an administrator outside of NATS and converted into NATS configuration.

When IoT devices publish their events, these events are mapped to a group subject for each group they are a member of. For example, with 3 devices and 2 groups:

Event mapping:

Users are allowed to subscribe to the group if they have group permissions. This should also work with streams where a stream would be for a group subject. Stream group1 would have subject "group1.>".

Alternatively, an even better approach would be to allow streams to overlap subscription. I don't quite understand why that is currently not allowed but it would be handy to define streams that can contain the same subject. In this approach a stream would be the group and the stream subscriptions are the devices that are a member of the group.

derekcollison commented 1 year ago

You are referring to cumulative weighted mappings above 100%, correct?

hspaay commented 1 year ago

Yes

derekcollison commented 1 year ago

ok will see if we can do something for 2.10. We still have a bit of work to do for it already for existing customers and we are behind schedule as well.

hspaay commented 1 year ago

I very much appreciate it!

jnmoyne commented 1 year ago

You can already do something like this in 2.10 by using the new stream sourcing features that it has. You can for example have one stream that gets all the messages from the devices and then you create a stream per group (that sources from that initial stream). You can then easily control access to each of those group streams.
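A hedged sketch of the layout described here, expressed as plain dictionaries shaped like JetStream stream configuration. The field names (`name`, `subjects`, `sources`, `filter_subject`) are my understanding of the JSON stream config and should be checked against the JetStream documentation for the server version in use. The stream names, group name, and device IDs are assumptions for illustration.

```python
def group_stream_config(group, device_ids, source_stream="events",
                        subject_prefix="events"):
    """Build a per-group stream config that sources selected device subjects."""
    return {
        "name": group,
        "sources": [
            # One source entry per member device, filtered to that device's subjects.
            {"name": source_stream, "filter_subject": f"{subject_prefix}.{d}.>"}
            for d in device_ids
        ],
    }


# The single stream that captures every device event (assumed layout).
all_events = {"name": "events", "subjects": ["events.>"]}

# A group stream that pulls only its member devices from the 'events' stream.
group1 = group_stream_config("group1", ["device1", "device2"])
```

Access can then be granted per group stream, while devices keep publishing each event once. As noted later in the thread, listing the same source stream more than once with different filters requires 2.10.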

hspaay commented 1 year ago

@jnmoyne that is very interesting and sounds like it should work. I'll give it a try.

hspaay commented 1 year ago

@jnmoyne When creating a stream with multiple StreamSource on the same source stream and different FilterSubjects, the second StreamSource is ignored.

When testing this configuration:

Then only events.device1.temperature is received. If I swap the order, then only device2 events are received. It looks like sources from the same stream ('events') cannot overlap. This is with nats-server-2.9.18 and nats-go-1.27. You did mention 2.10, however; does that mean this only works with nats-server-2.10?

ps: I feel like I'm hijacking this thread. Should this be a separate issue?


edit: I tried with the latest 'main' branch, but the same behavior occurs. No error is reported, though. It looks like it is not possible (yet) to have a stream source multiple subjects from the same stream, to have a stream subscribe to overlapping subjects, or to have a single subject map to multiple different subjects.

jnmoyne commented 1 year ago

Yes, you need 2.10 for this to work (i.e. try with the current top of the dev branch). You also need to use the 2.10 branch of the nats.go client library.

hspaay commented 1 year ago

@jnmoyne @derekcollison It works. Totally awesome! Thank you both so much for your help. I'll do my dev on these branches until 2.10 is released.