Open wkloucek opened 5 months ago
You can observe the message backlog for the "dcfs" consumer group when trying what I describe in the test section of https://github.com/owncloud/ocis-charts/pull/538
Yes, we considered this, and it should not be very hard to implement. But I see some downsides to this approach.
That being said I see one part where we could extract some events to a different queue: SSEs.
The clientlog and userlog services (maybe more in the future) both send events called SendSSE. These events are only interesting for the sse service and could therefore be sent via a separate queue. However, I doubt that these few events have a significant impact on oCIS performance.
I am also a bit concerned that using multiple queues will bring even more complexity to already complex ocis configuration.
@wkloucek @kobergj Valid points.
I would suggest working "problem oriented".
Where do we already see issues in the current implementation? How can we identify them, and would splitting up the queues make any sense?
There is a new upcoming feature #8881 which will make heavy use of the event system. Off the top of my head, I see no real difference between the consumer groups for SSE, Activity, Auditing, Userlog, Clientlog, ... They seem to be interested in 90% of all events.
What would a helpful "split" look like?
> I would suggest working "problem oriented".
I partly agree that this is about a problem we may not have yet, or have not yet realized we already have. But I think we will have this problem for sure when targeting instances with multiple thousand concurrent users.
I already ran into problems with a slow consumer once (the dcfs client is set to a concurrency of 1 in the oCIS product defaults). You can find a reproducer in https://github.com/owncloud/ocis/issues/8949#issuecomment-2074870615
> Off the top of my head, I see no real difference between the consumer groups for SSE, Activity, Auditing, Userlog, Clientlog, ... They seem to be interested in 90% of all events.
But why does the dcfs need to listen to all of this if it only needs to know which upload can be finished? Even if acking on the client side is fast, not sending the events to the consumer in the first place is faster and more efficient.
> But why does the dcfs need to listen to all of this if it only needs to know which upload can be finished? Even if acking on the client side is fast, not sending the events to the consumer in the first place is faster and more efficient.
Maybe we could think about splitting out the "filesystem" consumers like antivirus, dcfs, and postprocessing. That could split apart the "Filesystem Events" from the "General Events".
> Maybe we could think about splitting out the "filesystem" consumers like antivirus, dcfs, and postprocessing. That could split apart the "Filesystem Events" from the "General Events".
This would be one approach. But if we split away the "Filesystem Events" (which probably include sharing), there will be nothing left for the "General Events" queue. It is also unclear into which queue space-related events would go.
> But why does the dcfs need to listen to all of this if it only needs to know which upload can be finished?
dcfs does not need many events to work, that is true. But splitting these events into multiple queues will force other consumers (e.g. postprocessing) to listen to multiple different event queues. The configuration for dcfs will also increase in complexity, as it then needs to push its events to a different queue than the one it receives from.
> But splitting these events into multiple queues will force other consumers (e.g. postprocessing) to listen to multiple different event queues.
Isn't that something we can avoid by using subjects? https://docs.nats.io/nats-concepts/jetstream/streams#subjects
@kobergj @butonic @wkloucek I want to get this moving again.
We need to make a decision.
My opinion is still the same as before:
That being said, if we want to go this way, we should start by splitting "file system events" into a separate queue. But we need to decide which events are "file system events". What about ShareCreated, for example?
Is your feature request related to a problem? Please describe.
Currently we have a single stream for events called "main-queue". We also use a single subject called "main-queue".
On this stream we have many consumer groups:
Every consumer group receives all events because we have a single stream with a single subject.
Consumers that do heavy work based on events may fall behind when many events are generated. What happens to those consumers is described here: https://docs.nats.io/running-a-nats-service/nats_admin/slow_consumers
Describe the solution you'd like
Have separate streams or subjects based on the kind / audience of events.
For example:
This would reduce event pressure on consumers that currently do client-side filtering of events. It would probably also reduce load on NATS itself, since events would need to be distributed to fewer consumers.
Describe alternatives you've considered
none
Additional context