owncloud / ocis

:atom_symbol: ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0

Nats: Make MaxAckPending and MaxAckWaitTime configurable #10393

Open kobergj opened 1 month ago

kobergj commented 1 month ago

When putting heavy load on nats receivers, the nats service will possibly redeliver events and therefore duplicate them. This is because we do not configure it correctly.

When pulling events from the queue the expected behaviour is the following: the event is pulled from the queue and delivered into a channel. When it is consumed from this channel, the event is acknowledged on the nats server. On high load this can cause problems. Since MaxAckWaitTime is 30s by default, the nats server will redeliver the event after 30s, leading to two identical events waiting to be picked up from the channel. Since MaxAckPending is set to 1000 by default, this can lead to up to 1000 identical events waiting to be processed.

We already introduced worker groups to be able to handle multiple events at the same time. We now need to make MaxAckWaitTime and MaxAckPending configurable, so they can be configured individually.
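A minimal sketch of what exposing both knobs could look like as nats.go JetStream subscription options; the subject, durable name and concrete values are illustrative, not the actual ocis wiring:

```go
// Minimal sketch only: passing MaxAckPending and AckWait as JetStream
// subscription options via nats.go. Subject, durable name and the values
// are illustrative, not the actual ocis configuration surface.
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical values read from the service configuration.
	maxAckPending := 3          // delivered-but-unacked messages the server may keep in flight
	ackWait := 30 * time.Second // how long the server waits for an ack before redelivering

	_, err = js.Subscribe("events.>", func(m *nats.Msg) {
		// ... hand the event over to the worker group ...
		m.Ack()
	},
		nats.Durable("ocis-consumer"),
		nats.ManualAck(),
		nats.MaxAckPending(maxAckPending),
		nats.AckWait(ackWait),
	)
	if err != nil {
		log.Fatal(err)
	}

	select {} // keep the subscriber running
}
```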

Acceptance Criteria:

wkloucek commented 3 weeks ago

should we already add those settings to the NATS deployment example in our oCIS chart? https://github.com/owncloud/ocis-charts/blob/main/deployments/ocis-nats/streams/ocis.yaml exposes all the stream settings already.

Our Kubernetes deployments already could benefit from this a lot!?

kobergj commented 2 weeks ago

Our Kubernetes deployments already could benefit from this a lot!?

Yes it should. If possible please set them.

wkloucek commented 2 weeks ago

@kobergj do you have proposals for MaxAckWaitTime and MaxAckPending that we could set on all / a subset of streams?

Just to interweave our ticket clusters a little bit:

https://github.com/owncloud/ocis/issues/8949 would maybe take some pressure out of high message count situations!? It would also allow more fine-granular redelivery settings. E.g. the postprocessing messages could need a very high MaxAckWaitTime if we had a long-running postprocessing step, while SSE-related events need to be dropped anyway if they are not delivered within < 10 seconds or so!?

micbar commented 2 weeks ago

E.g. the postprocessing messages could need a very high MaxAckWaitTime if we had a long-running postprocessing step, while SSE-related events need to be dropped anyway if they are not delivered within < 10 seconds or so!?

That sounds reasonable. The requirements for the different use cases could be fulfilled better in separate streams.

kobergj commented 2 weeks ago

https://github.com/owncloud/ocis/issues/8949 would maybe take some pressure out of high message count situations!? It would also allow more fine-granular redelivery settings

Not sure about this. If the postprocessing service is busy it doesn't matter how many queues it talks to. Maybe we should use regexes to subscribe only to specific events?

E.g. the postprocessing messages could need a very high MaxAckWaitTime if we had a long-running postprocessing step, while SSE-related events need to be dropped anyway if they are not delivered within < 10 seconds or so!?

No. This is unrelated to the time a step needs to complete. As soon as the message is pulled from the channel, it will be acked on the nats server. The problem is that pending messages get redelivered, so multiple (equal) events end up waiting on the same channel to be picked up (and acked).

The recommendation is to keep MaxAckWaitTime at a high value (30s+) and decrease MaxAckPending to a very low value (3 or lower).

jvillafanez commented 1 week ago

For the short term, I think we can make those options configurable to have better control over the problem, which could also help to debug it by reducing the wait time so the issue happens more often, but I don't think it will fix the issue. I mean, 30 secs should be enough for the postprocessing service (or any other consumer) to get the event and send the ACK, so if that's a problem we could consider that 5 minutes might not be enough either and we'd need a greater value.

The idea I have is to include an ever-increasing counter for each event type. Basically, saying that "this is the event number 123 of the share_created event" for example. This counter must be part of the event information, either as an event property or as part of the event metadata.

Assuming each event type is generated in only one place, having a counter there to know how many events have been sent should be easy. This is basically the only responsibility the event publishers will have (in addition to any other responsibilities that they already have).

From the consumer's side, we need to keep track of the last event of that type we've processed. For example, if we've processed the "share_created" event number 123, we should expect the "share_created" event 124, so any "share_created" event with a counter lower than 124 (including 123) should be considered as processed and can be ignored. This is how we could deal with duplicated events.
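A minimal sketch of that consumer-side bookkeeping; the Event shape and field names are hypothetical, not the ocis event envelope:

```go
// Minimal sketch of the dedup check described above, assuming every event
// carries a per-type, ever-increasing counter (field names are hypothetical).
package dedup

// Event is a simplified stand-in for an event carrying a per-type counter.
type Event struct {
	Type    string
	Counter uint64
}

// Deduper remembers the highest counter seen per event type.
type Deduper struct {
	last map[string]uint64
}

func NewDeduper() *Deduper {
	return &Deduper{last: make(map[string]uint64)}
}

// Seen reports whether the event was already processed (its counter is not
// higher than the last one seen for its type) and records it otherwise.
func (d *Deduper) Seen(e Event) bool {
	if last, ok := d.last[e.Type]; ok && e.Counter <= last {
		return true // duplicate or older event, skip it
	}
	d.last[e.Type] = e.Counter
	return false
}
```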

There are some edge cases to consider for this solution though:

Basically, we'd need to include the following information in the event:


For nats, most of the stuff above is very likely taken care of by nats in some way or another. The question is whether we're fetching the events one by one, so the only event being re-delivered is the last one (waiting in our queue to be ACK'ed). Basically, in case of delays, the expected sequence of events should be 1,2,3,3,3,3,3,4,5,6.... and not something like 1,2,3,3,4,5,3,3,3,6... or 1,2,3,4,5,3,4,5,6..... If it's the first case, keeping the last event in memory to compare to the one we're receiving should be enough to detect the duplication and skip it.

wkloucek commented 1 week ago

Another solution could be switching to the pull client instead of using the push client?
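A minimal sketch of what a pull consumer could look like with nats.go; the subject and durable name are placeholders. With Fetch the client never holds more unacked messages than the batch it explicitly asked for:

```go
// Minimal sketch of a pull consumer; subject and durable name are placeholders.
package events

import (
	"errors"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func consumePull(js nats.JetStreamContext) {
	sub, err := js.PullSubscribe("events.>", "ocis-pull-consumer",
		nats.AckWait(30*time.Second),
	)
	if err != nil {
		log.Fatal(err)
	}

	for {
		// Fetch at most 3 messages; MaxWait bounds how long we block when the
		// queue is empty.
		msgs, err := sub.Fetch(3, nats.MaxWait(5*time.Second))
		if err != nil && !errors.Is(err, nats.ErrTimeout) {
			log.Fatal(err)
		}
		for _, m := range msgs {
			// ... process the event ...
			m.Ack()
		}
	}
}
```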

kobergj commented 1 week ago

Let me explain the problem again, I think there is still some misunderstanding.

The default value for MaxAckPending is 1000. That means 1000 events are waiting at a channel to be picked up. However, there are more events already waiting in the queue. That means as soon as one event is picked from the channel, a new one is added. When the consumer now takes too long to get through the events, nats will redeliver pending events after MaxAckWaitTime. This leads to even more events waiting at the channel to be picked up. Ultimately this leads to the consumer being unable to work through all events because they are redelivered too fast.

This is a simple configuration problem. With a low MaxAckPending value nats will not need to redeliver events because the consumer is fast enough to pull all events. MaxAckWaitTime is not really needed as the default is already sane, but we can still offer it.

Putting more work on the consumer will only make the problem worse. If we want to improve performance of the event consumers we should use regex filtering on the event names. Afaik that is done on the nats side and therefore doesn't put pressure on our consumers.
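For reference, a minimal sketch of such server-side filtering: NATS filters by subject wildcards rather than full regexes, so this assumes events were published on per-type subjects such as "events.share.created" (hypothetical naming):

```go
// Minimal sketch only: subject-based filtering happens on the NATS server,
// not in the consumer process. The subject layout is a hypothetical example.
package events

import (
	"log"

	"github.com/nats-io/nats.go"
)

func subscribeShareEvents(js nats.JetStreamContext) {
	// Only share-related events are delivered to this consumer.
	_, err := js.Subscribe("events.share.>", func(m *nats.Msg) {
		// ... process the event ...
		m.Ack()
	},
		nats.Durable("ocis-share-consumer"),
		nats.ManualAck(),
	)
	if err != nil {
		log.Fatal(err)
	}
}
```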

jvillafanez commented 1 week ago

This is a simple configuration problem. With a low MaxAckPending value nats will not need to redeliver events because the consumer is fast enough to pull all events. MaxAckWaitTime is not really needed as the default is already sane, but we can still offer it.

I'm not sure... If we're processing events one by one, is there any benefit to setting the MaxAckPending value higher than 1? It seems the risk of redelivery is higher because there will be more events that have been sent but not yet ack'ed. Even if there is a networking benefit to sending a lot of events together at the same time, it doesn't seem like a good deal if there will be problems with the redelivery of the messages.

I'm worried this will end up being a test in production in order to figure out the right values for the environment, and those values might not be good enough if the system is overloaded. I'm ok exposing that configuration for performance tuning, but I don't think wrong values should break the system.