mdavidsaver / pvxs

PVA protocol client/server library and utilities.
https://mdavidsaver.github.io/pvxs/

client: re-define the meaning of poke() #41

Closed mdavidsaver closed 1 year ago

mdavidsaver commented 1 year ago

As a follow-on to #39, this PR changes the meaning of poke() and of the user-facing hurryUp() method.

Actions which can poke() the client search process.

Currently, the action of poke() is to immediately expire searchTimer, sending search packets for any disconnected channels in the next bucket. A 30 second "hold-off" timer keeps this from happening too often, but it is currently only applied to beacon events.

This PR applies the hold-off timer to all causes and changes its effect. A successful poke() now switches searchTimer to a shorter interval (1 second -> 0.2 second) for nBuckets periods, and skips incrementing the search delay (Channel::nSearch). Effectively, this gives a 5x speed-up of the search process for one revolution of the search bucket wheel (6 seconds) before returning to the normal period for the remaining 24 seconds of the hold-off period. Though it will be an effective slowdown for very recently created channels.

My thinking is that this will allow for (hopefully) occasional speed-ups of the search process, reducing (re)connect latency. However, this will increase both the possible peak and the average UDP traffic levels if e.g. beacons from a flapping server frequently trigger new server detection. I think this will be bounded to less than twice the normal level on average, since 24 of the 30 buckets would be searched twice within one hold-off period and the remaining 6 once ((24*2 + 6*1)/30 = 1.8x).
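To make the described behaviour concrete, here is a minimal sketch of how a hold-off window plus a temporary fast searchTimer period could fit together. This is not the actual pvxs implementation; the names SearchPokeSketch, holdoffExpires, fastTicksLeft, and nextIntervalS are invented for illustration.

```c++
#include <chrono>

struct SearchPokeSketch {
    using clock = std::chrono::steady_clock;

    static constexpr double slowIntervalS = 1.0;        // normal searchTimer period
    static constexpr double fastIntervalS = 0.2;        // sped-up period after a poke()
    static constexpr unsigned nBuckets = 30u;           // size of the search bucket wheel
    static constexpr std::chrono::seconds holdoff{30};  // at most one effective poke() per window

    clock::time_point holdoffExpires{};  // epoch => the first poke() always succeeds
    unsigned fastTicksLeft = 0u;

    // Returns true if the poke was accepted (no hold-off window active).
    bool poke() {
        const auto now = clock::now();
        if (now < holdoffExpires)
            return false;                // hold-off now applies to all causes
        holdoffExpires = now + holdoff;
        fastTicksLeft = nBuckets;        // one full wheel revolution at the fast rate
        return true;
    }

    // Called from the searchTimer callback to choose the next expiration interval.
    // While in the fast phase the caller would also skip incrementing Channel::nSearch.
    double nextIntervalS() {
        if (fastTicksLeft) {
            --fastTicksLeft;
            return fastIntervalS;
        }
        return slowIntervalS;
    }
};
```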

AppVeyorBot commented 1 year ago

✅ Build pvxs 1.0.876 completed (commit https://github.com/mdavidsaver/pvxs/commit/38c680ded5 by @mdavidsaver)

mdavidsaver commented 1 year ago

@thomasives If you have time, I'd be interested to have your thoughts on this idea.

thomasives commented 1 year ago

This looks like a good change to me. I think this will ensure there aren't any more pathological cases where we end up flooding the network with UDP packets.

> Though it will be an effective slowdown for very recently created channels.

I don't understand this comment. I don't think you are changing how initialSearchS works, so I don't see how this would affect recently created channels negatively.

Just a thought that isn't directly about this PR

I think there is a small issue that remains even with this PR. In order to avoid UDP bursts, I think we should be aiming for a uniform distribution of channels between all the buckets. I doubt we will get a uniform distribution for a typical application currently.

I suspect that a typical application will create channels in bursts, which will result in them all being placed in the same bucket. With the way channels are rescheduled now, I think they will keep being rescheduled into the same bucket, assuming they aren't found (e.g. the IOC is offline). So, in typical situations I would expect channels to remain clumped in the search ring. (It would be interesting to try to measure this to see if the analysis is correct.) This will obviously result in the bursts of UDP activity which we want to avoid. I assume something similar happens when an IOC disappears: all the channels from that IOC will presumably end up in the same bucket (I haven't checked the code for this though) and will thus remain clumped.

I think you have mentioned something about placing channels into their initial bucket randomly. This would fix the issue of clumping, although it would be a shame for an unlucky channel to wait up to 30 seconds before it is included in a search request for the second time. Perhaps you could randomise its placement among the first 3-4 buckets and then randomise ninc somehow when it is rescheduled. If we are careful about how we do the randomisation we can probably reach a steady state with a uniform distribution of channels throughout the search ring.
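As a rough sketch of that randomisation idea, something like the following could work. The function names, the 0..3 bucket spread, and the +/-1 jitter on the increment are illustrative guesses, not anything that exists in pvxs today.

```c++
#include <random>

// Place a newly created channel a random 0..3 buckets ahead of the current one,
// so bursts of channel creation don't all land in a single bucket.
unsigned initialBucket(std::mt19937& rng, unsigned currentBucket, unsigned nBuckets) {
    std::uniform_int_distribution<unsigned> spread(0u, 3u);
    return (currentBucket + spread(rng)) % nBuckets;
}

// When rescheduling an unanswered channel, jitter the increment (ninc) so that
// channels created in the same burst slowly drift apart instead of staying clumped.
unsigned rescheduleBucket(std::mt19937& rng, unsigned currentBucket,
                          unsigned ninc, unsigned nBuckets) {
    std::uniform_int_distribution<int> jitter(-1, 1);
    int step = static_cast<int>(ninc) + jitter(rng);
    if (step < 1)
        step = 1;
    return (currentBucket + static_cast<unsigned>(step)) % nBuckets;
}
```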

mdavidsaver commented 1 year ago

> ... In order to avoid UDP bursts, I think we should be aiming for a uniform distribution of channels between all the buckets ...

I do have some code which attempts to "level" adjacent buckets in the long run. This logic is largely a guess on my part. I don't have any measurements to justify it.

https://github.com/mdavidsaver/pvxs/blob/dd2f076b4aa6dc63fb91980eda711eabee910696/src/client.cpp#L1082-L1089

I'm using the number of channels in a bucket because I don't have a ready way to track this as a number of UDP packets (hence the magic number 100).
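For readers who don't follow the link, the "leveling" idea amounts to something like the sketch below: when queueing a channel into a bucket that has grown large, spill into the emptier neighbour instead. The Channel type, container layout, and the threshold of 100 channels here are illustrative stand-ins, not the linked pvxs code.

```c++
#include <deque>
#include <vector>

struct Channel;  // placeholder for the real channel type

unsigned chooseBucket(const std::vector<std::deque<Channel*>>& buckets,
                      unsigned target, unsigned nBuckets)
{
    const unsigned alt = (target + 1u) % nBuckets;
    // Only bother leveling once the target bucket is "full" (the magic 100),
    // and only if the neighbour actually holds fewer channels.
    if (buckets[target].size() > 100u && buckets[alt].size() < buckets[target].size())
        return alt;
    return target;
}
```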