tarantool / tarantool

The connections are pinned to iproto threads leading to uneven load and zero requests balancing #5990

Open Gerold103 opened 3 years ago

Gerold103 commented 3 years ago

With the new box.cfg.iproto_threads feature, connections are pinned to the thread that managed to accept() them first. That might work fine for short-lived connections and for small, fast requests, but it makes the feature hardly usable for long-lived connections and for workloads where some connections are much heavier than others. On a long-lived instance it can and will happen that some threads are loaded much more than others, and the idle threads can't take over any part of that load because the connections are pinned.

This is the case at least for vshard. Bucket discovery can be a heavy operation when it is aggressive, and people make it even more aggressive by tweaking timeouts. The same applies to the rebalancer. When it starts on a big memtx cluster, buckets are sent very aggressively. I know of installations where the rebalancer uses close to 100% CPU when there is not much other work. Bucket sending happens over a single connection using multiple fibers, which makes that connection quite heavy for a long time (hours, or days if there were errors in the middle or there are sharded vinyl spaces). A sketch of the pinning scheme follows below.
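To make the failure mode concrete, here is a minimal C++ sketch of the scheme described above (an illustration of the idea, not Tarantool's actual iproto code; `iproto_thread` and `process_request` are hypothetical names): every thread polls the shared listening socket, and the winner of the accept() race keeps the connection in its private epoll set forever.

```cpp
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <thread>
#include <vector>

static void iproto_thread(int listen_fd)
{
    // Each thread has its own private epoll set, with the shared
    // listening socket registered in all of them.
    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);
    while (true) {
        epoll_event out;
        if (epoll_wait(ep, &out, 1, -1) <= 0)
            continue;
        if (out.data.fd == listen_fd) {
            int fd = accept(listen_fd, nullptr, nullptr);
            if (fd < 0)
                continue; // Another thread won this accept() race.
            // The winner keeps the connection in its private epoll set
            // for the connection's whole lifetime, however heavy it is.
            epoll_event cev{};
            cev.events = EPOLLIN;
            cev.data.fd = fd;
            epoll_ctl(ep, EPOLL_CTL_ADD, fd, &cev);
        } else {
            // process_request(out.data.fd): hypothetical handler. Sibling
            // threads can't help with this socket - only this thread's
            // epoll set knows about it.
        }
    }
}

int main()
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(listen_fd, F_SETFL, O_NONBLOCK);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(3301);
    bind(listen_fd, (sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 128);
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back(iproto_thread, listen_fd);
    for (auto &t : threads)
        t.join();
}
```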

Gerold103 commented 2 years ago

I already implemented a similar thing at my other job, and I was able to share the algorithm: https://github.com/Gerold103/tla/blob/master/TaskScheduler.tla. It is not the exact algorithm I used, but the latter is based on this one.

Gerold103 commented 2 years ago

Instead of TLA+, there is now a C++ implementation: https://github.com/ubisoft/task-scheduler. It is not precisely what is used for fair scheduling of sockets, but it is very similar.
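For reference, the core idea can be sketched in a few dozen lines (a toy illustration, not the linked library's real API; the `Scheduler` class here is made up): workers pull tasks from one shared queue, so a burst from one producer is spread over whichever workers happen to be free, instead of being pinned to one of them.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class Scheduler {
public:
    explicit Scheduler(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }
    // Any thread may post work; any free worker will pick it up.
    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> g(mtx_); queue_.push(std::move(task)); }
        cv_.notify_one();
    }
    ~Scheduler() {
        { std::lock_guard<std::mutex> g(mtx_); stop_ = true; }
        cv_.notify_all();
        for (auto &t : threads_) t.join();
    }
private:
    void run() {
        while (true) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(mtx_);
                cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
                if (stop_ && queue_.empty()) return;
                task = std::move(queue_.front());
                queue_.pop();
            }
            task(); // Runs on whichever worker got here first: no pinning.
        }
    }
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    std::vector<std::thread> threads_;
    bool stop_ = false;
};
```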

Gerold103 commented 1 year ago

Another problem with the current implementation is revealed by the "iproto override" feature, which allows setting custom handlers for IProto requests. In the present architecture, this "table of overrides" has to be pushed to each IProto thread, so multiple copies of it are maintained.

Not only does this mean there are multiple copies of the same data, but while an override installation is in progress, it is undefined whether your next request will be overridden. If it hits a thread that already got the override, it will be intercepted. Otherwise you were unlucky.

The worst case is when the TX thread has updated half of the IProto threads with a new override and then hangs (due to a long select, for example). Then some requests of this type will be overridden and some will not, depending simply on which connection you use.
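A hypothetical toy illustration of that window (the flag and the structures here are made up, not the real iproto code): the TX thread pushes the new override to the per-thread copies one by one, and if it stalls midway, the outcome of a request depends only on which thread its connection is pinned to.

```cpp
#include <atomic>
#include <cstdio>
#include <vector>

struct IprotoThreadState {
    // Each thread keeps its own copy of the override table; reduced
    // here to one flag: is request type X overridden on this thread?
    std::atomic<bool> has_override{false};
};

int main()
{
    std::vector<IprotoThreadState> threads(4);

    // The TX thread starts broadcasting the override...
    threads[0].has_override = true;
    threads[1].has_override = true;
    // ...and hangs here (long select, etc). Until it resumes, the same
    // request is handled differently depending on the connection:
    for (size_t i = 0; i < threads.size(); ++i)
        std::printf("thread %zu: %s\n", i,
                    threads[i].has_override ? "intercepted" : "old handler");
}
```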

Gerold103 commented 9 months ago

See here for a way to avoid pinning sockets to any thread: https://github.com/Gerold103/serverbox/tree/develop. boost::asio also doesn't pin sockets, but its implementation is hard to read and understand IMO.
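One well-known way to get that property, in the spirit of (but not copied from) the linked code: all workers poll a single shared epoll instance, and EPOLLONESHOT guarantees that exactly one of the currently free threads wakes up per ready socket, whichever thread that happens to be. A minimal sketch (`worker` and `handle_io` are hypothetical names):

```cpp
#include <sys/epoll.h>
#include <thread>
#include <unistd.h>
#include <vector>

static void worker(int ep)
{
    while (true) {
        epoll_event ev;
        if (epoll_wait(ep, &ev, 1, -1) <= 0)
            continue;
        int fd = ev.data.fd;
        // handle_io(fd): whichever thread is idle serves the socket.
        // EPOLLONESHOT disabled the fd on delivery, so no other thread
        // can race on it; re-arm to return it to the shared pool.
        epoll_event re{};
        re.events = EPOLLIN | EPOLLONESHOT;
        re.data.fd = fd;
        epoll_ctl(ep, EPOLL_CTL_MOD, fd, &re);
    }
}

int main()
{
    int ep = epoll_create1(0);
    // Every accepted socket would be registered once with
    // EPOLLIN | EPOLLONESHOT; after that it belongs to no thread.
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back(worker, ep);
    for (auto &t : pool)
        t.join();
    close(ep);
}
```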

CuriousGeorgiy commented 5 months ago

Just a raw idea that came to me after looking at Dalton: Learned Partitioning for Distributed Data Streams: the request balancing part looks like a multi-armed bandit problem, so maybe we could try some simple reinforcement learning approach like tabular Q-learning. It is basically approximate dynamic programming and should be relatively easy to implement.
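For what it's worth, the simplest instance of that idea is an epsilon-greedy bandit over the thread pool. A toy sketch (my interpretation, not something from the paper; `ThreadBandit` and its methods are made-up names), with reward defined as negative observed latency:

```cpp
#include <algorithm>
#include <random>
#include <vector>

struct ThreadBandit {
    std::vector<double> q; // Value estimate (-avg latency) per thread.
    std::vector<int> n;    // How many times each thread was picked.
    double eps;            // Exploration probability.
    std::mt19937 rng{std::random_device{}()};

    explicit ThreadBandit(int threads, double eps = 0.1)
        : q(threads, 0.0), n(threads, 0), eps(eps) {}

    // Pick a thread for the next request.
    int pick() {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        if (u(rng) < eps) { // Explore a random thread.
            std::uniform_int_distribution<int> d(0, (int)q.size() - 1);
            return d(rng);
        }
        // Exploit: the thread with the best estimate so far.
        return (int)(std::max_element(q.begin(), q.end()) - q.begin());
    }

    // Feed back the observed latency of a completed request.
    void record(int thread, double latency_sec) {
        ++n[thread];
        // Incremental sample-average update, reward = -latency.
        q[thread] += (-latency_sec - q[thread]) / n[thread];
    }
};
```

Since thread load is non-stationary, a real version would likely need a constant step size instead of the 1/n average, so old observations decay.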

Gerold103 commented 5 months ago

This isn't just about request balancing. The connections shouldn't be pinned to threads at all. If one thread has to read and write 1000 sockets and another reads and writes 1 socket, the first one can become the bottleneck even if you balance its requests across the other threads. You would also have to bring the responses back to the original thread to send them.

I didn't read the article. I can only say that this isn't a DB-specific problem, so VLDB papers and other database talks are not the only places to look for a solution. I already suggested one solution, linked above. No matter what you choose, the solution will be incomplete as long as the sockets are pinned to threads in any way.