torrust / torrust-tracker

A modern and feature-rich (private) BitTorrent tracker.
https://torrust.com
GNU Affero General Public License v3.0

UDP server: alternative implementation to avoid aborting too many requests #918

Closed josecelano closed 3 months ago

josecelano commented 3 months ago

I've updated the demo, including the tracker:

udp://tracker.torrust-demo.com:6969/announce

@da2ce7 added a new log in this PR:

tracing::warn!(target: UDP_TRACKER_LOG_TARGET, local_addr, "Udp::run_udp_server::loop aborting request: (no finished tasks)");

Context

When the tracker receives a new UDP request:

  1. It spawns a new task to handle the request.
  2. It gets the abort handle for that task.
  3. It tries to insert that handle in the active requests buffer where only 50 concurrent requests are allowed.
  4. If the buffer is full, it tries to remove finished tasks.
  5. If all the tasks are unfinished tasks, it removes the oldest task.
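The steps above can be sketched as a small, std-only simulation. This is not the tracker's actual code: `FakeHandle`, `Inserted`, and `ActiveRequests` are hypothetical names, and a plain `finished` flag stands in for querying a real task's abort handle. It only illustrates the eviction order described in the list.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the abort handle of a spawned task.
#[derive(Debug, Clone, PartialEq)]
pub struct FakeHandle {
    pub id: u64,
    pub finished: bool,
}

/// What had to happen to make room for the newly spawned task.
#[derive(Debug, PartialEq)]
pub enum Inserted {
    HadRoom,
    CleanedFinished(usize),
    AbortedOldest(u64),
}

/// Hypothetical active requests buffer (the issue uses a limit of 50).
pub struct ActiveRequests {
    capacity: usize,
    handles: VecDeque<FakeHandle>,
}

impl ActiveRequests {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { capacity, handles: VecDeque::with_capacity(capacity) }
    }

    /// Steps 3 to 5: the task has already been spawned when this is called.
    pub fn insert(&mut self, handle: FakeHandle) -> Inserted {
        let outcome = if self.handles.len() < self.capacity {
            Inserted::HadRoom
        } else {
            // Step 4: the buffer is full, so drop handles of finished tasks.
            let before = self.handles.len();
            self.handles.retain(|h| !h.finished);
            let removed = before - self.handles.len();
            if removed > 0 {
                Inserted::CleanedFinished(removed)
            } else {
                // Step 5: nothing finished, so the oldest task is evicted.
                // A real server would abort it here and log the warning.
                let oldest = self.handles.pop_front().expect("buffer is non-empty");
                Inserted::AbortedOldest(oldest.id)
            }
        };
        self.handles.push_back(handle);
        outcome
    }

    pub fn len(&self) -> usize {
        self.handles.len()
    }
}
```

With a capacity of 2 and three unfinished requests, the third insert reports `AbortedOldest(1)`: the first request is dropped regardless of how long it has been running.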

That's the point where we log the new warning.

In the demo environment, we have approximately 42 tasks aborted in 2 minutes. Each aborted task produces a log entry like this:

tracker  | 2024-06-26T10:08:08.933194Z  WARN UDP TRACKER: Udp::run_udp_server::loop aborting request: (no finished tasks) local_addr="udp://0.0.0.0:6969"

Of course, it's not a regular number.

Problem

I have described the problem in:

Proposal 1: New buffer for request pending to activate

I have also described a proposal in:

The basic idea is to add a new buffer for requests that can't be handled because we already have 50 active requests. I'm copying my comment from the PR:

When we receive a new request, we immediately spawn a new task to handle it. However, we limit the number of active requests to 50. If we are already handling 50 requests and a new request comes in, we spawn a new task and remove the oldest task in the active requests buffer. Wouldn't it make sense not to spawn new tasks if the active requests buffer is already full?

We could have a "pending to activate requests" buffer in front of the active requests buffer. When tasks in the active buffer finish, we take more requests from the pending buffer and spawn new tasks to handle them.
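A minimal sketch of this two-stage idea, assuming requests are identified by a plain `u64` and that the pending buffer is also bounded. `TwoStageBuffer`, `Admission`, `receive`, and `finish` are hypothetical names, not part of the tracker; the real server would spawn a task where the sketch pushes an id into `active`.

```rust
use std::collections::VecDeque;

/// Outcome of offering a new request to the server.
#[derive(Debug, PartialEq)]
pub enum Admission {
    Activated, // a task would be spawned right away
    Queued,    // stored until an active slot frees up
    Rejected,  // both buffers are full
}

/// Hypothetical pending buffer in front of the active requests buffer.
pub struct TwoStageBuffer {
    max_active: usize,
    max_pending: usize,
    active: Vec<u64>,
    pending: VecDeque<u64>,
}

impl TwoStageBuffer {
    pub fn new(max_active: usize, max_pending: usize) -> Self {
        Self {
            max_active,
            max_pending,
            active: Vec::new(),
            pending: VecDeque::new(),
        }
    }

    /// No task is spawned unless there is a free active slot.
    pub fn receive(&mut self, request_id: u64) -> Admission {
        if self.active.len() < self.max_active {
            self.active.push(request_id);
            Admission::Activated
        } else if self.pending.len() < self.max_pending {
            self.pending.push_back(request_id);
            Admission::Queued
        } else {
            Admission::Rejected
        }
    }

    /// Called when an active request finishes. Promotes the oldest pending
    /// request, if any, and returns its id.
    pub fn finish(&mut self, request_id: u64) -> Option<u64> {
        let pos = self.active.iter().position(|id| *id == request_id)?;
        self.active.swap_remove(pos);
        let next = self.pending.pop_front()?;
        self.active.push(next);
        Some(next)
    }
}
```

Time spent waiting in `pending` adds to the time the client waits for a response, which is the indirect timeout increase listed under the cons.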

Pros:

  • We don't waste more resources spawning tasks that will not be executed immediately.
  • We don't need to remove active requests (we don't need to make room for new requests if the active requests buffer is full).

Cons:

  • We indirectly increase the timeout. This could be a good or a bad thing. It would be good if the timeout was too short, and bad if it was too long.

I think we need to add an explicit timeout to the time spent processing each request (this is needed anyway), and we also need to remove aborted tasks from the active requests buffer periodically.

I have to think deeper about it.

Proposal 2: Increase the number of active requests

We could also make it a config option.

I think we need to guarantee that we handle the requests we accept on time and reject the ones we cannot handle. In the current solution, under high load, we can abort a request that has not had enough time to be processed even if it was the only task on the server.

Proposal 3: Abort the new request

If we receive a new request and we can not handle it because we are already handling 50 requests, we abort the new request.

New requests will be rejected, and the server can start accepting them again when previous tasks finish.

This is the simplest solution. In fact, we can reject the new request even before inserting it in the active requests buffer.

Even if we implement solution 1, at some point we have to implement something like this because we cannot grow the first buffer indefinitely.

This solution requires cleaning finished tasks anyway.

  1. If the buffer is full, we attempt to clean finished tasks.
  2. If none of the tasks are finished, we ignore the new request.
  3. If some tasks were finished, we spawn the new task and insert the new abort handle.
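These three steps might look roughly like the following std-only sketch. `TaskSlot`, `Decision`, and `BoundedActiveRequests` are hypothetical names, and the `finished` flag is an assumption standing in for a check on a real abort handle. The point it illustrates is that the new task is only spawned after a slot is known to be free.

```rust
/// Hypothetical stand-in for the abort handle of a running task.
#[derive(Debug)]
pub struct TaskSlot {
    pub id: u64,
    pub finished: bool,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Accepted,
    Rejected,
}

/// Hypothetical active requests buffer that never evicts running tasks.
pub struct BoundedActiveRequests {
    capacity: usize,
    slots: Vec<TaskSlot>,
}

impl BoundedActiveRequests {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, slots: Vec::with_capacity(capacity) }
    }

    pub fn try_accept(&mut self, id: u64) -> Decision {
        if self.slots.len() >= self.capacity {
            // Step 1: the buffer is full, so clean up finished tasks.
            self.slots.retain(|slot| !slot.finished);
        }
        if self.slots.len() >= self.capacity {
            // Step 2: nothing was finished, so the new request is ignored.
            return Decision::Rejected;
        }
        // Step 3: there is room. A real server would spawn the task here
        // and store its abort handle.
        self.slots.push(TaskSlot { id, finished: false });
        Decision::Accepted
    }

    /// Simulates a task completing on its own.
    pub fn mark_finished(&mut self, id: u64) {
        if let Some(slot) = self.slots.iter_mut().find(|slot| slot.id == id) {
            slot.finished = true;
        }
    }

    pub fn active_ids(&self) -> Vec<u64> {
        self.slots.iter().map(|slot| slot.id).collect()
    }
}
```

Unlike the current behaviour, a request that has been accepted is never aborted to make room for a newer one; under load, it is the newest request that is dropped.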

Extra

I've added a comment on the code here.

Conclusion

josecelano commented 3 months ago

Hi @da2ce7, these are the number of aborted tasks in the demo UDP server in 2 minutes: