torrust / torrust-tracker

A modern and feature-rich (private) BitTorrent tracker.
https://torrust.com
GNU Affero General Public License v3.0

Add comments to the UDP server #921

Closed josecelano closed 3 months ago

josecelano commented 3 months ago

After a discussion with @da2ce7 I understood the reasons why he implemented the active requests buffer the way it is. This PR adds some clarifying comments.

My main concern was task starvation: if we always accept new requests (spawning new tasks), we might not give the currently active tasks a reasonable amount of time to finish.

@da2ce7 clarified that yielding with tokio::task::yield_now().await; should give the task enough time to finish.

My second concern was why we immediately spawn new tasks for the incoming requests instead of waiting until there is space in the buffer. @da2ce7 replied that we are going to run those tasks anyway, because we force a place for each new task.
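Roughly, the flow being discussed looks like this (a simplified sketch, not the actual tracker code; the identifiers, the buffer size, and the use of a `VecDeque` in place of the real ring buffer are illustrative):

```rust
use std::collections::VecDeque;
use tokio::net::UdpSocket;
use tokio::task::AbortHandle;

const ACTIVE_REQUESTS: usize = 50; // illustrative capacity

async fn receive_loop(socket: UdpSocket) -> std::io::Result<()> {
    // Fixed-size buffer of abort handles for the in-flight request tasks.
    let mut active: VecDeque<AbortHandle> = VecDeque::with_capacity(ACTIVE_REQUESTS);
    let mut buf = [0_u8; 2048];

    loop {
        let (len, from) = socket.recv_from(&mut buf).await?;
        let payload = buf[..len].to_vec();

        // The handler task is spawned immediately, before we know whether
        // there is room in the buffer for its abort handle.
        let handle = tokio::spawn(async move {
            // ... parse `payload` and send a response back to `from` ...
            let _ = (payload, from);
        })
        .abort_handle();

        // Force a place for the new handle: yield so that older tasks get a
        // chance to finish, and abort the oldest one only if it is still running.
        while active.len() >= ACTIVE_REQUESTS {
            tokio::task::yield_now().await;
            if let Some(oldest) = active.pop_front() {
                if !oldest.is_finished() {
                    oldest.abort();
                }
            }
        }
        active.push_back(handle);
    }
}
```

Because a single loop receives the datagrams and spawns the handlers, the number of in-flight request tasks stays bounded by the buffer capacity plus the one just spawned.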

josecelano commented 3 months ago

Starvation

Hi @da2ce7, I'm still not sure we can't have a starvation issue with this implementation. We seem to be assuming that giving a task a second chance to finish with a "yield" is enough, but in the end resources are limited and a task can be blocked on a lock, for example waiting to acquire the lock to update the torrent repository. I have the impression that we can have scenarios where:

I have the impression that we should only reject new requests when the buffer is full. Maybe we can try to simulate that scenario with a shorter buffer.

LocalRb

@da2ce7 shouldn't it be possible to use a LocalRb instead of a StaticRb? If I'm not wrong, only one thread owns the ActiveRequests variable and it is not sent between threads.

josecelano commented 3 months ago

ACK a897de43dd22aa15b63800e2fd1ade8d5d424533

da2ce7 commented 3 months ago

@josecelano

@da2ce7 clarified that yielding with tokio::task::yield_now().await; should give the task enough time to finish.

In fact we can yield many times without aborting tasks if each yield leads to the completion of the old task.

My second concern was why we immediately spawn new tasks for the incoming requests instead of waiting until there is space in the buffer. @da2ce7 replied that we are going to run those tasks anyway, because we force a place for each new task.

We start the process of removing old tasks after starting the new task, and then we do not even insert the new task's handle if the task has already finished by the time we have made space for it.
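In other words, something like this (hypothetical names, with a `VecDeque` standing in for the actual ring buffer): space is made after the new task has already started, and the new handle is inserted only if its task has not finished in the meantime.

```rust
use std::collections::VecDeque;
use tokio::task::AbortHandle;

/// Make room for `new_handle` and track it, unless its task already finished
/// while we were yielding to the older tasks.
async fn force_push(
    active: &mut VecDeque<AbortHandle>,
    capacity: usize,
    new_handle: AbortHandle,
) {
    while active.len() >= capacity {
        // Give the runtime a chance to drive the older tasks to completion.
        tokio::task::yield_now().await;

        if let Some(oldest) = active.pop_front() {
            if oldest.is_finished() {
                // The oldest task completed on its own; its slot is free now.
                continue;
            }
            // Still running after the yield: abort it to free the slot.
            oldest.abort();
        }
    }

    // The new task has been running since it was spawned; if it has already
    // finished there is nothing left to track.
    if !new_handle.is_finished() {
        active.push_back(new_handle);
    }
}
```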

josecelano commented 3 months ago

@josecelano

@da2ce7 clarified that yielding with tokio::task::yield_now().await; should give the task enough time to finish.

In fact we can yield many times without aborting tasks if each yield leads to the completion of the old task.

My second concern was why we immediately spawn new tasks for the incoming requests instead of waiting until there is space in the buffer. @da2ce7 replied that we are going to run those tasks anyway, because we force a place for each new task.

We start the process of removing old tasks after starting the new task, and then we do not even insert the new task's handle if the task has already finished by the time we have made space for it.

Hey @da2ce7, I think I was missing a critical point. Since we only receive new requests in a single thread, there is no way for the number of request-processor tasks to grow beyond 51. The socket buffer is where the contention happens: new "raw" requests have to wait until a currently active request finishes or is aborted.

Regarding the other topic, "having enough time to be completed" before we force the task to abort, the scenario I'm worried about is:

In that case we are just swapping old tasks for new tasks. In that scenario I would prefer to abort current tasks only if the time we have to wait is longer than a reasonable response time for the client. I mean, it does not make sense to wait 120 seconds because the client will already have made a new request, but it does make sense to wait 2 seconds and reject all new raw requests in the meantime. In general I would prefer the system to reply to fewer clients more slowly rather than not replying to any client at all. Am I missing something?

UPDATE

Maybe we can store in the buffer not only the abort handle but also a timestamp recording when the task was added. We would only remove a task after a timeout; if all 50 tasks are still under that timeout, we keep yielding.
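Something along these lines (the names, the `VecDeque`, and the 2-second deadline are illustrative assumptions, not a concrete API proposal):

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};
use tokio::task::AbortHandle;

/// Illustrative deadline: roughly what a client would accept as a response time.
const MAX_REQUEST_AGE: Duration = Duration::from_secs(2);

async fn force_push_with_deadline(
    active: &mut VecDeque<(AbortHandle, Instant)>,
    capacity: usize,
    new_handle: AbortHandle,
) {
    while active.len() >= capacity {
        let Some((oldest, started)) = active.front() else { break };
        let finished = oldest.is_finished();
        let expired = started.elapsed() >= MAX_REQUEST_AGE;

        if finished {
            // The oldest task completed on its own; just drop its entry.
            active.pop_front();
        } else if expired {
            // Past the deadline: abort it to free the slot.
            if let Some((handle, _)) = active.pop_front() {
                handle.abort();
            }
        } else {
            // All slots are still within the deadline: keep yielding instead
            // of aborting tasks that might still respond in time.
            tokio::task::yield_now().await;
        }
    }

    if !new_handle.is_finished() {
        active.push_back((new_handle, Instant::now()));
    }
}
```

With a deadline like this, tasks are aborted only once a slot has been occupied longer than a client would plausibly wait; otherwise the loop keeps yielding, as described above.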

da2ce7 commented 3 months ago

@josecelano I think that you are mixing up task throughput and anti-DoS.

For the server to start thrashing (i.e. to not process any requests), any particular request will have to be yielded to 50 times.

If somebody is making lots of very difficult-to-fulfill requests and filling the buffer so that it thrashes, short-lived tasks created by a non-DoSing user can still slip in and will get yielded to.

josecelano commented 3 months ago

@josecelano I think that you are mixing up task throughput and anti-DoS.

What's the difference? I mean, I'm trying to analyse what would happen in a high-load scenario, whether it's unintended or an attack: simply having a lot of announce requests that are competing with other services to access the torrent repository.

For the server to start thrashing (i.e. to not process any requests), any particular request will have to be yielded to 50 times.

Sorry, I hadn't realized that because I was not assuming we pop the oldest item. Of course that's the expected behaviour of a ring buffer, but the documentation does not make it explicit for this method, and since there are other methods I was assuming this "pop" was somehow arbitrary.

On the other hand, I don't know whether 50 thread context switches are enough compared to what it takes to process a request. I guess you are assuming they are.

If somebody is making lots of very difficult-to-fulfill requests and filling the buffer so that it thrashes, short-lived tasks created by a non-DoSing user can still slip in and will get yielded to.

I suppose you assume the attacker uses a kind of heavy announce request that takes longer to process than a normal one. And I think you mean that all tasks are going to get a chance to be executed and that there is no way in this solution to stop a bad actor. That's fine, I was not expecting to detect and stop them; I was only trying to avoid wasting time just switching contexts.

If 50 thread context switches are enough to process a request within a reasonable response time on a server under pressure, then the solution is perfect. It's a kind of indirect timeout where we use a context-switch counter instead of a timestamp. I have the impression that the timeout should be bound to user expectations (what a reasonable response time is), whereas this is more empirical: if we can't handle a request within 50 context switches, then the system is under heavy load and we abort tasks. I think this would be the perfect solution if the reason the server underperforms is that some requests are heavy to process.

But if the CPU load is high in general because there are a lot of requests in parallel waiting for a lock (for example), and 50 context switches is a relatively short time compared with a reasonable response time, then it could lead to not replying to any client at all instead of sending responses with a longer response time.

Anyway, I think 🤔 I have to think deeper about it :-).

josecelano commented 3 months ago

ACK d1c2d15c7050ff75f04e026652b429077c8c2ace

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 62.50000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 77.29%. Comparing base (d4e3208) to head (d1c2d15).

| Files | Patch % | Lines |
| --- | --- | --- |
| src/servers/udp/server/request_buffer.rs | 59.09% | 9 Missing :warning: |
Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           develop     #921   +/-   ##
========================================
  Coverage    77.29%   77.29%
========================================
  Files          183      183
  Lines         9689     9689
========================================
  Hits          7489     7489
  Misses        2200     2200
```

:umbrella: View full report in Codecov by Sentry.