ruby-concurrency / concurrent-ruby

Modern concurrency tools including agents, futures, promises, thread pools, supervisors, and more. Inspired by Erlang, Clojure, Scala, Go, Java, JavaScript, and classic concurrency patterns.
https://ruby-concurrency.github.io/concurrent-ruby/

Remove timeout from TimerTask #926

Closed: jacobat closed this 2 years ago

jacobat commented 2 years ago

The timeout in TimerTask does not stop a task from continuing to run after the timeout has passed. This can leak threads: TimerTask will try to run the provided task again before the previous execution has completed, yet it will not allow the task to run in parallel, so more and more worker threads queue up waiting to be scheduled to execute the task.

To illustrate, imagine running a TimerTask with an execution interval of 1 second and a timeout interval of 1 second, where the task itself runs for 4 seconds.

```
Time    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Worker1   s>  t  -  -  <     >  t  .  .  .  .  .  -  -  -  <
Worker2       s  >  t  .  -  -  -  <     >  t  .  .  .  .  .
Worker3             s  >  t  .  .  .  -  -  -  <     >  t  .
Worker4                         s  >  t  .  .  .  .  .  .  .
Worker5                                     s  >  t  .  .  .
```
At t=1, worker 1 is spawned (s) and scheduled (>) and starts executing the task. It times out (t) after 1 second, causing worker 2 to be spawned. Worker 2 then waits 1 second to be scheduled and another second to time out, causing worker 3 to be spawned at t=4.

Worker 3 is then scheduled to start at t=5 and times out at t=6. At this point worker 1 has completed its previous task, so the task queued by worker 3 goes to worker 1, scheduled for t=7. At t=8 worker 1 times out, and since worker 2 is currently executing (-) and worker 3 is currently waiting (.) for worker 2 to complete, worker 4 is spawned.

This pattern repeats, with a new worker/thread spawned every 4 seconds.
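For concreteness, here is a minimal reproduction sketch of the setup above (this assumes the pre-fix API, where TimerTask still accepts the timeout_interval option; the observation loop is just for illustration):

```ruby
require 'concurrent'

# Minimal reproduction of the leak described above. Assumes the pre-fix
# API where TimerTask still accepts :timeout_interval.
task = Concurrent::TimerTask.new(
  execution_interval: 1, # try to run the task every second
  timeout_interval:   1  # abandon it after one second (the buggy feature)
) do
  sleep 4                # the task itself runs for 4 seconds
end

task.execute

6.times do
  sleep 4
  puts "live threads: #{Thread.list.count}" # grows roughly every 4 seconds
end

task.shutdown
```

On an affected version the live thread count climbs in step with the diagram above.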

Closes #639.

sl0thentr0py commented 2 years ago

Is there an update on this? We would like to use TimerTask in Sentry, but the thread leak is a serious issue that needs to be fixed because it would likely affect our downstream users.

chrisseaton commented 2 years ago

@sl0thentr0py working on this today.

chrisseaton commented 2 years ago

@sl0thentr0py I don't quite get this - the proposed fix just removes the functionality for timeout, because it's not useful. It doesn't stop you leaking threads if you want to schedule every second something that takes an hour to run. You're still going to have tons of threads in that case. So is this really what you wanted?

sl0thentr0py commented 2 years ago

@chrisseaton We went with our own simple threading solution for now anyway, so this isn't really blocking for us anymore.

chrisseaton commented 2 years ago

Note that I'll follow up this PR with one that restores the API, and thus library compatibility, but warns that the timeout is ignored.
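A hedged sketch of what such a compatibility shim might look like (an assumption about the follow-up, not its actual code):

```ruby
require 'concurrent'

# Sketch only (an assumption about the follow-up, not its actual code):
# keep accepting :timeout_interval so existing callers don't break, but
# warn that the value no longer has any effect.
module TimerTaskTimeoutShim
  def initialize(opts = {}, &task)
    if opts.key?(:timeout_interval)
      warn 'TimerTask timeouts are ignored; the option has no effect'
      opts = opts.reject { |k, _| k == :timeout_interval }
    end
    super(opts, &task)
  end
end

Concurrent::TimerTask.prepend(TimerTaskTimeoutShim)
```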

jacobat commented 2 years ago

> @sl0thentr0py I don't quite get this - the proposed fix just removes the functionality for timeout, because it's not useful. It doesn't stop you leaking threads if you want to schedule every second something that takes an hour to run. You're still going to have tons of threads in that case. So is this really what you wanted?

It's been a while, so my memory may be deceiving me, but I believe this stops the thread leak, since it's the timeout that causes the leakage. If there is no timeout, the task won't be rescheduled (due to a timeout) until after the previous invocation has completed, so you'll only ever have one worker thread.
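For comparison, a sketch of the same setup against the post-fix behaviour (thread counts are indicative only; the global executor may keep a few housekeeping threads alive):

```ruby
require 'concurrent'

# Sketch of the post-fix behaviour described above: with no timeout, a slow
# task simply delays subsequent runs instead of piling up workers, so only
# one thread is ever busy with the task.
task = Concurrent::TimerTask.new(execution_interval: 1) do
  sleep 4 # slow task; the next run waits for this one to finish
  puts "run finished at #{Time.now}, live threads: #{Thread.list.count}"
end

task.execute
sleep 20
task.shutdown
```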

mhenrixon commented 2 years ago

Shame, I just had to reimplement the entire old timer task because the current one doesn't timeout: https://github.com/mhenrixon/sidekiq-unique-jobs/pull/702

I had already suggested a fix for the timeout's thread-leaking issue by linking to my implementation of the timer task (the diff used to be really easy to follow because it was just a few methods).

See the complaint here: https://github.com/mhenrixon/sidekiq-unique-jobs/pull/701

eregon commented 2 years ago

As Petr mentioned in https://github.com/ruby-concurrency/concurrent-ruby/pull/749#issuecomment-428701050, using Thread#raise (or Thread#kill) is unsafe and brings more problems than solutions. For instance, ensure blocks might be arbitrarily interrupted, which can leak a connection or a file descriptor, keep a Mutex locked when it should not be (and then cause deadlocks), corrupt data, leak the response of one request to another (as explained in the blog post), etc. So while some users of TimerTask with a timeout may not have experienced an issue yet, there will be one sooner or later if the task uses any ensure block, or any kind of state that would become invalid if the task were interrupted arbitrarily in the middle, which is very common.
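A small illustration of the failure mode (the file path is hypothetical and the timings contrived):

```ruby
# Illustration of the hazard: Thread#raise delivers an exception at an
# arbitrary point in the target thread. Here it lands in `sleep`, but it
# could just as well land inside the `ensure` block, aborting the cleanup
# and leaking the file descriptor.
worker = Thread.new do
  file = File.open('/tmp/example.log', 'w') # hypothetical path
  begin
    sleep 10 # long-running "work"
  ensure
    file.close # a raise arriving here would skip or abort the close
  end
end

sleep 0.1
worker.raise(RuntimeError, 'simulated timeout') # unsafe async interrupt

begin
  worker.join
rescue RuntimeError
  # Thread#join re-raises the exception that killed the worker
end
```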

mhenrixon commented 2 years ago

@eregon any suggestions for other ways of doing this? The execution has to time out one way or another.

The problem with the way the timer task works after the removal of the timeout is that it evidently leads to resources being hogged, according to a user report (see the link to the issue in my previous comment).

I read the blog post now, had not seen that one before.

chrisseaton commented 2 years ago

Can you do a co-operative timeout? Ideally your code would check frequently whether it should stop; then it can stop in an orderly way. This is how major VMs like the JVM are engineered: they stop at well-defined points only.
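A minimal sketch of such a co-operative check (the 1-second budget and the trivial work units are illustrative assumptions):

```ruby
require 'concurrent'

# Minimal sketch of a co-operative timeout (the 1-second budget and the
# trivial work units are illustrative assumptions): the task checks a
# deadline at well-defined points and stops in an orderly way.
task = Concurrent::TimerTask.new(execution_interval: 5) do
  deadline = Concurrent.monotonic_time + 1.0 # give this run a 1s budget

  1_000.times do
    break if Concurrent.monotonic_time > deadline # orderly stopping point
    sleep 0.01 # stand-in for one small unit of real work
  end
end

task.execute
```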

mhenrixon commented 2 years ago

> Can you do a co-operative timeout? Ideally your code would check frequently whether it should stop; then it can stop in an orderly way. This is how major VMs like the JVM are engineered: they stop at well-defined points only.

Thanks @sl0thentr0py, that's exactly where I was heading. The timeouts could explain some other random issues that people have been experiencing, I suppose.

Never had to come up with such a solution before. Challenge: accepted