reidmorrison / semantic_logger

Semantic Logger is a feature rich logging framework, and replacement for existing Ruby & Rails loggers.
https://logger.rocketjob.io/
Apache License 2.0
862 stars 121 forks source link

Lost Threads and Hanging Due to Lack of Mutexes #240

Closed eviljoel closed 1 year ago

eviljoel commented 1 year ago

Issue

None.

Changelog

- Fixed thread safety issues where appender threads are created and destroyed in
  multiple threads. Before this fix, Semantic Logger could lose track of
  appender threads. Consumers could also hang when attempting to flush or close
  the logger.

Description of changes

When adding and clearing appenders from multiple threads (as well as performing similar operations like SemanticLogger#close), two problems can occur:

This problem can be demonstrated with the following program:

require 'io/console'
require 'semantic_logger'
require 'stringio'

@keep_running = true

# Make sure we have something to 'reopen'.
SemanticLogger.add_appender(appender: SemanticLogger::Appender::IO.new(StringIO.new))

add_appender_threads = []
clear_appender_threads = []
reopen_appender_threads = []
8.times.each do | thread_index |
  add_appender_thread = Thread.new do
    while @keep_running
      STDERR.puts "Hello from add appender thread #{thread_index}.\r"
      SemanticLogger.add_appender(appender: SemanticLogger::Appender::IO.new(StringIO.new))
    end
  end
  add_appender_threads.append(add_appender_thread)

  clear_appender_thread = Thread.new do
    while @keep_running
      STDERR.puts "Hello from clear appenders thread #{thread_index}.\r"
      SemanticLogger.clear_appenders!
    end
  end
  clear_appender_threads.append(clear_appender_thread)

  reopen_appender_thread = Thread.new do
    while @keep_running
      STDERR.puts "Hello from reopen appender thread #{thread_index}.\r"
      SemanticLogger.reopen
    end
  end
  reopen_appender_threads.append(reopen_appender_thread)
end

puts 'Press any key to quit.'
STDIN.getch

@keep_running = false
SemanticLogger.close
add_appender_threads.each do | add_appender_thread |
  add_appender_thread.join
end
clear_appender_threads.each do | clear_appender_thread |
  clear_appender_thread.join
end
reopen_appender_threads.each do | reopen_appender_thread |
  reopen_appender_thread.join
end

After letting the program run for a minute or so, upon exit, an error like the following should occur:

Traceback (most recent call last):
        4: from noMutexHang2.rb:43:in `<main>'
        3: from /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/semantic_logger.rb:203:in `close'
        2: from /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:99:in `close'
        1: from /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:204:in `submit_request'
/home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:204:in `pop': No live threads left. Deadlock? (fatal)
10 threads, 10 sleeps current:0x00007f129d810a00 main thread:0x00007f12a6a53000
* #<Thread:0x00007f12a6bb6d58 sleep_forever>
   rb_thread_t:0x00007f12a6a53000 native:0x00007f12a6eae180 int:0
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:204:in `pop'
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:204:in `submit_request'
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:99:in `close'
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/semantic_logger.rb:203:in `close'
   noMutexHang2.rb:43:in `<main>'
.
.
.
* #<Thread:0x00007f12a1a6f788 noMutexHang2.rb:22 sleep_forever>
   rb_thread_t:0x00007f12a1bb1f00 native:0x00007f129abf5700 int:0
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:204:in `pop'
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:204:in `submit_request'
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:99:in `close'
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/semantic_logger.rb:184:in `clear_appenders!'
   noMutexHang2.rb:25:in `block (2 levels) in <main>'
* #<Thread:0x00007f12998e8d68 /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:63 sleep_forever>
   rb_thread_t:0x00007f129c0ae780 native:0x00007f1271db1700 int:0
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:149:in `pop'
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:149:in `process_messages'
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:121:in `process'
   /home/jluellwitz/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.11.0/lib/semantic_logger/appender/async.rb:63:in `block in reopen'

While probably not strictly required, this MR builds off my prior PR (236). PR 236 is a bit more urgent as it corrects a threading error when the logger is managed in a single thread. This PR address additional thread safety issues when the logger is managed in multiple threads. I do not believe it makes much sense to merge this PR without the other one although an argument could be made to merge PR 236 without this one.

Admittedly, managing the logger in multiple threads is fairly unlikely, but these changes need to be made to ensure that "Semantic Logger is completely thread safe" and that "all methods can be called concurrently from any thread".

I tried to maintain compatibility with method signatures and behavior as much as possible. The big exception to this is 'Async#reopen'. I cannot implement reopen in a thread safe way and still work around the Ruby 2.5 Queue bug. This shouldn't be much of an issue as you really shouldn't be calling reopen unless you have forked. The work around I had to implement is to only have reopen do something if the appender thread is not running.

I removed the change log entry that reads "Fixes a race condition in SemancicLogger.reopen." as this PR completely clobbers znz's PR (223). I should probably point out that znz's change did not really address the race condition as &. and kill are not executed atomically. @thread could still become nil between when &. and kill are executed. (I can't find any Ruby documentation that directly supports this claim, but the end of this article suggests @thread&.kill is not executed atomically: https://lucaguidi.com/2014/03/27/thread-safety-with-ruby/)

Thank you for your consideration. Please let me know if you have any questions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

reidmorrison commented 1 year ago

Since the appender list is global, can I ask about the use-case to make SemanticLogger.add_appender and SemanticLogger.clear_appenders! thread-safe? These methods are usually called once during process startup to setup the appenders.

eviljoel commented 1 year ago

Since the appender list is global, can I ask about the use-case to make SemanticLogger.add_appender and SemanticLogger.clear_appenders! thread-safe?

@reidmorrison, I don't have a real life use case. However, you do make very specific claims about thread safety in the documentation and on the project website. From docs/index.md (emphasis added):

Semantic Logger is completely thread safe and all methods can be called concurrently from any thread.

It would be perfectly acceptable to make a documentation update instead. You could say something like "Semantic Logger is completely thread safe. Unless otherwise noted, all methods can be called concurrently from any thread." and update the method documentation to explicitly state which methods are not thread safe.

I do think a change needs to be made as the documentation is currently not consistent with the functionality.

I want to point out that I do have a real life use case for PR 236. In that case, we are adding and removing appenders between every unit test.

eviljoel commented 1 year ago

I'd also like to point out that I did not change the calls that actually do the logging (info, warn, error, etc.), so performance of those calls should not be affected.

reidmorrison commented 1 year ago

To help with this process, I completely refactored the way Async and BatchAsync are handled. This should make it much easier to locate the methods that would need semaphore protection: https://github.com/reidmorrison/semantic_logger/pull/242

Also added documentation on how to test whether application code is logging correctly. See the section "Writing Tests" at the end of the guide: https://logger.rocketjob.io/api.html

We can include these changes as part of a new major release of Semantic Logger, that will also raise the minimum ruby version to 2.7.

reidmorrison commented 1 year ago

@eviljoel can you try this branch and let me know how it goes for you?

Inside Gemfile: gem "semantic_logger", github: "reidmorrison/semantic_logger", branch: "feature/refactor-async"

eviljoel commented 1 year ago

@eviljoel can you try this branch and let me know how it goes for you?

Well, it definitely is not thread-safe. I'll see if I can refactor my PR tomorrow.

eviljoel commented 1 year ago

@eviljoel can you try this branch and let me know how it goes for you?

As @keithrbennett pointed out, there is a problem in Processor#start. The following minimal program can reproduce the error:

require 'semantic_logger'
SemanticLogger.clear_appenders!
SemanticLogger.add_appender(io: $stdout)

This results in the following backtrace:

Traceback (most recent call last):
        2: from crash_test.rb:3:in `<main>'
        1: from /home/eviljoel/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.12.0/lib/semantic_logger/semantic_logger.rb:169:in `add_appender'
/home/eviljoel/.rvm/gems/ruby-2.7.6/gems/semantic_logger-4.12.0/lib/semantic_logger/processor.rb:35:in `start': undefined local variable or method `thread' for #<SemanticLogger::Processor:0x00007fec9fe713a8> (NameError)

I'd like to wait for you to fix this issue before I update my branch. How I approach solving this issue is dependent upon how you fix this error.

eviljoel commented 1 year ago

@reidmorrison, just wanted to explicitly state that I'm waiting for clarity on all points raised in this comment before I continue to rebase his PR on your refactor-async branch.

reidmorrison commented 1 year ago

I am abandoning the changes in the refactor branch since they could break existing users of Semantic Logger and that is not a risk worth taking. I would suggest using a simpler logging implementation like the built in Ruby Logger to meet your unique use case.