treasure-data / serverengine

A framework to implement robust multiprocess servers like Unicorn
Apache License 2.0
759 stars 86 forks source link

Detect log rotation and release renamed files #106

Closed kenhys closed 2 years ago

kenhys commented 3 years ago

In the previous vesions, there was no way to detect log rotation event when log file was rotated.

It causes that rotated log file was grabbed by DaemonLogger. This was unexpected behavior.

This commit fixes by reopening log file correctly.

NOTE: This PR depends on inotify so only linux was supported.

kenhys commented 3 years ago

A bit more work is needed to fix https://github.com/fluent/fluentd/issues/2426#issuecomment-717646842 completely.

repeatedly commented 3 years ago

serverengine has own log rotation mechanizm and it doesn't have this problem, right? If this problem happens with only external log rotation tool, how about rb-inofity is optional?

kenhys commented 3 years ago

I've changed dependency to optional.

repeatedly commented 3 years ago

Looks good to me with optional approach :)

kenhys commented 3 years ago

I've rebased PR to follow recent master.

ashie commented 3 years ago

I've added CI for with and w/o rb-notify. All checks are passed.

ashie commented 3 years ago

I've added CI for with and w/o rb-notify. All checks are passed.

No, still rb-inotify isn't recognized:

Pending:
  ServerEngine::DaemonLogger reopen log when path is renamed
    # not supported except Linux with inotify
    # ./spec/daemon_logger_spec.rb:182
  ServerEngine::DaemonLogger reopen log when path is renamed by external process
    # not supported except Linux with inotify
    # ./spec/daemon_logger_spec.rb:192
  ServerEngine::DaemonLogger reopen logger when file is renamed
    # not supported except Linux with inotify
    # ./spec/daemon_logger_spec.rb:205
  ServerEngine::MultiThreadServer raises SystemExit when all workers exit with specified code by unrecoverable_exit_codes
    # unrecoverable_exit_codes supported only for multi process workers
    # ./spec/multi_process_server_spec.rb:63
ashie commented 3 years ago

Hmm, when I test it on my local environment, sometimes a test stalls.

It seems that it reproduced also on GitHub Actions: https://github.com/treasure-data/serverengine/pull/106/checks?check_run_id=2652587049

@kenhys Could you take look this?

kenhys commented 3 years ago

~In my local environment, IO logger test case seems to stall. (Not identified the reason yet)~

% bundle exec rspec -f documentation 2>&1 | tee ../take12.log

ServerEngine::BlockingFlag
  set and reset
  set! and reset! return whether it toggled the state
  wait_for_set timeout
  wait_for_reset timeout
  wait

ServerEngine::DaemonLogger
  reopen
  reset path
  default level is debug
  level set by int
  level set by string
  unknown level
  rotation
  IO logger
kenhys commented 3 years ago

~It may be caused by log.reopen!.~

it 'IO logger' do
    io = StringIO.new
    io.should_receive(:write)
    io.should_not_receive(:reopen)

    log = DaemonLogger.new(io)
    log.debug "stdout logging test"
    log.reopen! # <= STALLED HERE
  end
kenhys commented 3 years ago

I've misunderstand rspec workflow.

executing do block, call after (if exists), then show it ... label. So, 'inter-process locking on rotation' test case is stalling. (next test case for 'IO logger')

kenhys commented 3 years ago

It seems that sometimes Process.waitpid pid2 blocks.

https://github.com/treasure-data/serverengine/blob/master/spec/daemon_logger_spec.rb#L168

kenhys commented 3 years ago

As far as I know, only 'inter-process locking on rotation' test case stalls. (frequency 2-3 stalls/10times)

kenhys commented 3 years ago

As far as I know, only 'inter-process locking on rotation' test case stalls. (frequency 2-3 stalls/10times)

It tend to stall when process(pid2) writes the limit (~30 bytes = log_rotate_age * log_rotate_size), proces(pid2) stucks.

kenhys commented 3 years ago

To clarify the issue:

bundle install --with inotify
bundle exec rspec -f documentation -e inter
kenhys commented 3 years ago

It tend to:

pass:

stall:

kenhys commented 3 years ago

It tend to:

pass:

* `@inotify.watch` event is fired for process A and Process B (each process close @logdev and reopen it)

stall:

* `@inotify.watch` event is fired for process A only

some extent true, but not applied to all cases...

kenhys commented 3 years ago

I'll check the current implementation, again.

ashie commented 3 years ago

I've added a work around, but I'm not sure it's correct fix. Further verification is needed.

ashie commented 2 years ago

I open another pull request to resolve this issue: #127