roidrage / lograge

An attempt to tame Rails' default policy to log everything.
http://www.paperplanes.de/2012/3/14/on-notifications-logsubscribers-and-bringing-sanity-to-rails-logging.html
MIT License
3.45k stars 299 forks source link

Request logs being lost in production #195

Closed mhenrixon closed 7 years ago

mhenrixon commented 7 years ago

Hi,

I posted this on Stackoverflow some time ago but it isn't getting any attention. http://stackoverflow.com/questions/41431108/rails-request-logs-are-missing-when-using-lograge

# config/environments/production.rb

config.log_level = :info
config.lograge.enabled = true
config.lograge.custom_options = lambda do |event|
  request_params = event.payload[:params].except("controller", "action", "")
  { params: request_params, ip: event.payload[:ip], uid: event.payload[:user_id] }
end

This is working for background workers. For the rails requests it stops working after some time.

2017-01-02T17:05:56.318489+00:00 heroku[router]: at=info method=POST path="/comments/3297684/cheer"
2017-01-02T17:05:55.915014+00:00 heroku[router]: at=info method=GET path="/north-engine-crossfit/feed"
2017-01-02T17:05:56.576926+00:00 heroku[router]: at=info method=GET path="/notifications?"
2017-01-02T17:05:56.455991+00:00 heroku[router]: at=info method=GET path="/assets/fallback/thumb_default.png"
2017-01-02T17:05:56.620939+00:00 heroku[router]: at=info method=GET path="/events/2808332/comments"
2017-01-02T17:05:56.930687+00:00 heroku[router]: at=info method=GET path="/workouts/bar-work-and-some-rowing/results/new?event_id=2808133"
2017-01-02T17:05:57.136227+00:00 heroku[router]: at=info method=GET path="/tfw-helsinki/feed"
2017-01-02T17:05:57.862004+00:00 heroku[router]: at=info method=GET path="/gyms/crossfit-herttoniemi/widgets/info"
2017-01-02T17:05:58.012593+00:00 heroku[router]: at=info method=GET path="/workouts/results/new"
2017-01-02T17:05:57.875231+00:00 heroku[router]: at=info method=GET path="/api/v1/mobile/users/get_notifications_count.json?"
2017-01-02T17:05:58.385305+00:00 heroku[router]: at=info method=POST path="/comments/3297738/cheer"
2017-01-02T17:05:58.514149+00:00 heroku[router]: at=info method=GET path="/events/2808045/comments" 
2017-01-02T17:05:58.562364+00:00 heroku[router]: at=info method=GET path="/events/2809768/comments" 

As you can see above only the heroku[router] is logging to STDOUT. Lograge doesn't log anything except every now and then when the following appears:

2017-01-02T17:05:56.302136+00:00 app[web.1]: ** [NewRelic][2017-01-02 17:05:56 +0000 web.1 (8)] WARN : Skipping RUM instrumentation. Unable to find <body> tag in first 50000 bytes of document.

If I have a look at what is logged from the background workers I get a lot of nice output like:

017-01-02T17:05:59.018898+00:00 app[worker.2]: [Worker(host:74f88e14-63db-431d-ab87-8c8a0e031866 pid:4)] Job DeliverDeviceNotification (id=10753844) RUNNING
2017-01-02T17:05:59.018913+00:00 app[worker.2]: 2017-01-02T17:05:59+0000: [Worker(host:74f88e14-63db-431d-ab87-8c8a0e031866 pid:4)] Job DeliverDeviceNotification (id=10753844) RUNNING
2017-01-02T17:05:59.035080+00:00 app[worker.2]: [Worker(host:74f88e14-63db-431d-ab87-8c8a0e031866 pid:4)] Job DeliverDeviceNotification (id=10753844) COMPLETED after 0.0161
2017-01-02T17:05:59.035145+00:00 app[worker.2]: 2017-01-02T17:05:59+0000: [Worker(host:74f88e14-63db-431d-ab87-8c8a0e031866 pid:4)] Job DeliverDeviceNotification (id=10753844) COMPLETED after 0.0161
2017-01-02T17:05:59.038990+00:00 app[worker.2]: [Worker(host:74f88e14-63db-431d-ab87-8c8a0e031866 pid:4)] Job DeliverDeviceNotification (id=10753845) RUNNING
2017-01-02T17:05:59.039046+00:00 app[worker.2]: 2017-01-02T17:05:59+0000: [Worker(host:74f88e14-63db-431d-ab87-8c8a0e031866 pid:4)] Job DeliverDeviceNotification (id=10753845) RUNNING
2017-01-02T17:05:59.054456+00:00 app[worker.2]: [Worker(host:74f88e14-63db-431d-ab87-8c8a0e031866 pid:4)] Job DeliverDeviceNotification (id=10753845) COMPLETED after 0.0153
2017-01-02T17:05:59.054508+00:00 app[worker.2]: 2017-01-02T17:05:59+0000: [Worker(host:74f88e14-63db-431d-ab87-8c8a0e031866 pid:4)] Job DeliverDeviceNotification (id=10753845) COMPLETED after 0.0153
2017-01-02T17:05:59.057328+00:00 app[worker.2]: [Worker(host:74f88e14-63db-431d-ab87-8c8a0e031866 pid:4)] 2 jobs processed at 46.3347 j/s, 0 failed
2017-01-02T17:05:59.057383+00:00 app[worker.2]: 2017-01-02T17:05:59+0000: [Worker(host:74f88e14-63db-431d-ab87-8c8a0e031866 pid:4)] 2 jobs processed at 46.3347 j/s, 0 failed

Anyone have any ideas why the request logging isn't working?

benlovell commented 7 years ago

Could you provide one or more of the last logged request(s) before lograge stops working? This one is going to be tricky to diagnose for sure.

mhenrixon commented 7 years ago

@benlovell I can't find any request logs actually. This is what the logs look like that end up in amazon:

cloudwatch management console 2017-01-09 11-19-42

Any suggestions on how this might be debugged?

benlovell commented 7 years ago

The newrelic entry is a red herring here. It seems lograge has never processed logs for you. Can I see where and how you configure the Rails logger?

mhenrixon commented 7 years ago

@benlovell it is in the description of the issue. First code part:

lograge 2017-01-12 13-52-01

config/environments/production.rb

benlovell commented 7 years ago

That's your lograge configuration. I need to see how you're configuring Rails' logger.

mhenrixon commented 7 years ago

There is no configuration done specifically for rails logger.

# config/environments/production.rb
  # Only show one line per request.
  # config.logger.level = Logger::WARN
  # config.log_formatter = ::Logger::Formatter.new
  # config.log_tags = [ :uuid ]

  # Use a different logger for distributed setups
  # config.logger = ActiveSupport::TaggedLogging.new(SyslogLogger.new)

We have the same configuration for our staging environment and there I see the following when I visit a route.

2017-01-16T11:14:40.650189+00:00 app[web.1]: method=GET path=/ format=html controller=PagesController action=root status=200 duration=170.03 view=161.44 db=7.22 params={} ip=94.234.32.132 uid=

The app is running on Heroku and using whatever standard logger they provide.

mhenrixon commented 7 years ago

Is it the backtrace problem that the backtrace ends up on multiple log lines? How should I handle backtraces?

benlovell commented 7 years ago

Not really. It seems lograge isn't involved at all judging by your log output in the production environment. Can you isolate the differences between staging (where you know this works) and production? Everything seems OK to me given what you've shown me.

I'd be happy to take a look at your production environment if you can get a repro for me. I understand this may not be possible though.

mhenrixon commented 7 years ago

@benlovell if I go to console and do the following Rails.logger I get this:

irb(main):001:0> Rails.logger
=> #<RailsStdoutLogging::StdoutLogger:0x007fe0a27693c0 @progname=nil, @level=1, @default_formatter=#<Logger::Formatter:0x007fe0a27692d0 @datetime_format=nil>, @formatter=#<ActiveSupport::Logger::SimpleFormatter:0x007fe0a27691b8 @datetime_format=nil>, @logdev=#<Logger::LogDevice:0x007fe0a2769258 @shift_size=nil, @shift_age=nil, @filename=nil, @dev=#<IO:<STDOUT>>, @mon_owner=nil, @mon_count=0, @mon_mutex=#<Thread::Mutex:0x007fe0a27691e0>>, @local_levels=#<ThreadSafe::Cache:0x007fe0a2768fd8 @backend={}, @default_proc=nil>>

Does that help at all?

mvaragnat commented 7 years ago

Just testing lograge - the gem looks very promising ! I have the same issue as described here. Using Rails 5

here is lograge.rb Rails.application.config.lograge.enabled = true

Please let me know if there is any other piece of info you'd need to investigate

mhenrixon commented 7 years ago

@mvaragnat are you using newrelic as well?

benlovell commented 7 years ago

The presence of newrelic isn't an issue. I have many apps running in varying environments on heroku all successfully logging via lograge so we can rule out this particular environment.

Have you tried spinning up a completely new app, configuring this with lograge and then pushing to heroku? If you could both reduce this down to non-working, smaller applications that you could grant me access to I could definitely help. Or of course if you could grant me access to your other environments I'd be happy to help. Assisting here is like a needle in a haystack I'm afraid.

mhenrixon commented 7 years ago

@benlovell it is a weird issue indeed. I have it running fine on A Rails 5 application. The problem I am experiencing might have something to do with high load or concurrency as it has very high traffic compared to other apps I am working on.

benlovell commented 7 years ago

@mhenrixon High throughput shouldn't be an issue. I know of several high traffic sites (for example: https://github.com/alphagov/e-petitions running up to 650k RPM) processing logs with lograge.

If I had to guess I'd say something clobbers our log subscriber during boot but that's just a hunch. If possible, could you run the following in a production console and show me the output?

ActiveSupport::LogSubscriber.log_subscribers.map(&:class)

Thanks.

mhenrixon commented 7 years ago

You are right we do not have that type of traffic yet. It is an old codebase though and it isn't the most performant code. We also have trouble with exceeding memory and such things.

ActiveSupport::LogSubscriber.log_subscribers.map(&:class) 
=>  [
  ActionView::LogSubscriber, 
  ActiveRecord::LogSubscriber, 
  ActionController::LogSubscriber, 
  ActionMailer::LogSubscriber, 
  Lograge::RequestLogSubscriber
]

Could it be something like the need of $stdout.sync? The problem occurs like I said after the application is running for some time.

benlovell commented 7 years ago

$stdout.sync = true would be worth a try given what you just mentioned. Do you have the Rails 12 factor gem loaded?

mhenrixon commented 7 years ago

It is indeed loaded, rails 4.2 app on Heroku you don't really have a choice right?

mhenrixon commented 7 years ago

$stdout.sync doesn't help! It is some type of threading issue. The most likely is something in the lines of https://github.com/rails/rails/pull/16885 or https://github.com/rails/rails/issues/14031 or https://github.com/rails/activerecord-session_store/pull/22 now to find the culprit...

Lograge has nothing to do with this as we experience the same thing regardless of lograge. It seems a similar issue has been raised in the past: https://github.com/roidrage/lograge/issues/65

mustmodify commented 5 years ago

@mhenrixon any update on this? I'm experiencing a similar, mysterious issue.

mhenrixon commented 5 years ago

Unfortunately not @mustmodify, I ended up going with Logging instead https://github.com/TwP/logging

mustmodify commented 5 years ago

Turns out I had set production to log_level :warn, which is less helpful than you'd think.

ghiculescu commented 3 years ago

For the record I had a really similar issue, and this config worked for me:

    # application.rb

    config.lograge.enabled = true
    config.lograge.logger = ActiveSupport::TaggedLogging.new(ActiveSupport::Logger.new($stdout))

    # then in development.rb we set log_level = :debug, and in production.rb we set log_level = :info

Without the config.lograge.logger = line I got no heroku output.

Pyo25 commented 3 years ago

I faced the same kind of issue. I configured the logger like @ghiculescu but in environments/production.rb:

    logger = ActiveSupport::Logger.new(STDOUT)
    logger.formatter = config.log_formatter
    config.logger    = ActiveSupport::TaggedLogging.new(logger)