reidmorrison / rails_semantic_logger

Rails Semantic Logger replaces the Rails default logger with Semantic Logger
https://logger.rocketjob.io/rails
Apache License 2.0
320 stars 114 forks source link

Logging sensitive data #230

Open tomas-stefano opened 1 month ago

tomas-stefano commented 1 month ago

Context

I am working on an application where the email subject contains sensitive information, such as usernames. I have implemented a custom log formatter to redact sensitive information from logs. While the custom log formatter works correctly for web application logs, it does not seem to be applied to logs generated by background workers (Sidekiq jobs).

# config/initializer/semantic_logger.rg
  return unless defined? SemanticLogger
  require_dependency Rails.root.join('app/lib/custom_log_formatter')

  Clockwork.configure { |config| config[:logger] = SemanticLogger[Clockwork] if defined?(Clockwork) }
  SemanticLogger.add_appender(
    io: STDOUT,
    level: Rails.application.config.log_level,
    formatter: CustomLogFormatter.new,
  )

Then the CustomLogFormatter definition (I added a puts to check if is called - you can read the code):

# app/lib/custom_log_formatter
class CustomLogFormatter < SemanticLogger::Formatters::Raw
  REDACTED = '[REDACTED]'.freeze

  def call(log, logger)
    super

    add_custom_fields
    sanitize_payload_fields
    remove_post_params
    filter_skipping_email_message

    # rubocop:disable Rails/Output
    puts '-' * 80
    puts "Hash: #{hash}"
    puts '-' * 80
    # rubocop:enable Rails/Output
    hash.to_json
  end

private

  def add_custom_fields
    hash[:domain] = HostingEnvironment.hostname
    hash[:environment] = HostingEnvironment.environment_name
    hash[:hosting_environment] = HostingEnvironment.environment_name

    if (job_id = Thread.current[:job_id])
      hash[:job_id] = job_id
    end
    if (job_queue = Thread.current[:job_queue])
      hash[:job_queue] = job_queue
    end

    tid = Thread.current[:sidekiq_tid]
    if tid.present?
      ctx = Sidekiq::Context.current
      hash[:tid] = tid
      hash[:ctx] = ctx
    end
  end

  def sanitize_payload_fields
    if hash[:payload].present?
      hash[:payload].reject! { |key, _| SANITIZED_REQUEST_PARAMS.map(&:to_s).include?(key) }
      sanitize_mailer_subject_and_to
    end
  end

  def sanitize_mailer_subject_and_to
    if hash.dig(:payload, :subject).present?
      hash[:payload][:subject] = REDACTED
    end

    if hash.dig(:payload, :to).present?
      hash[:payload][:to] = REDACTED
    end
  end

  def filter_skipping_email_message
    if hash[:message]&.include?('Skipping email')
      hash[:message] = "Skipping email to #{REDACTED}"
    end
  end

  def remove_post_params
    return unless method_is_post_or_put_or_patch? && hash.dig(:payload, :params).present?

    hash[:payload][:params].clear
  end

  def method_is_post_or_put_or_patch?
    hash.dig(:payload, :method).in? %w[PUT POST PATCH]
  end
end

I can see this custom log working as expected although is does not work for the worker when they finish the job.

I can see the Custom log working for the web app and the logs working:

Hash: ... { [REDACTED] ... }

But on the worker after the mail is delivered / or skipped (even the puts is not even being called and the log leaks sensitive information):

[my-app-secondary-worker-1234] 2024-07-19 15:57:48.493654 I [1:processor] 
[ActionMailer::MailDeliveryJob] [c8a3facc-c8f0-45c1-b741-16e3e1198563] {jid: 51e96bc022d50913e3fb0ff1, queue: mailers} ActionMailer::Base -- Delivered mail 111@my-app-secondary-worker-85b9f59ff8-5l5gt.mail (97.3ms) -- 
{:event_name=>"deliver.action_mailer", 
:mailer=>"SomeMailer", 
:action=>nil, 
:message_id=>"111@myapp-secondary-worker-85b9f59ff8-5l5gt.mail", :perform_deliveries=>true, 
:subject=>"[REVIEW] John Doe did something", 
:to=>["john.doe@test.co.uk"], 
:from=>nil, :bcc=>nil, :cc=>nil, :date=>2024-07-19 14:57:48 UTC, :duration=>97.35, :args=>nil}

the subject and the to should be redacted and most important, the custom logger (and the puts) also should have being called but apparently it is not.

It the perform deliveries not using tha appenders to call the custom log?

Am I missing something?

tomas-stefano commented 1 month ago

My PR where I am working on this issue:

https://github.com/DFE-Digital/apply-for-teacher-training/pull/9502

tomas-stefano commented 1 month ago

Hi me again.

After reading the internals of the gem I realised that the gem doesn't use our custom log formatter for Action Mailer

I had to do a monkey patch on my application:

## To avoid logging sensitive data on "subjects" and "to":
module RailsSemanticLogger
  module ActionMailer
    class LogSubscriber < ::ActiveSupport::LogSubscriber
      class EventFormatter
        def payload
          {}.tap do |h|
            h[:event_name]         = event.name
            h[:mailer]             = mailer
            h[:action]             = action
            h[:message_id]         = event.payload[:message_id]
            h[:perform_deliveries] = event.payload[:perform_deliveries]
            h[:subject]            = '[FILTERED]'
            h[:to]                 = '[FILTERED]'
            h[:from]               = event.payload[:from]
            h[:bcc]                = event.payload[:bcc]
            h[:cc]                 = event.payload[:cc]
            h[:date]               = date
            h[:duration]           = event.duration.round(2) if log_duration?
            h[:args]               = '[FILTERED]'
          end
        end
      end
    end
  end
end

I do wonder if we could replace this formatter as a future feature for the gem?

tomas-stefano commented 1 month ago

We implemented on the PR linked above the event formatter using the rails parameter logging. Maybe we can add this to the gem?