whitequark / rack-utf8_sanitizer

Rack::UTF8Sanitizer is a Rack middleware which cleans up invalid UTF8 characters in request URI and headers.
MIT License

Sanitize requests with no content type #50

Open jakeonfire opened 5 years ago

jakeonfire commented 5 years ago

We had lots of web requests bypassing our installation of rack-utf8_sanitizer. It turned out they were for additional resources (CSS, JS, etc.) and had no content type. Luckily, the following configuration worked for us:

config.middleware.insert 0, Rack::UTF8Sanitizer, additional_content_types: [nil]

Perhaps this is worth mentioning in the README. Has anyone else run into this issue?
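
To illustrate why adding nil to the list can matter, here is a minimal sketch in plain Ruby. It is not the gem's code: the `DEFAULT_TYPES` list and the `sanitizable?` helper are hypothetical, and the sketch simply assumes the middleware compares the request's media type against an allowlist. Under that assumption, a request with no content type has a nil media type and only matches once nil is in the list.

```ruby
# Hypothetical default allowlist, for illustration only;
# the gem's real defaults may differ.
DEFAULT_TYPES = ['text/plain', 'application/json'].freeze

# Returns true when a request with this media type would be sanitized,
# assuming a plain allowlist comparison (an assumption, not the gem's API).
def sanitizable?(media_type, additional_content_types: [])
  (DEFAULT_TYPES + additional_content_types).include?(media_type)
end

# A request without a Content-Type header has a nil media type.
puts sanitizable?(nil)
puts sanitizable?(nil, additional_content_types: [nil])
```

With the default list alone the nil media type is not matched; with `additional_content_types: [nil]` it is.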

ndbroadbent commented 4 years ago

@jakeonfire Thanks for the tip! I've just installed rack-utf8_sanitizer, and I'm checking out the issues to see if there are any problems to be aware of.

I was wondering what kind of UTF-8 errors you were seeing for your assets? Is this for the paths or query params? (I was not able to reproduce this for a file in my ./public folder: /logo.png?q=%ff works fine.)

I like to add test cases to ensure that things keep working, so it would be great to get some more examples. Here's my current spec/request/rack_utf8_sanitizer_spec.rb:

# frozen_string_literal: true

RSpec.describe 'rack-utf8_sanitizer handles invalid UTF-8 characters', type: :request do
  it 'does not crash with invalid UTF-8 characters in a path' do
    # If rack-utf8_sanitizer isn't working, this error will be:
    # ArgumentError: invalid byte sequence in UTF-8
    # See: https://github.com/whitequark/rack-utf8_sanitizer/
    expect { get '/%ff' }.to raise_error ActionController::RoutingError

    expect {
      get '/%ff', headers: { 'Content-Type' => nil }
    }.to raise_error ActionController::RoutingError
  end
end

additional_content_types: [nil] sounds like a good tip, but I'd like to reproduce the issue in a test case before I add it.

jakeonfire commented 4 years ago

@ndbroadbent Yeah, we were getting "ArgumentError: invalid byte sequence in UTF-8" errors from query string keys, which is admittedly an edge case. But try:

get '/?%ff=1', headers: { 'Content-Type' => nil }
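
For context on why a key like %ff is a problem, here is a small sketch using only Ruby's standard library. It shows that percent-decoding %ff yields a string tagged as UTF-8 whose bytes are not valid UTF-8, and one possible cleanup using String#scrub. The helper names `decode_key` and `sanitize_key` are made up for this example, and the cleanup shown is an assumption for illustration, not necessarily what the middleware does internally.

```ruby
require 'uri'

# Percent-decode a query string key. The result is tagged as UTF-8,
# but the bytes are not validated.
def decode_key(raw)
  URI.decode_www_form_component(raw)
end

# One possible cleanup (hypothetical helper): replace invalid bytes with
# U+FFFD, then percent-encode the key again.
def sanitize_key(raw)
  URI.encode_www_form_component(decode_key(raw).scrub)
end

key = decode_key('%ff')
puts key.encoding         # UTF-8
puts key.valid_encoding?  # false, since 0xFF never occurs in valid UTF-8
puts sanitize_key('%ff')  # %EF%BF%BD
```

Code further down the stack that assumes valid UTF-8 can raise on a string like `key`, which is the failure the request above is meant to trigger.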