honeycombio / beeline-ruby

Legacy instrumentation for your Ruby apps with Honeycomb.
Apache License 2.0
22 stars 32 forks source link

Rack Integration Raises an Error During File Upload #62

Closed mstruve closed 4 years ago

mstruve commented 4 years ago

I recently noticed that we have had a few errors raised in our production environment from the beeline gem when a user tries to upload a file. They seem pretty deep in the rack implementation so I figured I would share here to see if you all had any suggestions for a fix.

Here is the error and backtrace:

NoMethodError: undefined method `empty?' for #<ActionDispatch::Http::UploadedFile:0x00000000085b38a0>
 rack.rb  31 block (2 levels) in call(...)
[GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/integrations/rack.rb:31:in `block (2 levels) in call'
29       client.start_span(name: "http_request", serialized_trace: hny) do |span|
30         add_field = lambda do |key, value|
31           next unless value && !value.empty?
32 
33           span.add_field(key, value)

 [GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/integrations/rails.rb:14 :in `tap`
 [GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/integrations/rails.rb:14 :in `add_package_information`
 [GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/integrations/rack.rb:40 :in `block in call`
 [GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/client.rb:54 :in `start_span`
 [GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/integrations/rack.rb:29 :in `call`
 [GEM_ROOT]/gems/actionpack-5.2.4.1/lib/action_dispatch/middleware/remote_ip.rb:81 :in `call`
 [GEM_ROOT]/gems/request_store-1.4.1/lib/request_store/middleware.rb:19 :in `call`
 [GEM_ROOT]/gems/rack-timeout-0.6.0/lib/rack/timeout/core.rb:151 :in `block in call`
 [GEM_ROOT]/gems/rack-timeout-0.6.0/lib/rack/timeout/support/timeout.rb:19 :in `timeout`
 [GEM_ROOT]/gems/rack-timeout-0.6.0/lib/rack/timeout/core.rb:150 :in `call`
 [GEM_ROOT]/gems/actionpack-5.2.4.1/lib/action_dispatch/middleware/request_id.rb:27 :in `call`
 [GEM_ROOT]/gems/rack-2.0.8/lib/rack/method_override.rb:22 :in `call`
 [GEM_ROOT]/gems/rack-2.0.8/lib/rack/runtime.rb:22 :in `call`
 [GEM_ROOT]/gems/activesupport-5.2.4.1/lib/active_support/cache/strategy/local_cache_middleware.rb:29 :in `call`
 [GEM_ROOT]/gems/actionpack-5.2.4.1/lib/action_dispatch/middleware/executor.rb:14 :in `call`
 [GEM_ROOT]/gems/actionpack-5.2.4.1/lib/action_dispatch/middleware/static.rb:127 :in `call`
 [GEM_ROOT]/gems/rack-2.0.8/lib/rack/sendfile.rb:111 :in `call`
 [GEM_ROOT]/gems/ddtrace-0.30.1/lib/ddtrace/contrib/rack/middlewares.rb:85 :in `call`
 [GEM_ROOT]/gems/railties-5.2.4.1/lib/rails/engine.rb:524 :in `call`
[GEM_ROOT]/gems/puma-4.3.1/lib/puma/configuration.rb:228 :in `call`
[GEM_ROOT]/gems/puma-4.3.1/lib/puma/server.rb:681 :in `handle_request`
[GEM_ROOT]/gems/puma-4.3.1/lib/puma/server.rb:472 :in `process_client`
[GEM_ROOT]/gems/puma-4.3.1/lib/puma/server.rb:328 :in `block in run`
[GEM_ROOT]/gems/puma-4.3.1/lib/puma/thread_pool.rb:134 :in `block in spawn_thread`

Our Honeycomb setup is very basic and we just use the default suggested tracking for Rails at this point.

Thanks in advance for the help and if there is anything else you need from me let me know!

DanielHeath commented 4 years ago

You should use blank? instead of empty? - blank? is defined on everything in rails.

ajvondrak commented 4 years ago

While the surface cause of the error is Honeycomb invoking #empty? on an ActionDispatch::Http::UploadedFile instance, I was confused about how this error could be triggered in the first place.

Honeycomb::Rails#add_package_information only adds (up to) 3 fields request.controller, request.action, and request.route. And of those, only 2 are taken from the request parameters where you might find the UploadedFile. Specifically, Honeycomb uses the :action and :controller routing parameters: https://github.com/honeycombio/beeline-ruby/blob/0285e1881711e416e579861240eb93595821cbaf/lib/honeycomb/integrations/rails.rb#L20-L21

What I'm guessing is that you have a form field somewhere named action or controller, which clobbers the Rails routing parameters, causing the file upload to blow up the Honeycomb instrumentation.

Does this sound plausible? Because otherwise I can't figure out how your parameter is getting into the beeline's span in the first place. A cursory search of the dev.to source shows this partial, which looks like it might be generating the parameter named based on dynamic data (?):

  <%= f.file_field field.attribute %>

But that's just me taking a shot in the dark. The term "action" seems to crop up in your source code a lot, too; I don't know if there are any associated file uploads with that parameter name.


As for a solution, the beeline could technically handle unexpected types better. Something simple like

diff --git a/lib/honeycomb/integrations/rack.rb b/lib/honeycomb/integrations/rack.rb
index f3808d4..19b5188 100644
--- a/lib/honeycomb/integrations/rack.rb
+++ b/lib/honeycomb/integrations/rack.rb
@@ -28,9 +28,7 @@ module Honeycomb
       hny = env["HTTP_X_HONEYCOMB_TRACE"]
       client.start_span(name: "http_request", serialized_trace: hny) do |span|
         add_field = lambda do |key, value|
-          next unless value && !value.empty?
-
-          span.add_field(key, value)
+          span.add_field(key, value) if valuable?(value)
         end

         extract_fields(env, RACK_FIELDS, &add_field)
@@ -58,6 +56,14 @@ module Honeycomb
       end
     end

+    def valuable?(value)
+      if value.respond_to?(:empty?)
+        !value.empty?
+      else
+        !value.nil?
+      end
+    end
+
     # Rack middleware
     class Middleware
       include Rack

But if this is the error you're getting, I'm afraid you'd just get nonsensical strings like #<ActionDispatch::Http::UploadedFile:0x00000000085b38a0> in your request.action field, which doesn't truly solve the problem. Stops hard errors from happening, though, so still potentially worthwhile.

Alternatively, I wonder if there's a way to get at Rails routing information without having to look through the fraught request parameters. 🤔 If so, the beeline should almost certainly prefer to use that method (if my theory is right that action/controller can be clobbered).

mstruve commented 4 years ago

Good catch! That information about how you parse the controller actions is SUPER HELPFUL!

I went back and looked at the Honeybadger involving the error and sure enough, the action param is getting overridden. I am going to dig further into this and figure out if we can prevent that overwritten action from happening.

{
  "action" => "#<ActionDispatch::Http::UploadedFile>",
  "submitter" => "#<ActionDispatch::Http::UploadedFile>",
  "overwriteexistingtheme" => "#<ActionDispatch::Http::UploadedFile>",
  "page" => "xyz"
}

I do think preventing the hard errors from rising up and better handling the unexpected types would probably be a good idea. Then people aren't being thrown 500s but rather seeing the issue in unexpected data rather than unexpected errors. At least that is how I would prefer it. It might also be useful to log a warning or something if you run into an action that is unexpected to help people debug.

ajvondrak commented 4 years ago

The plot thickens... 🤔

I think a way we could fix this would be to use ActionDispatch::Request#path_parameters instead of #params, since that method seems to contain the subset of information we're looking for in the beeline (without being touched by the get/post parameters).

However, trying to write a failing unit test to the effect, I can't actually reproduce the issue! 😱 Just trying something simple like this passes no problem:

diff --git a/spec/honeycomb/integrations/rails_spec.rb b/spec/honeycomb/integrations/rails_spec.rb
index 9a0ffc0..52a3913 100644
--- a/spec/honeycomb/integrations/rails_spec.rb
+++ b/spec/honeycomb/integrations/rails_spec.rb
@@ -37,6 +37,7 @@ if defined?(Honeycomb::Rails)

         app.routes.draw do
           get "/hello/:name" => "test#hello"
+          post "/goodbye" => "test#goodbye"
         end
       end
     end
@@ -45,6 +46,10 @@ if defined?(Honeycomb::Rails)
       def hello
         render plain: "Hello World!"
       end
+
+      def goodbye
+        render plain: "Goodbye World!"
+      end
     end

     describe "a standard request" do
@@ -68,6 +73,15 @@ if defined?(Honeycomb::Rails)
       it_behaves_like "event data"
     end

+    describe "a request that clobbers routing parameters" do
+      before { post "/goodbye", action: "wave" }
+
+      it "still sets the request.action field correctly" do
+        event = libhoney_client.events.last
+        expect(event.data["request.action"]).to eq "goodbye" # not "wave"
+      end
+    end
+
     if VERSION >= Gem::Version.new("5")
       describe "a bad request" do
         before do

It seems that ActionDispatch::Request#params should always be overwriting the :action + :controller values with the routing params, specifically due to

        params = begin
                   request_parameters.merge(query_parameters)
                 rescue EOFError
                   query_parameters.dup
                 end
        params.merge!(path_parameters)

The dev.to code is complicated enough that I'm not really sure how to track down who's overriding the action parameter - and how such a thing is possible! Maybe some middleware from some gem is mangling the action_dispatch.request.parameters variable in the Rack env? Because then Rails would just return whatever value is cached in there.

I'm not sure if this would happen any way other than a rogue middleware, because the beeline is even constructing a brand-spanking-new ActionDispatch::Request to work with, so any mangling that your controller does (e.g., with a before_action) would be performed on a separate Request instance. Though I guess maybe they'd still share the params via the hash stashed in the Rack env...blargh. 😵

I look forward to you tracking down how this is happening, since it doesn't seem like Rails should be letting request.params[:action] be clobbered in the first place. 😅 If we can get a test that reproduces this reliably, then we would be able to see if various fixes work (e.g., using #path_parameters). Or if this is due to some gem mangling the routing parameters, maybe a better idea would be to fix the behavior with that gem (while of course preventing hard errors inside the beeline).

mstruve commented 4 years ago

After some investigation, I am pretty sure this request is not originating from our Rails app. From looking at the Honeybadger, there is no referrer for the request, the url the user is making a POST to is http://dev.to/wp-admin/admin-post.php?page=wysija_campaigns and the browser is python-requests/2.22.0. All of that leads me to believe this is a pentester which I totally should have figured out sooner but I have been trying to triage these Honeybadgers quickly and completely missed those hints.

Given that conclusion, I don't think there is really anything here to fix. My solution at this point is going to be to mark the Honeybadger as ignored. We can definitely close this issue since I really don't want to waste any more of your time. Once again, I am so sorry I didn't see those details sooner!

ajvondrak commented 4 years ago

Eureka! That was the clue we needed! 🎉

Rails can only set the action/controller parameters if the route was recognized in the first place, since #path_parameters is set by the Rails router. So ultimately, unrecognized requests result in #params being merged with an empty hash, meaning the get/post parameters from the HTTP request remain untouched.

We can definitely close this issue since I really don't want to waste any more of your time. Once again, I am so sorry I didn't see those details sooner!

Absolutely not a waste of time. I'm so happy to finally figure out the problem, and it took your sleuthing to narrow everything down. Thanks so much for satisfying my sense of order, lol. Writing a PR to fix this issue now. ❤️

P.S.: I think this shift away from get/post parameter parsing will be a vast improvement over the workarounds of #39 & #50, too. 😍