Add option for omitting request data from Faraday exceptions

ClaytonPassmore commented 10 months ago

Description

Adds an option to the :raise_error middleware that prevents request data from being included in the generated Faraday exception.

Request data can include sensitive headers (e.g. Authorization) or other request parameters. Uncaught exceptions tend to make their way into bug trackers, which is not a place where you want sensitive information to go!

In order to prevent request data from being included in Faraday exceptions, you can now configure the :raise_error middleware so that it is omitted:

Faraday.new do |faraday|
  faraday.raise_error, include_request: false
end

Todos

[ ] Documentation

Additional Notes

Request data started being bundled in Faraday exceptions in #1181.

To prevent this from becoming a breaking change, I preserved the existing behaviour - request data is included by default.

That said, I would love to see request data omitted by default in a future major version. As a user of this gem, I don't expect "response" data to include "request" data.

iMacTia commented 10 months ago

Thank you for the addition @ClaytonPassmore, this makes sense to me.

I'd be happy to merge this in and cut a new release, but first I'd like to add some documentation to the middleware page 🙏

I'll need to think a bit more about the "make this a default in the next major version", as I think it's a hard balance to strike. It's true that the request headers/body might contain sensitive data, but this is not logged automatically and if you send data to a 3rd party service (e.g. error tracker) then I'd expect you to trust that as well, or at least the data to have a controlled retention period. Making this default to false in a future major release could do more harm than good, I'll need to ponder it a bit more.

Just an observation: rather than removing the whole request from the exception, do you think we could do something smarter and make it so that you can filter just the data you need to? For example, the logger middleware allows you to define filters to strip things like passwords from logs. That's an easier case since we're dealing with strings, but maybe we can apply the same concept here in a similar way as well?

ClaytonPassmore commented 10 months ago

Thanks for the reply @iMacTia! I added some documentation to the middleware page as requested 👍

Making this default to false in a future major release could do more harm than good, I'll need to ponder it a bit more.

That's totally fair. Having the ability to configure the behaviour is enough to satisfy my needs 👌

Just an observation: rather than removing the whole request from the exception, do you think we could do something smarter and make it so that you can filter just the data you need to? For example, the logger middleware allows you to define filters to strip things like passwords from logs. That's an easier case since we're dealing with strings, but maybe we can apply the same concept here in a similar way as well?

I had a very similar thought when I first approached this problem. One of the challenges I encountered is that there are so many different parts of a request that are formatted differently - headers, body params, and query params. I wasn't sure if it made sense to have separate filter options for each type of field - e.g.

Faraday.new do |f|
  f.raise_error,
    header_filters: ["Authorization", "X-Private-Key"],
    query_param_filters: ["api_key", /.*_?email/],
    body_filters: [
      /"api_key":("[^"]+")/,  # E.g. JSON
      /<email>([^<]+)</email>/  # E.g. XML
    ]
end

The filtering for body parameters felt extremely cumbersome to specify - I think it's a little too easy to specify a pattern that ends up breaking the content type or doesn't entirely replace the sensitive content.

As a result, I went with the "big hammer" approach rather than the "finesse" approach 😄 Definitely open to your thoughts on this though!

iMacTia commented 10 months ago

Fair enough, I think if we want to add a more "granular" approach later on it won't clash with the one proposed in this PR, so happy to get this in for the time being 👍

ericboehs commented 2 months ago

I'm interested in seeing this as the default.

We have many places in our API where we don't want this behavior. So far we've disabled it in 38 places and are looking to add a linter to ensure future references to Faraday::Connection include the include_request: false option.

iMacTia commented 2 months ago

@ericboehs This is actually very interesting, I'm surprised you added it to so many places to be honest.

The recommended usage of Faraday is to create a connection object per API you integrate to, so unless you integrate with 38 different API you shouldn't need to disable it in so many places 🤔

I'd love to learn more because my assumption so far was that you'd eventually need to do this in a few places, so the overall burden was very low and understandable to guarantee backwards compatibility.

But this could be revisited if the friction is to high: if not changing the default, maybe with a global config or similar

ericboehs commented 2 months ago

At the VA, we do integrate with a lot of APIs. There are discussions of breaking our modules out into their own applications. I'm not certain we're doing things correctly but here's a draft PR where we're working toward implementing this option.

iMacTia commented 2 months ago

Right, so there's clearly some element of repetition here where multiple connections actually point to the same API (which is hard to tell just looking at the code, since everything is dynamically loaded through Settings).

That said, you still do integrate with a whole lot of APIs in that project 😅! And I can also see how doing something like filtering out the Authorization header would be a sensible default, but you also have lots of custom headers in your implementation so that would also not help at all 😞.

I think a possible middle ground solution to preserve backwards compatibility but support extreme use cases like yours would be to have the possibility to specify a default value for options at the middleware level, something like:

Faraday::Response::RaiseError.default_options = { include_request: false }

conn = Faraday.new(...) do |f|
  ...
  f.use :raise_error # no need to specify include_request: false here
end

Considerations:

This would mean using class variables, and open up to possible thread-safety issues so it would require further consideration
How would the middleware behave if the connection builder also passed options? Would that replace or merge with the default ones?
This solution could possibly apply to all middleware if we implement it into the base Faraday::Middleware class

cc @olleolleolle interested in your feedback on this little design proposal 😄

olleolleolle commented 2 months ago

Re: 2, merge, I think.

This has the same default procedure as Faraday itself, does it not? Or are there new thread-safety risks?

lostisland / faraday