lostisland / faraday

Simple, but flexible HTTP client library, with support for multiple backends.
https://lostisland.github.io/faraday
MIT License
5.7k stars 971 forks source link

Some API providers cannot interpret encoded colons in the path #1567

Closed ykrods closed 2 weeks ago

ykrods commented 3 weeks ago

Hi and thank you very much for this project at first.

Basic Info

Issue description

By the #1237 improvements, colons in path are now encoded as %3A. However, it is up to the server implementation to accept encoded colons, and in fact Firebase's fcm v1 api responds 404 status for encoded messages%3Asend .

https://firebase.google.com/docs/reference/fcm/rest/v1/projects.messages/send

Currently, encoding occurs only in limited cases like a below reproduction code.

And no error occurs in the following situations.

  1. Give the full url to Faraday.initialize() or post() method
  2. Give the full path to post() (like: client.post("/v1/projects/#{project_id}/messages:send") )

Steps to reproduce

(using ENV for secret values)

require "faraday"

project_id = ENV["PROJECT_ID"]
iid_token = ENV["IID_TOKEN"]

def get_access_token
  ENV["ACCESS_TOKEN"]
end

base_url = "https://fcm.googleapis.com/v1/projects/#{project_id}"
path = "messages:send"

client = Faraday.new(url: base_url) do |builder|
  builder.request :authorization, 'Bearer', get_access_token
  builder.request :json
  builder.response :json,
                   content_type: /application\/json/,
                   parser_options: { symbolize_names: true }
  builder.response :logger,
                   Logger.new(STDOUT),
                   { headers: true, bodies: true, errors: true }
end

res = client.post(path) do |req|
  req.body = {
    validate_only: true,
    message: {
      token: iid_token,
      notification: {
        title: "test",
        body: "test",
      }
    }
  }
end

Execution log

I, [2024-06-13T08:31:07.928238 #1]  INFO -- request: POST https://fcm.googleapis.com/v1/projects/***/messages%3Asend
I, [2024-06-13T08:31:07.928371 #1]  INFO -- request: User-Agent: "Faraday v2.9.1"
Authorization: "Bearer ***"
Content-Type: "application/json"
I, [2024-06-13T08:31:07.928417 #1]  INFO -- request: {"validate_only":true,"message":{"token":"***","notification":{"title":"test","body":"test"}}}
I, [2024-06-13T08:31:08.084930 #1]  INFO -- response: Status 404
I, [2024-06-13T08:31:08.085055 #1]  INFO -- response: content-type: "text/html"
date: "Thu, 13 Jun 2024 08:31:08 GMT"
server: "scaffolding on HTTPServer2"
content-length: "0"
x-xss-protection: "0"
x-frame-options: "SAMEORIGIN"
x-content-type-options: "nosniff"
alt-svc: "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000"
I, [2024-06-13T08:31:08.085115 #1]  INFO -- response:

My thoughts

https://github.com/lostisland/faraday/blob/4abafa5c66ce75bc2abbf0a5e7e98137dea91eb2/lib/faraday/connection.rb#L470-L483

How about replace URL + string merge to string + string merge

- url = url.to_s.gsub(':', '%3A') if URI.parse(url.to_s).opaque
- uri = url ? base + url : base
+ uri = url ? URI.parse(base.to_s + url) : base

This has the following advantages and disadvantages.

Advantages.

  1. No error if colons are included in the path url.
  2. User can decide whether to encode colons or not.
  3. The path contained in base url is not ignored if the path url starts with slash.

Disadvantages.

  1. Inefficient because base is already parsed to URL.
  2. Error might occur if query or hash is contained in the base url.
  3. Needs to remove first slash because base url is assumed to end with slash.
iMacTia commented 3 weeks ago

Hi @ykrods and thank you for the thorough investigation! The escaping actually came out during the review of #1237, although at the time we didn’t find any case where this could cause a problem. Now that you’ve brought one, I really think we should get this fixed.

The underlying issue was described in this comment, so I want to make sure we don’t fall back into that. To check your proposed solution, I’ve run the following:

base = URI.parse('https://service.com/')
url = 'service:search?limit=50&offset=400'
uri = url ? URI.parse(base.to_s + url) : base # this is your proposed solution, which works!
=> #<URI::HTTPS https://service.com/service:search?limit=50&offset=400>

Thank you also for pointing out pros/cons of the proposed solution. Looking at the disadvantages:

  1. I really don’t think this is gonna be a problem. Running the following benchmark actually shows the proposed solutions to be much faster, actually!
    Benchmark.bm do |x|
    x.report do
    10_000.times do
      url = url.to_s.gsub(':', '%3A') if URI.parse(url.to_s).opaque
      uri = url ? base + url : base
    end
    end
    x.report { 10_000.times { URI.parse(base.to_s + url) } }
    end
       user     system      total        real
    0.184706   0.021542   0.206248 (  0.206335)
    0.040563   0.001414   0.041977 (  0.041988)
  2. If the base already contains query params or hash, then there should be no URL?
  3. Could you articulate more on this one? I’m not sure I understand which case you’re pointing out here
ykrods commented 3 weeks ago

@iMacTia

  1. I really don’t think this is gonna be a problem. Running the following benchmark actually shows the proposed solutions to be much faster, actually!

Thanks for benchmarking and very good news!

  1. If the base already contains query params or hash, then there should be no URL?

I know this might not be a common use case, but I concerned about like a below case.

base = URI.parse("https://example.com/search?limit=100")
url = "additional_path"
p URI.parse(base.to_s + url) # ==> <URI::HTTPS https://example.com/search?limit=100additional_path>

However, I found query is removed in url_prefix= method, so it's disadvantage has been resolved.

https://github.com/lostisland/faraday/blob/4abafa5c66ce75bc2abbf0a5e7e98137dea91eb2/lib/faraday/connection.rb#L356-L362

  1. Could you articulate more on this one? I’m not sure I understand which case you’re pointing out here

For example, the following code, if patched as is, would result in an extra slash.

con = Faraday.new(url: "http://example.com/v1")
p con.build_exclusive_url("/search")
# ==> <URI::HTTP http://example.com/v1//search>

It seems to me that we could simply remove the leading slash from the non-base url, but I'm concerned that there may be use cases that I'm not recognizing.

ykrods commented 3 weeks ago

Sorry for changing my mind, but I came up with another, better idea.

As mentioned in #1237, the point of problem is caused by string before the colon (like service below) being parsed as a uri scheme when a url is converted to a URI in base + url.

p URI.parse("service:search").scheme  # ==> "service"

URI.parse is converting uri according to RFC3986 (RFC2396) so it is normal behaviour.

Here, quoting from rfc3986 4.2, is

A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative- path reference.

and url can be parsed as a relative path reference by prepending a dot segment as noted above.

base = URI.parse("http://example.com/v1/")
p base + "service:search"    # ==> #<URI::Generic service:search>
p base + "./service:search"  # ==> #<URI::HTTP http://example.com/v1/service:search>

Applying this to the current build_exclusive_url would result in the following code.

-     url = url.to_s.gsub(':', '%3A') if URI.parse(url.to_s).opaque
+     url = "./#{url}" if url.respond_to?(:start_with?) && !url.start_with?("http://", "https://", "/", "./")
      uri = url ? base + url : base

This seems to be a smaller change than string joins.

references

iMacTia commented 3 weeks ago

Thank you for pointing that out! You're right, url could be a full address starting with https://, which would make the proposed solution break.

I agree with your take that it's probably best to keep base as a URI to preserve as much behaviour as possible, so your new proposed solution makes sense 👍

Could you re-run the benchmark I shared above just to make sure this is also in line (or faster) than the current implementation? And if so, would you be interested in giving it a stub on a new PR 🙂?

ykrods commented 2 weeks ago

@iMacTia

Thanks for your response!

Below are the benchmark results:

require "uri"
require "benchmark"

base = URI.parse("https://example.com/")

def t1(base, url)
  url = url.to_s.gsub(":", "%3A") if URI.parse(url.to_s).opaque
  uri = url ? base + url : base
end

def t2(base, url)
  url = "./#{url}" if url.respond_to?(:start_with?) && !url.start_with?("http://", "https://", "/", "./")
  uri = url ? base + url : base
end

Benchmark.bm do |x|
  p "case 1: with colon"
  x.report { 10_000.times { t1 base, "service:search" } }
  x.report { 10_000.times { t2 base, "service:search" } }

  p "case 2: without colon"
  x.report { 10_000.times { t1 base, "service/search" } }
  x.report { 10_000.times { t2 base, "service/search" } }
end

output:

       user     system      total        real
"case 1: with colon"
   0.125153   0.000436   0.125589 (  0.125913)
   0.071536   0.000000   0.071536 (  0.071653)
"case 2: without colon"
   0.100534   0.000157   0.100691 (  0.100862)
   0.073407   0.000057   0.073464 (  0.073600)

And I create a new PR: https://github.com/lostisland/faraday/pull/1569

iMacTia commented 2 weeks ago

The new solution is faster in both cases, this is great 🎉 !