Closed ykrods closed 5 months ago
Hi @ykrods and thank you for the thorough investigation! The escaping actually came out during the review of #1237, although at the time we didn’t find any case where this could cause a problem. Now that you’ve brought one, I really think we should get this fixed.
The underlying issue was described in this comment, so I want to make sure we don’t fall back into that. To check your proposed solution, I’ve run the following:
base = URI.parse('https://service.com/')
url = 'service:search?limit=50&offset=400'
uri = url ? URI.parse(base.to_s + url) : base # this is your proposed solution, which works!
=> #<URI::HTTPS https://service.com/service:search?limit=50&offset=400>
Thank you also for pointing out pros/cons of the proposed solution. Looking at the disadvantages:
Benchmark.bm do |x|
x.report do
10_000.times do
url = url.to_s.gsub(':', '%3A') if URI.parse(url.to_s).opaque
uri = url ? base + url : base
end
end
x.report { 10_000.times { URI.parse(base.to_s + url) } }
end
user system total real
0.184706 0.021542 0.206248 ( 0.206335)
0.040563 0.001414 0.041977 ( 0.041988)
@iMacTia
- I really don’t think this is gonna be a problem. Running the following benchmark actually shows the proposed solutions to be much faster, actually!
Thanks for benchmarking and very good news!
- If the base already contains query params or hash, then there should be no URL?
I know this might not be a common use case, but I concerned about like a below case.
base = URI.parse("https://example.com/search?limit=100")
url = "additional_path"
p URI.parse(base.to_s + url) # ==> <URI::HTTPS https://example.com/search?limit=100additional_path>
However, I found query is removed in url_prefix= method, so it's disadvantage has been resolved.
- Could you articulate more on this one? I’m not sure I understand which case you’re pointing out here
For example, the following code, if patched as is, would result in an extra slash.
con = Faraday.new(url: "http://example.com/v1")
p con.build_exclusive_url("/search")
# ==> <URI::HTTP http://example.com/v1//search>
It seems to me that we could simply remove the leading slash from the non-base url, but I'm concerned that there may be use cases that I'm not recognizing.
Sorry for changing my mind, but I came up with another, better idea.
As mentioned in #1237, the point of problem is caused by string before the colon (like service
below) being parsed as a uri scheme when a url is converted to a URI in base + url
.
p URI.parse("service:search").scheme # ==> "service"
URI.parse
is converting uri according to RFC3986 (RFC2396) so it is normal behaviour.
Here, quoting from rfc3986 4.2, is
A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative- path reference.
and url can be parsed as a relative path reference by prepending a dot segment as noted above.
base = URI.parse("http://example.com/v1/")
p base + "service:search" # ==> #<URI::Generic service:search>
p base + "./service:search" # ==> #<URI::HTTP http://example.com/v1/service:search>
Applying this to the current build_exclusive_url
would result in the following code.
- url = url.to_s.gsub(':', '%3A') if URI.parse(url.to_s).opaque
+ url = "./#{url}" if url.respond_to?(:start_with?) && !url.start_with?("http://", "https://", "/", "./")
uri = url ? base + url : base
This seems to be a smaller change than string joins.
Thank you for pointing that out!
You're right, url
could be a full address starting with https://
, which would make the proposed solution break.
I agree with your take that it's probably best to keep base
as a URI
to preserve as much behaviour as possible, so your new proposed solution makes sense 👍
Could you re-run the benchmark I shared above just to make sure this is also in line (or faster) than the current implementation? And if so, would you be interested in giving it a stub on a new PR 🙂?
@iMacTia
Thanks for your response!
Below are the benchmark results:
require "uri"
require "benchmark"
base = URI.parse("https://example.com/")
def t1(base, url)
url = url.to_s.gsub(":", "%3A") if URI.parse(url.to_s).opaque
uri = url ? base + url : base
end
def t2(base, url)
url = "./#{url}" if url.respond_to?(:start_with?) && !url.start_with?("http://", "https://", "/", "./")
uri = url ? base + url : base
end
Benchmark.bm do |x|
p "case 1: with colon"
x.report { 10_000.times { t1 base, "service:search" } }
x.report { 10_000.times { t2 base, "service:search" } }
p "case 2: without colon"
x.report { 10_000.times { t1 base, "service/search" } }
x.report { 10_000.times { t2 base, "service/search" } }
end
output:
user system total real
"case 1: with colon"
0.125153 0.000436 0.125589 ( 0.125913)
0.071536 0.000000 0.071536 ( 0.071653)
"case 2: without colon"
0.100534 0.000157 0.100691 ( 0.100862)
0.073407 0.000057 0.073464 ( 0.073600)
And I create a new PR: https://github.com/lostisland/faraday/pull/1569
The new solution is faster in both cases, this is great 🎉 !
Hi and thank you very much for this project at first.
Basic Info
Issue description
By the #1237 improvements, colons in path are now encoded as %3A. However, it is up to the server implementation to accept encoded colons, and in fact Firebase's fcm v1 api responds 404 status for encoded
messages%3Asend
.https://firebase.google.com/docs/reference/fcm/rest/v1/projects.messages/send
Currently, encoding occurs only in limited cases like a below reproduction code.
And no error occurs in the following situations.
Faraday.initialize()
orpost()
methodpost()
(like:client.post("/v1/projects/#{project_id}/messages:send")
)Steps to reproduce
(using ENV for secret values)
Execution log
My thoughts
https://github.com/lostisland/faraday/blob/4abafa5c66ce75bc2abbf0a5e7e98137dea91eb2/lib/faraday/connection.rb#L470-L483
How about replace URL + string merge to string + string merge
This has the following advantages and disadvantages.
Advantages.
Disadvantages.