ruby / uri

URI is a module providing classes to handle Uniform Resource Identifiers
https://ruby.github.io/uri/

URI::regexp schemes are case sensitive #38

Open nws-td opened 2 years ago

nws-td commented 2 years ago

https://github.com/ruby/uri/blob/bc47bf71df2b2e9cea09d0b2684ceac7355e42a0/lib/uri/rfc2396_parser.rb#L266

URI::regexp accepts an array of schemes, but the regexp it returns matches those schemes case-sensitively, whereas the RFCs specify that schemes are case-insensitive.

irb(main):055:0> URI::regexp(["http"]).match("HTTP://WWW.GOOGLE.COM")
=> nil
irb(main):056:0> URI::regexp(["HTTP"]).match("HTTP://WWW.GOOGLE.COM")
=> #<MatchData "HTTP://WWW.GOOGLE.COM" 1:"HTTP" 2:nil 3:nil 4:"WWW.GOOGLE.COM" 5:nil 6:nil 7:nil 8:nil 9:nil>

RFC 2396:

  6. URI Normalization and Equivalence

    In many cases, different URI strings may actually identify the identical resource. For example, the host names used in URL are actually case insensitive, and the URL http://www.XEROX.com is equivalent to http://www.xerox.com. In general, the rules for equivalence and definition of a normal form, if any, are scheme dependent. When a scheme uses elements of the common syntax, it will also use the common syntax equivalence rules, namely that the scheme and hostname are case insensitive and a URL with an explicit ":port", where the port is the default for the scheme, is equivalent to one where the port is elided.

RFC 3986 says the same:

Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. An implementation should accept uppercase letters as equivalent to lowercase in scheme names (e.g., allow "HTTP" as well as "http") for the sake of robustness but should only produce lowercase scheme names for consistency.

The expected behavior is that the scheme's case is ignored, similar to how URI() already accepts an uppercase scheme:

irb(main):008:0> URI("HTTP://WWW.GOOGLE.COM").scheme
=> "http"

I'm guessing that the regexp just needs the i flag, like: /(?=#{Regexp.union(*schemes)}:)#{@pattern[:X_ABS_URI]}/xi
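
One caveat with that guess: when a Regexp object is interpolated into a regexp literal, Ruby embeds it with its own flags as (?-mix:...), so an outer i flag may not reach the scheme alternation produced by Regexp.union. The sketch below illustrates this with a simplified stand-in for @pattern[:X_ABS_URI] (the scheme_and_colon string is hypothetical, not the library's pattern) and shows one possible alternative that interpolates the union's source string instead.

```ruby
schemes = ["http", "ftp"]
union = Regexp.union(*schemes) # => /http|ftp/

# Hypothetical, simplified stand-in for @pattern[:X_ABS_URI].
scheme_and_colon = '[a-zA-Z][-+.a-zA-Z\d]*:'

# Interpolating the Regexp object embeds it as (?-mix:http|ftp), which
# switches the i flag off inside that group despite the outer /i.
outer_flag_only = /(?=#{union}:)#{scheme_and_colon}/i

# Interpolating the source string lets the outer i flag apply to it.
via_source = /(?=(?:#{union.source}):)#{scheme_and_colon}/i

puts outer_flag_only.match?("http://example.com")
puts outer_flag_only.match?("HTTP://EXAMPLE.COM")
puts via_source.match?("HTTP://EXAMPLE.COM")
```

Under these assumptions, only the second pattern treats "HTTP" and "http" alike, so a fix along these lines would probably need to interpolate the source or build the union with Regexp::IGNORECASE rather than just appending i.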

Is this a bug or am I misunderstanding the code? Thanks!