ruby / uri

URI is a module providing classes to handle Uniform Resource Identifiers
https://ruby.github.io/uri/

Potentially inconsistent behavior with how URIs with wildcard subdomains and ports are handled #44

Open JessRudder opened 2 years ago

JessRudder commented 2 years ago

I've noticed what appears to be inconsistent behavior with URI(my_uri). I tried reading the RFCs that URI is based on to understand if this was by design (but I couldn't parse the RFC all that well).

If I run URI("*.example.com:5555") I get the following error:

URI::InvalidURIError: bad URI(is not URI?): "*.example.com:5555"
from /Users/jessrudder/.rbenv/versions/3.1.2/lib/ruby/3.1.0/uri/rfc3986_parser.rb:67:in `split'

If I keep the wildcard and the port but add a scheme, I get the behavior I expected: URI("http://*.example.com:5555")

scheme: "http"
userinfo: nil
host: "*.example.com"
port: 5555
path: ""

If I don't have the wildcard domain or a scheme, it appears to work, but not all methods respond as expected: URI("subdomain.example.com:5555")

scheme: "subdomain.example.com"
userinfo: nil
host: nil
port: nil
path: nil

If I remove the port and don't have a scheme, it appears to work, but not all methods respond as expected: URI("*.example.com")

scheme: nil
userinfo: nil
host: nil
port: nil
path: "*.example.com"

I'd be happy to try to work on a PR but wanted to confirm that this behavior was incorrect before I did that. Thanks!

duerst commented 2 years ago

I've noticed what appears to be inconsistent behavior with URI(my_uri). I tried reading the RFCs that URI is based on to understand if this was by design (but I couldn't parse the RFC all that well).

Please have a look at https://www.rfc-editor.org/rfc/rfc3986#appendix-A. The starting production you need is URI-reference, which includes relative URIs.

If I run URI("*.example.com:5555") I get the following error:

URI::InvalidURIError: bad URI(is not URI?): "*.example.com:5555"
from /Users/jessrudder/.rbenv/versions/3.1.2/lib/ruby/3.1.0/uri/rfc3986_parser.rb:67:in `split'

The first colon is the delimiter between the scheme and the rest, and "*.example.com" isn't a valid scheme, so this is expected.
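To make the scheme rule concrete, here is a minimal Ruby sketch. The `SCHEME` pattern is my own transcription of the `scheme` production in RFC 3986 Appendix A, not a constant taken from the library, and `valid_scheme?` is a hypothetical helper used only for illustration.

```ruby
require "uri"

# Transcription of the RFC 3986 production
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# (an assumption for illustration, not a constant of the uri library).
SCHEME = /\A[A-Za-z][A-Za-z0-9+\-.]*\z/

# Hypothetical helper: does the text before the first colon form a valid scheme?
def valid_scheme?(str)
  return false unless str.include?(":")

  candidate = str.split(":", 2).first
  SCHEME.match?(candidate)
end

valid_scheme?("*.example.com:5555")         # "*" is not allowed in a scheme
valid_scheme?("subdomain.example.com:5555") # letters and "." are allowed

# The parser rejects the wildcard input, consistent with the rule above.
begin
  URI("*.example.com:5555")
rescue URI::InvalidURIError => e
  puts e.message
end
```

The helper only looks at the part before the first colon, which is the same split point the grammar uses.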

If I keep the wildcard and the port but add a scheme, I get the behavior I expected: URI("http://*.example.com:5555")

scheme: "http"
userinfo: nil
host: "*.example.com"
port: 5555
path: ""

Of course.

If I don't have the wildcard domain or a scheme, it appears to work, but not all methods respond as expected: URI("subdomain.example.com:5555")

scheme: "subdomain.example.com"
userinfo: nil
host: nil
port: nil
path: nil

Again, the part before the first colon is the scheme, but now "subdomain.example.com" is okay as a scheme. Periods are allowed in schemes, but this scheme doesn't exist. The library could use generic syntax for the rest (which would then put 5555 into path), but it probably uses scheme-specific code for the rest of the URI, and no such code exists for this scheme. I guess that's why the rest of the fields are empty.
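A quick way to check this is to inspect the parsed object. The sketch below only uses accessors of `URI::Generic`. As far as I can tell, the part after the colon is kept in `opaque` rather than in `path`, which would explain the empty fields.

```ruby
require "uri"

uri = URI("subdomain.example.com:5555")

uri.class   # URI::Generic, since no class is registered for this scheme
uri.scheme  # "subdomain.example.com"
uri.host    # nil
uri.port    # nil
uri.path    # nil
uri.opaque  # "5555" (the non-hierarchical remainder after the scheme)
```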

If I remove the port and don't have a scheme, it appears to work, but not all methods respond as expected: URI("*.example.com")

scheme: nil
userinfo: nil
host: nil
port: nil
path: "*.example.com"

Relative URIs are first and foremost for navigating inside a single site. That's why a bare "*.example.com" is interpreted as a path component, not as a host.
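The path interpretation becomes visible when the reference is resolved against a base URI. In this sketch the base `http://example.org/docs/index.html` is a made-up example, not something from the report.

```ruby
require "uri"

ref = URI("*.example.com")
ref.scheme  # nil
ref.host    # nil
ref.path    # "*.example.com"

# Hypothetical base URI, chosen only for illustration.
base = URI("http://example.org/docs/index.html")

# Resolving the reference replaces the last path segment of the base.
resolved = URI.join(base, ref)
resolved.to_s  # "http://example.org/docs/*.example.com"
```

The host of the base is kept and only the path changes, which is what navigating inside a single site means here.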

I'd be happy to try to work on a PR but wanted to confirm that this behavior was incorrect before I did that. Thanks!

Well, to me the behavior looks correct. To most humans, things such as "subdomain.example.com" smell strongly of domain names, but they could be a scheme or a path component, too.
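If the caller already knows that a string is meant as a host (optionally with a port), one way to remove the ambiguity is to supply the scheme before parsing, as in the "http://" case reported above. The helper name `parse_host_pattern` and the `http` default are assumptions made for this sketch, not part of the library.

```ruby
require "uri"

# Hypothetical helper: treat the input as host[:port] by adding a scheme
# when none is present. The "http" default is an assumption.
def parse_host_pattern(str, default_scheme: "http")
  str = "#{default_scheme}://#{str}" unless str.include?("://")
  URI(str)
end

uri = parse_host_pattern("*.example.com:5555")
uri.scheme  # "http"
uri.host    # "*.example.com"
uri.port    # 5555
```

This moves the decision about what the string means to the caller, so the parser does not have to guess.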