ruby / uri

URI is a module providing classes to handle Uniform Resource Identifiers
https://ruby.github.io/uri/
Other
81 stars 43 forks source link

Should `URI` support usernames with the `@` character #42

Closed deivid-rodriguez closed 2 years ago

deivid-rodriguez commented 2 years ago

We received the following report in rubygems: https://github.com/rubygems/rubygems/issues/5360.

Apparently email addresses as user names are common in JFrog artifactory sources.

Should URI support this?

I checked the addressable library and it has supported it since its first revision. However, I read the relevant part of the RFC, and I don't think this is allowed. I think addressable supports this instead of strictly adhering to the RFC because it's useful.

If this is accepted, I'm happy to create a PR.

deivid-rodriguez commented 2 years ago

Alternatively, it could be supported with a warning telling people to percent-encode it?

iarc13 commented 2 years ago

@deivid-rodriguez do you mean we can still use emails by encoding them using percent as a workaround ?

deivid-rodriguez commented 2 years ago

I think, yes, but I didn't try it. Can you try it?

deivid-rodriguez commented 2 years ago

It seems that according to the more up to date URL standard this is allowed, although including user:password information in urls at all should be considered a validation error and thus somewhat deprecated.

In my opinion it makes sense to support this and should be an easy patch.

duerst commented 2 years ago

RFC 3986 (and RFC 3987) clearly allow percent-encoded '@' (the encoding is '%40') in the userinfo component. A library such as URI has to decide how to expose this to its users. It may allow the use of a literal '@' in arguments and return values that cover only the userinfo component. It shouldn't allow a literal '@' in arguments or return values that span more components.

Looking at the 'URL standard', at https://url.spec.whatwg.org/#userinfo-percent-encode-set it says: "The userinfo percent-encode set is the path percent-encode set and U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+0040 (@), U+005B ([) to U+005E (^), inclusive, and U+007C (|)." The only way I can read this is "percent-encode '@'". We would need other evidence from the 'URL standard' that shows that not percent-encoding '@' in the userinfo component is allowed.

deivid-rodriguez commented 2 years ago

Hi @duerst, you're totally right, I misread this new URL spec, and after a second look I read it the same way you do.

So definitely "comply with the standard" is not a good reason to add this!

I still prefer how the addressable gem works, because it's what users expect, and these are realworld URIs, whatever the standard says.

duerst commented 2 years ago

@deivid-rodriguez: As far as I understand, the reason that the various specs don't allow this is that a lot of 'parsers' just look for the first @, in which case they would identify the wrong part as the userinfo component. Also, when allowing it, you essentially get something like user@hostname1@hostname2, which is really difficult to understand. What are the two hostnames for? Is the user at hostname1 or at hostname2? And then what about user@hostname1@hostname2@hostname3 (and so on)?

deivid-rodriguez commented 2 years ago

I think there's no ambiguity as long as the host component is not allowed to include "@": The "@" used to split the authority component should be the last one, at least that's how addressable works

irb(main):005:0> Addressable::URI.parse("https://user@hostname1@hostname2").user
=> "user@hostname1"
irb(main):006:0> Addressable::URI.parse("https://user@hostname1@hostname2@hostname3").user
=> "user@hostname1@hostname2"

But I understand how it can be confusing, and I'm fine with this not being implemented, so I'll just close this.

Thanks for the feedback @duerst!