whois-server-list / public-suffix-list

Java API for Mozilla's Public Suffix List
Do What The F*ck You Want To Public License

getRegistrableDomain has unexpected behavior when given URLs with trailing paths #8

Open miloprice opened 8 years ago

miloprice commented 8 years ago

If you pass it "example.com/path" or "example.com/", the trailing / and everything after it are kept as part of the registrable domain. This causes real problems when the path points directly to a filename with an extension (e.g., "example.com/path/image.jpg"), because the method then interprets "com/path/image" as the domain name portion and ".jpg" as the TLD.
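The symptom matches what you would get from any lookup that splits the input on "." and never checks for characters like "/". The sketch below is a deliberately simplified, hypothetical model of a Public Suffix List lookup (exact rules only, no wildcards or exceptions, and the PSL default rule "*" when nothing matches). It is not the library's actual code. The class name, method name, and the two-entry rule set are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Set;

class NaiveLabelSplit {

    // Hypothetical, simplified PSL lookup: exact rules only (no "*." wildcards, no "!" exceptions).
    // If no rule matches, the PSL default rule "*" makes the last label the public suffix.
    static String registrableDomain(String domain, Set<String> rules) {
        String[] labels = domain.split("\\."); // "/" is never treated specially here
        int suffixLabels = 1;                  // default rule "*": a single label
        for (int i = 0; i < labels.length; i++) {
            String candidate = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (rules.contains(candidate)) {
                suffixLabels = labels.length - i;
                break; // scanning from the left, the first match is the longest suffix
            }
        }
        if (labels.length <= suffixLabels) {
            return null; // the input is itself a public suffix
        }
        // Registrable domain = public suffix plus one more label to its left.
        return String.join(".", Arrays.copyOfRange(labels, labels.length - suffixLabels - 1, labels.length));
    }

    public static void main(String[] args) {
        Set<String> rules = Set.of("com", "co.uk"); // illustrative rule subset
        String[] inputs = {"www.example.com", "www.example.co.uk", "example.com/",
                           "example.com/path/image.jpg", "https://example.com"};
        for (String input : inputs) {
            System.out.println(input + " -> " + registrableDomain(input, rules));
        }
    }
}
```

In this model "example.com/path/image.jpg" splits into the labels "example", "com/path/image" and "jpg"; since "jpg" matches no rule, the default rule applies and the result is "com/path/image.jpg", exactly the behavior reported above.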

malkusch commented 8 years ago

Thank you for the report. Before I take action, I need to understand what `disallowed_STD3_valid` means.

malkusch commented 8 years ago

OK, now I see. `/` was valid in IDNA2003 but is no longer valid under IDNA2008. I'll take care of that.
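In UTS #46, the status `disallowed_STD3_valid` means a code point is rejected when the STD3 ASCII rules are applied and accepted only when they are turned off. A rough way to see this from plain Java is `java.net.IDN`, which implements IDNA2003 (RFC 3490), not IDNA2008 or UTS #46, so this only approximates the point. With the `USE_STD3_ASCII_RULES` flag it rejects labels containing non-LDH ASCII such as `/` or `:`. The class and helper names below are hypothetical, and this is not how the library itself validates input.

```java
import java.net.IDN;

class Std3Check {

    // Returns true if java.net.IDN can convert the input to ASCII under the given flags.
    static boolean acceptedBy(String input, int flags) {
        try {
            IDN.toASCII(input, flags);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // e.g. "Contains non-LDH ASCII characters" under STD3 rules
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"example.com", "example.com/path", "https://example.com"};
        for (String input : inputs) {
            System.out.printf("%-22s lax=%b std3=%b%n", input,
                    acceptedBy(input, 0),                          // no STD3 rules
                    acceptedBy(input, IDN.USE_STD3_ASCII_RULES));  // STD3 rules applied
        }
    }
}
```

With no flags, all three inputs are accepted. With STD3 rules applied, only "example.com" is, which is the kind of check that would stop "com/path/image" from being treated as a label.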

HatScripts commented 8 years ago

Still having this issue on 2.2.0.

In addition, `getRegistrableDomain` returns the input string unchanged when the URL begins with "https://" (though not for URLs with subdomains):

| Input | Output |
|---|---|
| https://www.github.com | google.com |
| https://www.example.com | example.com |
| https://www.google.com | google.com |
| https://subdomain.example.com | example.com |
| https://github.com | https://github.com |
| https://example.com | https://example.com |
| https://google.com | https://google.com |

![Input / Output / Valid table](http://i.imgur.com/krEQe7h.png)
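The outputs above are consistent with the scheme being glued onto the first label (for example, "https://example" + "com"). That is presumably why inputs without a subdomain come back unchanged. Until the library rejects such input, one caller-side workaround is to reduce the string to its bare host with `java.net.URI` before calling `getRegistrableDomain`. The helper `hostOf` below is a hypothetical sketch, not part of the library. Prefixing "//" to scheme-less input is an assumption chosen so that `URI` parses "example.com/path" as an authority rather than a relative path.

```java
import java.net.URI;

class HostExtraction {

    // Hypothetical helper: strip scheme, path, query and port so only the host remains.
    static String hostOf(String input) {
        // Without a scheme, URI treats "example.com/path" as a relative path with no host,
        // so prefix "//" to make it a network-path reference whose authority is parsed.
        String candidate = input.contains("://") ? input : "//" + input;
        String host = URI.create(candidate).getHost(); // URI.create throws IllegalArgumentException on bad syntax
        if (host == null) {
            throw new IllegalArgumentException("No host component in: " + input);
        }
        return host;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "https://github.com",
            "https://subdomain.example.com",
            "example.com/",
            "example.com/path/image.jpg"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + hostOf(input));
        }
    }
}
```

The returned host ("github.com", "subdomain.example.com", "example.com") would then be the argument passed to `getRegistrableDomain`, so neither the scheme nor the path can leak into the result.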