weppos / publicsuffix-ruby

Domain name parser for Ruby based on the Public Suffix List.
https://simonecarletti.com/code/publicsuffix
MIT License
616 stars 109 forks source link

this gem should not be used for validating domain names #183

Closed gsar closed 3 years ago

gsar commented 3 years ago

README is suggestive of using this gem for validating domains, however it is severely broken for that purpose, as almost any character is allowed in labels. Valid labels in domains should only contain characters in [\-A-Za-z0-9] and are limited to 63 characters. I suggest adding a warning in the README about this.

image

weppos commented 3 years ago

The purpose of this library is to validate a name according to the PSL rules. It's not a library to validate URIs or performs syntax validations of strings.

gsar commented 3 years ago

@weppos I appreciate what you are saying there, but the README says this gem is a "domain name parser" (without further qualification), when it most certainly is not. It can only be used to check if a given arbitrary string ends in a valid TLD listed in the PSL, which is not the same thing as parsing a domain. As you can see above, it accepts a newline in the string and the domain method fails to parse the components of what constitutes a valid domain properly.

I would suggest it should either reject invalid input or parse out the part (jwerwe123.com in the above) that is still considered a valid domain. Silently producing invalid output when given invalid input is neither here nor there.

Said another way, a more appropriate name for what the valid? method does would be ends_with_valid_tld?.

If you are not willing to make any changes, I suggest at least mentioning that the inputs to this gem or the output of the domain, subdomain etc methods should be further validated by some other means. I'm happy to submit a PR if you agree.

image

weppos commented 3 years ago

@weppos I appreciate what you are saying there, but the README says this gem is a "domain name parser" (without further qualification)

This is incorrect. It says "is a Ruby domain name parser based on the Public Suffix List". That makes a lot of difference. I also don't write anywhere that this is a domain validator.

I would suggest it should either reject invalid input or parse out the part (jwerwe123.com in the above) that is still considered a valid domain. Silently producing invalid output when given invalid input is neither here nor there.

By design, I decided to not introduce this feature because it would require a non insignificant maintenance effort. There are just plenty of possible ways you can submit invalid string data.

This library expect a formally valid domain name. It is user responsibility to feed a correctly formatted data. If you want an URI validator, I recommend you look into https://www.rubydoc.info/gems/addressable/Addressable

If you are not willing to make any changes, I suggest at least mentioning that the inputs to this gem or the output of the domain, subdomain etc methods should be further validated by some other means. I'm happy to submit a PR if you agree.

I will look into that and see how I can make it even more explicit.

However, please note the documentation already exists. The library is extensively documented, and each method has its own docblock. Here's an example of the parameter explanation for both valid and parse:

  # @param  [String, #to_s] name The domain name or fully qualified domain name to validate.

Note it explicitly references a domain name or FQDN. It doesn't say "arbitrary string that may contain or look like a domain". I understand it may be a subtle difference to grasp, but these terms carry on a pretty well defined specification in the corresponding RFCs. I don't intend to package such RFCs baggage into this library, as it's beyond the scope of a PSL implementation.