Support options for `reserved:`

ribose-jeffreylau commented 7 years ago

Reserved

.test
.example
.invalid
.localhost

http://www.faqs.org/rfcs/rfc2606.html

Examples

  # allow reserved authorities to be entered
  validates :my_url, url: {reserved: true}

  # disallow reserved authorities to be entered
  validates :my_url, url: {reserved: false}

wacko commented 7 years ago

If this option is not defined, what is the expected behavior?

wacko commented 7 years ago

Should we include example.com, example.net, and example.org in the list of reserved domains?

ronaldtse commented 7 years ago

Let me answer for @ribose-jeffreylau . The default should be no validation on the TLDs.

Maybe it should have been this instead:

validates :my_url, url: { authority: { reserved: true } }

Would be equal to:

TLD is one of .test, .example, .invalid, .localhost
Domain can be one of example.*
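
As a rough illustration of that rule, here is a minimal sketch in plain Ruby. The constant and method names are hypothetical (not part of the gem), and it assumes "example.*" means the three second-level domains reserved by RFC 2606: example.com, example.net, and example.org.

```ruby
# Illustrative sketch only: these constants and method names are assumptions,
# not the gem's API.
RESERVED_TLDS = %w[test example invalid localhost].freeze
RESERVED_SLDS = %w[example.com example.net example.org].freeze

# Split a host into lowercase labels, ignoring a trailing root dot.
def host_labels(host)
  host.to_s.downcase.chomp(".").split(".")
end

# True when the last label is one of the reserved TLDs.
def reserved_tld?(host)
  RESERVED_TLDS.include?(host_labels(host).last)
end

# True when the last two labels form example.com, example.net or example.org.
def reserved_sld?(host)
  RESERVED_SLDS.include?(host_labels(host).last(2).join("."))
end

# True when either rule matches.
def reserved_host?(host)
  reserved_tld?(host) || reserved_sld?(host)
end
```

Keeping the TLD and second-level checks separate leaves room to toggle them independently later.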

ronaldtse commented 7 years ago

Just to clarify, the intent of this validation is to prevent (or require) inputting of these reserved domains.

wacko commented 7 years ago

This syntax:

validates :my_url, url: { authority: { reserved: true } }

won't allow other kinds of validation (see #9):

validates :my_url, url: {authority: %r{(?:.+\.)*example.com}}

wacko commented 7 years ago

I understand that reserved: true will allow domains that end in .test or .example, but I'm not sure whether example.* should always be allowed, or only when reserved is true.

ronaldtse commented 7 years ago

@wacko do you have a good idea on how we can enable these options? Maybe we should just have a true/false so if false we don't allow any reserved TLDs, and when true allow reserved TLDs (in addition to normal TLDs).

ribose-jeffreylau commented 7 years ago

I just updated #9's description to allow for a more composable syntax, which I think might help:

  # require authority format to match a regex as well as allow reserved domains
  validates :my_url, url: { authority: { reserved: true, match: %r{(?:.+\.)*example\.com} } }

  # require authority format to match a regex as well as disallow reserved domains
  validates :my_url, url: { authority: { reserved: false, match: %r{(?:.+\.)*example\.com} } }

or maybe even specify explicitly the options for tld and sld (second level domain):

  # require authority format to allow reserved tlds but disallow example.{com,net,org}
  validates :my_url, url: { authority: { reserved: { tld: true } } }  
  validates :my_url, url: { authority: { reserved: { sld: false, tld: true } } }
                                                      # ^ false is the default

  # require authority format to disallow reserved tlds but allow example.{com,net,org}
  validates :my_url, url: { authority: { reserved: { sld: true } } }
  validates :my_url, url: { authority: { reserved: { tld: false, sld: true } } }
                                                      # ^ false is the default

  # require authority format to disallow both reserved tlds and example.{com,net,org}
  validates :my_url, url: { authority: { reserved: false } }
  validates :my_url, url: { authority: { reserved: { tld: false, sld: false } } }

  # require authority format to allow both reserved tlds and example.{com,net,org}
  validates :my_url, url: { authority: { reserved: true } }
  validates :my_url, url: { authority: { reserved: { tld: true, sld: true } } }

What do you think?
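
To make the proposal concrete, here is a minimal sketch of how the boolean and hash forms could collapse into one shape. The method name normalize_reserved is hypothetical, and the sketch assumes that a key missing from the hash form defaults to false, as the comments above indicate.

```ruby
# Illustrative sketch only: normalize_reserved is a hypothetical helper,
# not part of the gem.
def normalize_reserved(option)
  case option
  when true, false
    # reserved: true / reserved: false applies to both tld and sld
    { tld: option, sld: option }
  when Hash
    # keys left out of the hash form fall back to false
    { tld: false, sld: false }.merge(option)
  else
    raise ArgumentError, "reserved: expects true, false or a Hash"
  end
end
```

With this, reserved: { sld: true } and reserved: { tld: false, sld: true } normalize to the same hash, which is what the paired examples above are meant to express.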

wacko commented 7 years ago

I understand that URLs have a wide range of options (user/pass, port, reserved domains, etc.), and I also understand that this gem tries to cover all of them. But I wonder how much sense it makes to try to catch every possibility. Why? I try to be pragmatic about how this gem could be used: I think through different scenarios and which options make sense in each of them.

99% of the time what is needed is to check for a valid domain [1] or a standard URL [2].

[1] tld/sld host, with or without a scheme, like: github.com, www.github.com, or https://www.github.com.
[2] deep link to a page/resource, like: https://github.com/riboseinc/url_validator/issues, https://github.com/search?utf8=%E2%9C%93&q=rails&type=

In these two cases, what we might need to validate is:
1) host (any host, or from a list/regex)
2) scheme (any scheme, or from a list)
3) path/query/fragment (presence or absence)

no more than that.

In the other 1%, maybe we also want to allow a user/pass (for an FTP connection), a specific port (for SSH), or a relative route (to add links on a webpage). In these scenarios, a homemade solution could be better than a general one. For example, for FTP you could ask for the user/password in a separate set of fields instead of a single one. If instead of URLs we had a newsletter and were asking for emails, I would check that the domain is not a reserved one (example.com) or from a disposable email provider, so I would need a set of excluded domains to check against. But, again, these examples are edge cases that go beyond the norm.

I think the chance of someone needing this is almost zero:

  validates :my_url, url: { authority: { reserved: { sld: false, tld: true } } }

IMHO, a simpler API would be a better solution.

ronaldtse commented 7 years ago

I do agree with @wacko that the case is quite niche, but homemade solutions will most probably break in those cases, like the FTP username/password one you mentioned.

Let's just have reserved: true | false here, and if the need comes in the future, we can further expand based on these discussions.

Maybe we should expose the URI matching methods separately from the Rails validation since this is quite good.

wacko commented 7 years ago

To sum up...
1) reserved: true will allow any domain, and reserved: false will reject some domains, like *.test AND example.*.
2) reserved: true will be the default behavior

Example:
  validates :my_url, url: true will allow www.example.com and ribose.test
  validates :my_url, url: { authority: { reserved: false } } will reject them

Correct me if something is wrong...
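
The summary above can be sketched as a small standalone check. This is only an illustration: host_accepted? and RESERVED_SUFFIXES are hypothetical names, the check works on a bare host rather than a full URL, and "example.*" is assumed to mean example.com, example.net, and example.org.

```ruby
# Illustrative sketch only: names below are assumptions, not the gem's API.
RESERVED_SUFFIXES = %w[
  test example invalid localhost
  example.com example.net example.org
].freeze

# reserved: true (the default) accepts any host;
# reserved: false rejects hosts that equal, or are subdomains of, a reserved name.
def host_accepted?(host, reserved: true)
  return true if reserved

  name = host.downcase.chomp(".")
  RESERVED_SUFFIXES.none? do |suffix|
    name == suffix || name.end_with?(".#{suffix}")
  end
end

host_accepted?("www.example.com")                 # accepted by default
host_accepted?("ribose.test", reserved: false)    # rejected
```

Matching on a leading dot keeps names such as contest.com from being mistaken for the .test TLD.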

ribose-jeffreylau commented 7 years ago

Hi @wacko , good points!

riboseinc / uri_format_validator

Support options for `reserved:` #13
