riboseinc / uri_format_validator

Validate URL for Rails
MIT License
3 stars 2 forks source link

Current way of defining validation constraints is insufficient #75

Closed skalee closed 6 years ago

skalee commented 6 years ago

Writing Cucumber-based documentation gave me broad view about what this gem provides and how it works. Unfortunately, I start to realize that our current way of defining validation constraints by specifying a hash of option is insufficient, and even adding more options won't help.

Consider two popular schemes: http (RFC 7230) and file (RFC 8089). Both are useful for URLs, and are commonly interchanged as both can point to a file (typically over the network or in file system, respectively). I can easily imagine that user may want to specify different constraints for each scheme. For example, http URIs may be limited to some whitelisted domains, whereas file URIs may be anything.

This can be achieved with current gem…

validates :url, uri_format: {scheme: "http", <some_http_specific_constraints>}, unless: is_file_uri?
validates :url, uri_format: {scheme: "file"}, if: is_file_uri?

def is_file_uri?
  …
end

…but it doesn't look good, requires defining additional method, and gets quite complicated when more schemes are involved.

I want to propose two ways of making it better.


The first way is to specify validation rules with accept/reject lists (whitelists/blacklists). Given two lists of constraints: Accept = [A1, A2, …, An] and Reject = [R1, R2, …, Rn], an URI is considered valid if this condition is satisfied: (fits(A1) OR fits(A2) OR … OR FITS(An)) AND NOT (fits(R1) OR fits(R2) OR … OR FITS(Rn)).

I suggest four ways of defining constraints (accept/reject lists elements):

  1. As hashes of options, somewhat similarly to how it worked before.
  2. As URI Templates (RFC 6570), which resemble regular expressions, but are designed specifically for URIs.
  3. As common regular expressions.
  4. By referencing predefined convenience constants (or symbols, which are shorter, not sure yet), e.g. UriFormatValidator::ANY_ABSOLUTE_HTTP_OR_HTTPS_URI (or :any_absolute_http_or_https_uri).

With accept/reject lists, the previous example can be rewritten as:

validates :url, uri_format: {
  accept: [{scheme: "http", <some_http_specific_constraints>}, {scheme: "file"}],
  reject: [],
}

or:

validates :url, uri_format: {
  accept: [{scheme: ["file", "http"]}],
  reject: [{scheme: "http", <inverted_some_http_specific_constraints>}],
}

or:

validates :url, uri_format: {
  accept: [%r{\A(file|http)://}],
  reject: [{scheme: "http", <inverted_some_http_specific_constraints>}],
}

The other way is to leverage & and | operators like RSpec does, for example:

validates :url, uri_format: {criteria: (
  UriFormatValidator::match_scheme("http") & UriFormatValidator::match_some_http_specific_constraints | UriFormatValidator::match_scheme("file")
)}

A module can be mixed in to make matchers accessible without writing UriFormatValidator, but I don't like the idea of namespace polluting:

include UriFormatValidator::Methods
validates :url, uri_format: {criteria: (match_scheme("http") & match_some_http_specific_constraints | match_scheme("file"))}

I want to consult these two ideas. IMHO accept/reject lists are simpler and enough in most cases, and for the most complex criteria a custom model validator should be written anyway.

Whatever is chosen, I'll most likely rewrite the validation logic from scratch.

ribose-jeffreylau commented 6 years ago

RFC 6570 looks quite interesting!

Though, I agree that for the moment, simpler solutions like accept/reject lists should suffice.

ronaldtse commented 6 years ago

Agree with using accept/reject lists rather than operator overdose 😉

RFC 6570 looks promising indeed. @skalee please proceed with your excellent choice!

skalee commented 6 years ago

Accept and reject lists have been already implemented. Regarding URI templates support, #102 feature request has been created.