Open ribose-jeffreylau opened 7 years ago
@ribose-jeffreylau @wacko to verify if a url is reachable, ping a domain and what-not - all this will require sending http/icmp request which can take time. Thus it should be sent asyncronously. We could send a request syncronously and receive a reply quickly, but it's not reliable, sometimes it could fail, take a long time... How would it work together with the normal validation which works syncronously?
@GildedHonour you are right. resolvable
is just a DNS request which should be relatively fast, but ICMP/HTTP should be asynchronous. We should probably drop them as validation requirements.
On the other hand, I found this gem https://github.com/weppos/publicsuffix-ruby that uses Mozilla's PublicSuffix for validating top-level domains. We could leverage off it too.
Yeah, but a network request isn't reliable by definition. I'll do a research, maybe there's a way. PublicSuffix may be what we need, if it contains all possible ltd's we'll use it, I'll do a research too.
Yes you're right. Maybe this should all be done asynchronously. However async validations will require some kind of background processing framework... Let us know what your research tells us 👍
1) https://publicsuffix.org doesn't get updated immediately, thus when a new tld comes out, it doesn't get added to publicsuffix right away. It'll be safer to use this https://data.iana.org/TLD/tlds-alpha-by-domain.txt However, why would we need to validate only the "ltd" part for existence? We already have a regex/grammar in the gem for validating a whole URI/IRI which includes the ltd part.
2) Yes, a background validation will require Sidekiq or the like, which is an additional dependency and heavy. And we'll have to create a new column, probably, where the status of a validation of each "url" will be stored. Too much overhead, what do you think?
3) Thus validating it synchronously might be better. Yes, it might fail, take some time which isn't good for user experience but on some webservices there's a validation of this kind as well.
4)
# domain name resolvable
validates :my_url, url: {resolvable: true}
# host is reachable
validates :my_url, url: {reachable: true}
# full URL document returns 2xx HTTP status code
validates :my_url, url: {retrievable: true}
They look the same for me. What's the difference between the 1st and 2nd? And when would a user want to use the 3rd instead of the 1st or 2nd?
@GildedHonour
retrievable
ensures that at the time of validation, the URL is a "working" one, i.e. not a broken link and whatnot.@ribose-jeffreylau 1-3 Ok.
Resolvable means that the domain name resolves to an IP address; reachable means the resolved IP address can actually responds, for example with a TCP handshake (or an ACK). Retrievable goes further that it ensures it is a correct HTTP response.
I'll implement all 3. However, I'd leave only the 2nd one -- a host exists/url is reachable and responds, not necessarily with 2xx.
Sure. Thanks @GildedHonour !
Added this:
validates :my_url, url: {resolvability: level}
Where level can be :resorvable, :reachable or :retrievable
@GildedHonour that's great! Could you start a PR to merge that into master? I noticed it's pushed to develop
now. It would be better if you separate it into a topic branch so the changes don't get lost in develop
.
@ronaldtse yup, will do.
@ronaldtse What's the status of this issue? I see some related logic, documentation is missing for sure… I guess some final touches are required and it's done, isn't it? https://github.com/riboseinc/uri_format_validator/blob/5357f6aa52192cb0f7f74e62e1836e0d6acd465f/lib/uri_format_validator/validators/uri_format_validator.rb#L73-L82
Also tests are missing.
This feature is certainly not done yet, and it's certainly buggy. I'm adding a TODO list to the topmost comment.
@ronaldtse Got a question: how the resolvable
option is going to work when the hostname is an IP address? Like in http://127.0.0.1:80/something
?
@skalee I think IP addresses should be immediately resolvable, hence it should immediately pass the resolvability validation.
And what about URIs and URLs with unspecified host, like file:///dev/null
or urn:ISSN:0167-6423
? Should they pass host reachability check or not?
The intention of "host reachability" is that the path is actually reachable. So the ones with no host are assumed to pass "host reachability" since the path is already reachable without a host.
Hmm, I got your point, but after some thinking I leaning towards opinion that doing it this way will be bit unclear. And since we're talking about host here, it should be merged with host constraints somehow. Following roughly describes what I'm thinking about:
host: ["example.com", "example.test"]
host: :localhost
host: :reachable
host: [:resolvable, nil]
I'll hold on with merging the domain resolvability checks in.
To add an option to test for the given URL to see if it’s reachable, or if the domain is resolvable.
Also see #58 and #59.