python-validators / validators

Python Data Validation for Humans™.
MIT License

Fails validation on URLs that are not FQDNs #217

Closed brianjmurrell closed 1 year ago

brianjmurrell commented 2 years ago

A URL without an FQDN is perfectly valid. The resolver may attempt to add labels to a non-FQDN in order to fully qualify it, allowing the user to specify non-FQDNs.

E.g., http://pc:8081/ is perfectly valid:

$ host pc
pc.example.com has address 10.75.22.1
$ curl -q -4 -v 'http://pc:8081/'
*   Trying 10.75.22.1:8081...
* Connected to pc (10.75.22.1) port 8081 (#0)
> GET / HTTP/1.1
> Host: pc:8081
> User-Agent: curl/7.82.0
> Accept: */*
> 

Because /etc/resolv.conf has search example.com in it.

Yet this fails validation with validators.url()

$ python -c 'import validators; print(validators.url("http://pc:8081/"))'
ValidationFailure(func=url, args={'value': 'http://pc:8081/', 'public': False})
miigotu commented 2 years ago

I don't think this is a bug in validators. It was just a hasty expectation that it would act similarly to urlparse, but validators uses a complex set of checks and regexes that can't tell what's in your resolv.conf or hosts file, or what is set up as a Docker endpoint, for example. The fix is just for me to learn how it works and use it correctly. https://github.com/SickChill/SickChill/commit/65a4c34186f8b3f93895ba6a622eb12a018ab2c3
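For illustration, the kind of lenient pre-check being described here can be sketched with the standard library alone (the function name and the accepted scheme list are illustrative assumptions, not taken from the linked commit):

```python
from urllib.parse import urlparse


def is_usable_url(value):
    """Lenient structural check: accept any http(s) URL that parses to a
    scheme plus a hostname, including single-label hosts like 'pc'.
    Unlike validators.url(), this makes no FQDN judgment at all."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


print(is_usable_url("http://pc:8081/"))  # True
print(is_usable_url("not a url"))        # False
```

This trades strictness for permissiveness: it will also accept many strings that are clearly not deliverable URLs, which is exactly the trade-off debated below.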

brianjmurrell commented 2 years ago

I don't think this is a bug in validators.

I disagree.

It was just a hasty expectation that it would act similarly to urlparse, but validators uses a complex set of checks and regexes that can't tell what's in your resolv.conf or hosts file

It shouldn't need to. It just needs to accept that unqualified host/domain names are perfectly valid and, as such, don't HAVE to contain multiple labels, and validate them accordingly.

The fix is just for me to know how it works and use it correctly. SickChill/SickChill@65a4c34

While that would work around the issue in validators by preempting it with urlparse, I don't think it changes the fact that a single-label host/domain name is valid. It may or may not actually resolve to an IP address, but as a name it's still valid, just as an FQDN may not resolve to an IP address yet validators will still validate it.

miigotu commented 2 years ago

Ok, but you always disagree. validators works just fine. Maybe you should write a replacement :roll_eyes:

brianjmurrell commented 2 years ago

Yes, I will raise an issue when an implementation has a bug in it -- hopefully to prevent others from getting bitten by it. But I guess you think that when a bug is found I should just be quiet about it and/or instead rewrite the whole project? That makes sense.

miigotu commented 2 years ago

It isn't a bug though. When a piece of software doesn't work how YOU want it to, but it is working exactly as the developer intended, it is not a bug.

validators.url is designed specifically for FQDNs, using regexes; it does not check whether a name is a local network endpoint, a Docker alias, or the like.

A bug would be if the dev intended it to be a certain way, documented it a certain way, and it doesn't work that way. Your idea of what something SHOULD be doesn't determine what it was intended to be.

What you are trying to say, with really bad people skills, is that you wish it had a feature that made it work how you think it should work.

The library is meant to help weed out bad data, not to fit 100% of edge cases. If it allowed unqualified names to match without disabled-by-default parameterization, it would make the library practically useless, because most bad URLs that are mistakenly entered look exactly like local network endpoints or Docker endpoints.

Now, I fixed MY mistake in the code in SC. None of that anywhere meant that validators was broken. If you always think you're the smartest person in the room, you usually aren't.

brianjmurrell commented 2 years ago

It isn't a bug though.

It is though.

When a piece of software doesn't work how YOU want it to, but it is working exactly as the developer intended, it is not a bug.

Documentation (and therefore developer) says:

validators.url(value, public=False) — Return whether or not given value is a valid URL.

A non-FQDN is a valid URL. Chrome opens it. curl opens it. Any valid URL-resolving tool will open it if the resolver is configured accordingly.

validators.url is designed specifically for FQDN,

Is it? The documentation for it doesn't even mention FQDNs. Even if it did, it would not be a valid URL validator; it would be somebody's narrow idea of what a URL is, covering only a subset of actually valid URLs.

A bug would be, if the dev intended it to be a certain way, documented it a certain way, and it doesn't work that way.

Again, the documentation (and therefore the developer) says:

Return whether or not given value is a valid URL.

And that is it. There are no caveats about FQDNs. A non-FQDN is a valid URL; nothing in the documentation says it needs to be an FQDN. So according to the documentation, the dev intends for it to validate a URL even if it's not an FQDN. If the dev intends to accept only the subset of valid URLs that contain FQDNs, the documentation should state that limitation.

The library is meant to help weed out bad data, not to fit 100% of edge cases.

A non-FQDN is hardly an edge-case, IMHO.

If it allowed unqualified names to match without disabled-by-default parameterization it would make the library practically useless, because most bad urls that are mistakenly entered or make it somewhere look exactly like local net endpoints or docker endpoints.

Validation is hard.

Now, I fixed MY mistake in the code in SC. None of that anywhere meant that validators was broken.

I guess we will just have to agree to disagree.

This project and its maintainer are of course completely free to disagree with my bug report, close the ticket, and do nothing about it. That is fine. My effort here was just to prevent future users of validators.url() from falling into the erroneous-handling situation I found myself in when using a non-FQDN URL. So go ahead and close this issue if you still feel it is invalid, and I won't say anything more.

miigotu commented 2 years ago

A named endpoint is not a URL. Have fun here, though, because you won't be irritating me on SC anymore.

By RFC 1738, the host in a URL is REQUIRED to be an IP address or an FQDN, not an unqualified domain name. Not a local hostname, not a NetBIOS name, not a network resource link. Modern browsers just happen to ALSO work with resource links, local network links, etc., IN ADDITION to URLs.
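The strict reading described here can be approximated in a few lines; this is a rough illustrative sketch of an FQDN test (the simplified label regex is an assumption for demonstration, not validators' actual implementation, and it ignores finer RFC 1035 details like total-length limits):

```python
import re

# One DNS label: 1-63 alphanumeric/hyphen chars, no leading/trailing hyphen.
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def looks_like_fqdn(host):
    """Rough FQDN test: two or more valid dot-separated labels.
    A trailing root dot is tolerated."""
    labels = host.rstrip(".").split(".")
    return len(labels) >= 2 and all(LABEL.match(label) for label in labels)


print(looks_like_fqdn("pc.example.com"))  # True
print(looks_like_fqdn("pc"))              # False
```

Under this reading, `pc` fails as a URL host precisely because it carries only one label, which matches the behaviour reported at the top of this issue.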

[Screenshot from 2022-08-30 23-36-55]

Kaju-Bubanja commented 2 years ago

Sick burn :dagger: But I also wanted to contribute something useful, namely https://datatracker.ietf.org/doc/html/draft-thomson-postel-was-wrong-00. @brianjmurrell might be coming from an RFC 1122 (https://datatracker.ietf.org/doc/html/rfc1122) robustness-principle philosophy, but over time we found out that being loose might not be healthy for the ecosystem. Just thought these two RFCs are relevant to both of you.

miigotu commented 2 years ago

Exactly. If it was meant to be loose, the validator would be validators.host, which has a much looser definition (and is much more difficult to validate), rather than validators.url, which has a very strict definition and therefore a trivial path to codifying validation effectively. I don't consider url and host interchangeable, and that difference is not mere semantics.

I am not normally so brash; however, I have a history of butting heads with this specific OP. I appreciate people who are passionate about proving people wrong and pointing out mistakes, when they are correct. When they commonly are not, I am the opposite of appreciative. 😂