Open mrblippy opened 5 years ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is fundamentally due to oEQ having it's own URL validator in the URLCheckerService
- isURL()
: https://github.com/openequella/openEQUELLA/blob/develop/Source/Plugins/Core/com.equella.core/src/com/tle/core/url/URLCheckerService.java#L389
It has been lost in history as to why this was done, but I propose we simply remove it and utilise standard java.net.URL
to validate it. e.g.
String url = "https://smallbusiness.chron.com/put-symbol-url-48344.html#heading1";
new URL(url);
At which point if the URL is not valid, then a MalformedURLException
exception is thrown which will also supply much more meaningful messaging to be stored.
Note: The isURL()
method is only called from that one place - so it is probably best to remove and rework the call site to use the above and handle that exception rather than the current generic IllegalArgumentException
.
@edalex-ian it could be because Url constructor would allow arbitrary protocols, or it wasn't good enough for whatever else reason. It should be straightforward to add a hash component to the regex?
hmmm, that could be the reasoning. But I don't see great value in maintaining our own just for the sake of that - especially when compared to the better error reporting from URL (which even includes checking for known protocols - or we can add a check to simply ensuring we're only http[s]).
I guess it should be easy enough to revert if we remember the reason :)
Is there a way to disable bad url checker. We are getting some bad url notifications even though url works fine in the browser.
Describe the bug The url checking results can be seen in the referencedurl table. Here is an example db entry
The # character is valid character to use in a url, so we shouldn't be marking urls containing them as bad.
Sounds like it could be an encoding issue or something like that
To Reproduce Steps to reproduce the behavior:
Expected behavior Urls that contain # characters should be marked as good, provided they respond with a valid http response code
Platform: