theodi / open-data-certificate

The mark of quality and trust for open data
https://certificates.theodi.org/
MIT License

"Check URL" fails #1695

Closed olivierthereaux closed 5 years ago

olivierthereaux commented 5 years ago

Summary

Automated URL checks are failing, creating a significant barrier to use of the tool.

How to reproduce

  1. Create new certificate
  2. Enter a (valid) URL

Expected behaviour: the URL is checked automatically and, where possible, some metadata is extracted.

Current behaviour: the "Check URL" button shows an exclamation mark, and the UI offers a form field asking the user to justify why the URL is incorrect.

[Screenshot 2019-04-11 10 16 27]

Severity and related issues

This is problematic in two ways:

olivierthereaux commented 5 years ago

Early diagnosis: the asynchronous PUT call to start.json appears to return a 404. I doubt that is the expected behaviour.

[Screenshot 2019-04-11 10 22 08]

olivierthereaux commented 5 years ago

Further diagnosis shows that the issue affects only some URLs, not all of them. We suspected it might be related to the fact that the automated check is done by ODIbot, which identifies itself with the user-agent ODICertBot 1.0 (+https://certificates.theodi.org/).

A quick check confirmed that this user-agent does cause some CDNs to block the bot.
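For anyone wanting to repeat that kind of check, one approach is to request the same URL twice, once with the bot's user-agent and once with a browser-like one, and compare the status codes. The following is a minimal sketch using only Python's standard library. It is not the code used by the certificate tool: `BROWSER_UA`, `build_request`, `fetch_status`, `compare_user_agents` and `blocked_by_user_agent` are hypothetical names for illustration, and the browser string is an arbitrary example.

```python
import urllib.error
import urllib.request

# User-agent quoted in this issue.
BOT_UA = "ODICertBot 1.0 (+https://certificates.theodi.org/)"
# Assumption: any browser-like string would do; this value is illustrative only.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/66.0"


def build_request(url, user_agent):
    """Build a HEAD request carrying the given User-Agent header."""
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="HEAD"
    )


def fetch_status(request, timeout=10):
    """Return the HTTP status code; 4xx/5xx codes are returned, not raised."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def compare_user_agents(url, fetch=fetch_status):
    """Return (bot_status, browser_status) for the same URL."""
    bot_status = fetch(build_request(url, BOT_UA))
    browser_status = fetch(build_request(url, BROWSER_UA))
    return bot_status, browser_status


def blocked_by_user_agent(bot_status, browser_status):
    """True when the bot's request is refused but the browser-like one is not."""
    return bot_status >= 400 and browser_status < 400
```

Calling `compare_user_agents(url)` on an affected URL and passing the result to `blocked_by_user_agent` would indicate whether the user-agent alone makes the difference. Some servers reject HEAD requests regardless of user-agent, so a GET may be needed to confirm the result.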

As a quick fix, we are attempting to change the UA: https://github.com/theodi/open-data-certificate/commit/d64e6950cb8fc3421b29311721deb89af6f6499c

Ideally we would also improve the error message so that it does not imply the problem is necessarily the user's fault.
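As a rough illustration of that idea, the sketch below maps the outcome of a URL check to a message that separates "the server refused our bot" from "the URL looks wrong". It assumes the checker can report an HTTP status code, or `None` when no response was received. The function name, the status groupings and the wording are all hypothetical and do not come from the certificate codebase.

```python
def describe_check_failure(status):
    """Return a user-facing message for a failed URL check.

    `status` is the HTTP status code received, or None if no response
    was received at all (hypothetical convention for this sketch).
    """
    if status is None:
        return ("We could not reach this URL. The site may be down, "
                "or the address may be mistyped.")
    if status in (401, 403, 429):
        # Typical of a CDN or firewall refusing automated clients.
        return ("The server refused our automated check. "
                "The URL may still be correct.")
    if status == 404:
        return ("The server reported that this page does not exist. "
                "Please check the URL.")
    if 500 <= status < 600:
        return ("The server reported an error of its own. "
                "Please try again later.")
    return "The automated check did not succeed (HTTP status %d)." % status
```

With this split, only the 404 and unreachable cases ask the user to re-check the address; a refusal such as 403 is presented as a limitation of the automated check.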

rachelwilson commented 3 years ago

See also https://github.com/theodi/open-data-certificate/issues/1194