mojomonger closed this issue 1 year ago
Yet another reason to just scrap the whole thing and make a wrapper for the two others 🤷♂️
malformed_url should be equal to true, which it is not. This URL also returns a real 200, but IARI is returning 0:
https://archive.org/search.php?query=%22easter%20island%22&and%5b%5d=mediatype%3A%22texts%22
Could it be timeout related? Did you try to increase the timeout? The default is very short, 2 seconds if I remember correctly.
> Could it be timeout related? Did you try to increase the timeout?

I tried https://archive.org/services/context/iari/v2/check-url?url=http://www.uri.edu/artsci/ecn/starkey/ECN398%20-Ecology,%20Economy,%20Society/RAPANUI.pdf&refresh=true&timeout=60
and got a response immediately, so this really is a bug.
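For anyone repeating this test, here is a minimal sketch of building the same check-url request with the target URL fully percent-encoded, so spaces and commas in the target cannot be confused with the endpoint's own query string. The `url`, `refresh` and `timeout` parameter names are taken from the request above; the helper name `build_check_url_request` is hypothetical, and that `timeout` is in seconds is an assumption.

```python
from urllib.parse import urlencode

# Endpoint taken from the request quoted in this thread.
CHECK_URL_ENDPOINT = "https://archive.org/services/context/iari/v2/check-url"


def build_check_url_request(target: str, refresh: bool = True, timeout: int = 60) -> str:
    """Return a check-url request URL for `target` (hypothetical helper).

    urlencode percent-encodes the whole target URL, so characters such as
    spaces, commas, '&' and '?' inside it survive as part of the `url` value.
    """
    params = {
        "url": target,
        "refresh": "true" if refresh else "false",
        "timeout": str(timeout),  # assumed to be seconds
    }
    return f"{CHECK_URL_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    raw = (
        "http://www.uri.edu/artsci/ecn/starkey/"
        "ECN398 -Ecology, Economy, Society/RAPANUI.pdf"
    )
    print(build_check_url_request(raw))
```

This only builds the request string; it does not say anything about how IARI decodes or validates the `url` value on the server side.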
Yes, it is :) @dpriskorn What module/source file do you think this is in, where it is sending back a status code of 0?
I investigated. This is caused by the validation returning false (is_valid: False).
We only check status codes on URLs that are valid according to our URL checker (which seems buggy and no longer needed IMO). See https://github.com/internetarchive/iari/blob/main/src/models/wikimedia/wikipedia/url.py#L160. This is thus expected behavior following the current design, so I'm closing this as it is not a bug.
If you want me to remove the url validation code and just relay whatever the user is sending to the endpoint to testdeadlink, please open a new issue.
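To make the behavior described above concrete, here is a minimal sketch of the gating logic: the status check only runs when validation passes, so an "invalid" URL keeps the default status code of 0. This is not the actual IARI code. `CheckResult`, `is_valid_url` and `check_url` are hypothetical names, and the validation rule used here (http(s) scheme, a host, no whitespace) is an assumption chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass
class CheckResult:
    url: str
    is_valid: bool
    status_code: int = 0  # 0 means the status check never ran


def is_valid_url(url: str) -> bool:
    """Hypothetical stand-in validator, NOT IARI's real rules."""
    parts = urlparse(url)
    has_whitespace = any(ch.isspace() for ch in url)
    return parts.scheme in ("http", "https") and bool(parts.netloc) and not has_whitespace


def check_url(url: str, fetch_status: Callable[[str], int]) -> CheckResult:
    """Run `fetch_status` only for URLs that pass validation."""
    result = CheckResult(url=url, is_valid=is_valid_url(url))
    if result.is_valid:
        result.status_code = fetch_status(url)
    # Invalid URLs fall through with status_code still 0.
    return result
```

With a validator like this, a URL containing literal spaces would come back with is_valid False and status_code 0 no matter what the remote server would have answered. Relaying every URL straight to the status checker, as proposed above, would amount to dropping the `if result.is_valid` gate.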
Using the check-url endpoint for the URL "http://www.uri.edu/artsci/ecn/starkey/ECN398 -Ecology, Economy, Society/RAPANUI.pdf", the status_code and testdeadlink_status_code fields are both set to 0.
Could this be because of the spaces in the URL?
The IABOT and CORENTIN methods both correctly return 404.
IARI:
https://archive.org/services/context/iari/v2/check-url?url=http://www.uri.edu/artsci/ecn/starkey/ECN398%20-Ecology,%20Economy,%20Society/RAPANUI.pdf&refresh=true
IABOT:
CORENTIN: