Open mojomonger opened 1 year ago
I suggest we dedicate time to check-urls instead and add cache testing to that new endpoint.
Once that is merged, we deprecate the check-url endpoint and close all stories related to it, like this one.
I do not agree. I think we should keep the check-url (singular) endpoint. When you do implement the check-urls endpoint (plural), you can reuse the same code internally when only one URL is checked.
This, I think, is a better API, as people will sometimes want to check only one URL. Fixing this bug NOW also makes the demo version of our software, IARE, look OK and reliable. The way it is now, we get a horrible error message because the data is corrupt:
Right!
Please keep it.
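The reuse suggested above (a singular check-url that shares its internals with a plural check-urls) could be sketched roughly as follows. This is only an illustration: the function names, the `checker` callback and the result shape are hypothetical and are not taken from the IARI code base.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for whatever actually tests a link.
Checker = Callable[[str], dict]


def check_urls(urls: List[str], checker: Checker) -> Dict[str, dict]:
    """Plural form: check each distinct URL once and key the results by URL."""
    results: Dict[str, dict] = {}
    for url in urls:
        if url not in results:
            results[url] = checker(url)
    return results


def check_url(url: str, checker: Checker) -> dict:
    """Singular form: a thin wrapper over the plural code path."""
    return check_urls([url], checker)[url]


def fake_checker(url: str) -> dict:
    # Placeholder result; a real checker would perform a network request.
    return {"url": url, "status_code": 200}
```

With this shape, a fix to the shared code path (for example to the caching) would benefit both endpoints at once.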
On Jun 18, 2023, at 3:23 PM, mojomonger wrote:
The way it is now, we get a horrible error message because the data is corrupt: https://user-images.githubusercontent.com/550079/246694452-479ecc5b-98ed-4e1b-aefb-8d6541965612.png
@dpriskorn - is there a way to test this in your debug environment?
I would try:
If the new cache file does not replace the old, existing cache file, then the bug lies within this logic.
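To make the expected behaviour concrete, here is a minimal sketch of a file cache in which a refresh always replaces the existing entry. It is an assumption about how such a cache could work, not a description of IARI's implementation: `cache_path`, `get_result` and the `fetch` callback are hypothetical names, and hashing the URL for the file name is just one way to keep unusual characters out of file paths.

```python
import hashlib
import json
import os
import tempfile
from typing import Callable


def cache_path(cache_dir: str, url: str) -> str:
    # Hash the URL so that odd characters never end up in the file name.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, f"{digest}.json")


def get_result(cache_dir: str, url: str, fetch: Callable[[str], dict],
               refresh: bool = False) -> dict:
    path = cache_path(cache_dir, url)
    # Serve the cached copy unless a refresh was requested.
    if not refresh and os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)

    result = fetch(url)
    # Write to a temporary file in the same directory, then swap it in,
    # so a failed write can never leave a half-written cache entry behind.
    fd, tmp = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(result, f, ensure_ascii=False)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return result
```

A quick check of the logic is to call it once without `refresh`, once with `refresh=True` and a different `fetch` result, and then once more without `refresh`: the last call should return the refreshed data.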
Also, the following:
When you run this /check-url, the "old, cached" version is returned:
But this gives a 500 error when refresh=true is added:
That should help you in that you should be able to see why it is breaking on that URL.
Thanks for the examples. I'll look into it soon.
Please do. This is a very glaring example of IARE showing incorrect information based on IARI data.
This bug is related to URL encoding and weird characters. It causes a UnicodeEncodeError in the gunicorn worker, so the content is never saved to disk.
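For readers unfamiliar with this class of error, the sketch below shows how the URL from this issue can trigger a UnicodeEncodeError. The `%E2%80%99` sequence is the UTF-8 percent-encoding of U+2019 (a right single quotation mark), which cannot be represented in ASCII. Which encoding and code path were actually involved in the worker is not stated in this thread, so the use of ASCII here is an assumption for illustration only, and `to_bytes` is a hypothetical helper.

```python
from urllib.parse import unquote

# Path fragment taken from the failing URL in this issue.
encoded = "internet-archive%E2%80%99s-2017-artist-residence-exhibition"
decoded = unquote(encoded)  # percent-decoding yields a U+2019 character


def to_bytes(text: str, encoding: str) -> bytes:
    """Encode text the way a file write with that encoding would."""
    return text.encode(encoding)


# Assumed failure mode: an ASCII-only encoding cannot represent U+2019.
try:
    to_bytes(decoded, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 round-trips without error.
utf8_bytes = to_bytes(decoded, "utf-8")
```

In Python, `open()` without an explicit `encoding` argument falls back to a locale-dependent default, so passing `encoding="utf-8"` when writing files is one common way to avoid this kind of failure.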
Fixed now
Great! Glad it is fixed. Could you add a (short) description of what the fix entailed, and which modules were affected? Thanks.
Please see the commits in the PR linked and ask questions there if anything is unclear.
When I run the check-url endpoint with the following URL:
https://archive.org/services/context/iari/v2/check-url?url=https://web.archive.org/web/20170726234423/https://minnesotastreetproject.com/exhibitions/1275-minnesota-st/internet-archive%E2%80%99s-2017-artist-residence-exhibition
the returned results do not have the "testdeadlink_status_code" property. This indicates that something is wrong with the caching process, as a previous fetch with /check-url was done with the "refresh=true" flag set.
When check-url is run with refresh=true, a 500 error occurs:
https://archive.org/services/context/iari/v2/check-url?refresh=true&url=https://web.archive.org/web/20170726234423/https://minnesotastreetproject.com/exhibitions/1275-minnesota-st/internet-archive%E2%80%99s-2017-artist-residence-exhibition
returns:
Internal Server Error The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
It appears something is going wrong with the processing of this URL when refresh=true is set.