internetarchive / wayback-machine-webextension

A web browser extension for Chrome, Firefox, Edge, and Safari 14.
GNU Affero General Public License v3.0

503 error with Washington Post page #967

Open newsjunkie247 opened 1 year ago

newsjunkie247 commented 1 year ago

Noticed some general slowness/bugginess with Save Page Now/autosave over the past week. It seems to have gotten somewhat better (though Twitter.com could still be a bit faster), but I'm still seeing errors with first-time saves of the Washington Post website in particular for some reason: it returns a 503 error, though the save does eventually seem to go through in most cases, maybe not always.

cgorringe commented 1 year ago

Sometimes the servers are too busy, causing slowdowns. Will test with Washington Post and Twitter. Thanks!

newsjunkie247 commented 1 year ago

Definitely still seeing this issue with first-time saves of Washington Post. With the extension the save just seems to fail, but if you do it manually you get, for example, the notification "Service Unavailable for https://www.washingtonpost.com/politics/2022/10/16/trump-jews-israel/ (HTTP status=503).", though the save does actually seem to go through later.

markjohngraham commented 1 year ago

Washington Post is a paywalled site.

As such archiving it can be problematic.

This is more a reality of the site and the Wayback Machine than the extension.

Wish there was more I could say/do!

newsjunkie247 commented 1 year ago

But if it was a paywall issue, shouldn't it still scrape and show it as blocked or with a paywall notice? In this case the error notice appears with no attempt at scraping.

newsjunkie247 commented 1 year ago

And saves do still seem to go through eventually within a short time and show up perfectly fine. Here's one from Friday that has 11 saves and appears fine: https://web.archive.org/web/20221016170759/https://www.washingtonpost.com/technology/interactive/2022/tiktok-popularity/ https://www.washingtonpost.com/technology/interactive/2022/tiktok-popularity/

newsjunkie247 commented 1 year ago

And another one from this week with 16 saves: https://www.washingtonpost.com/national-security/2022/10/14/trump-knew-he-lost-jan-6/ And these are not Covid articles or something where they may have dropped the paywall. https://web.archive.org/web/20221015230423/https://www.washingtonpost.com/national-security/2022/10/14/trump-knew-he-lost-jan-6/

newsjunkie247 commented 1 year ago

And when I initiate a new save of those existing pages, it's also successful, which I don't think would work if it was a systematic paywall issue.

newsjunkie247 commented 1 year ago

And this article, for which I got the error in question just an hour ago, does not show as being saved in the extension (yet?), but if you search, it actually does show that the save went through, with no paywall notice anywhere: https://web.archive.org/web/20220000000000*/https://www.washingtonpost.com/politics/2022/10/16/trump-jews-israel/ https://web.archive.org/web/20221017003128/https://www.washingtonpost.com/politics/2022/10/16/trump-jews-israel/
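For anyone who wants to repeat the "search and see if the save went through" check without clicking around, here is a rough sketch that queries the public Wayback Machine availability endpoint (`https://archive.org/wayback/available`). This is not how the extension itself checks save status, as far as this thread shows; the function names are made up for illustration, and a missing result right after a 503 may only mean the capture has not been indexed yet.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint (not the extension's own code path).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD[hhmmss] hint."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the closest snapshot dict from a decoded response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def fetch_closest_snapshot(page_url, timestamp=None):
    """Hypothetical helper: query the endpoint over the network and parse the reply."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `fetch_closest_snapshot("https://www.washingtonpost.com/politics/2022/10/16/trump-jews-israel/")` some minutes after the 503 notice would show whether a capture exists; the returned dict carries `url`, `timestamp`, and `status` fields for the nearest capture.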

markjohngraham commented 1 year ago

Thank you.

And what Carl said above is correct:

"Sometimes the servers are too busy, causing slowdowns"

newsjunkie247 commented 1 year ago

For Twitter I do get that message that it will take X number of minutes or whatever because of concurrent saves. This seems to be a different issue, where it claims the save has failed even when it actually goes through.

markjohngraham commented 1 year ago

Ok