Closed by ghost 1 year ago
Consider a mechanism or option to make an archival request API call to another service that does not exclude sites, like https://archive.ph/ or similar in these situations.
Information about the block list isn't provided by the Save Page Now service. In SPN there are some sites where archiving works fine and no error messages are shown, but the domain itself is actually blocked and you can't view the captures. For example, if you archive a Dropbox download link ending in ?dl=1, SPN won't report any problems, and the capture of the URL that the submitted URL redirects to will be viewable, but certain parts of the main dropbox.com domain are blocked, so the capture of the actual submitted URL won't be viewable. We could infer from this that the captures are actually kept in spite of not being visible, but I don't know for sure.
There is no archive.ph API, and the site maintainer (it's literally just one person running it) almost certainly doesn't plan to add one. Bot stuff is actively discouraged on that site and you can get a CAPTCHA if you submit more than a few links. (That site also does have a block list, and I don't think there are any large archival sites that wouldn't have a block list.)
Maybe you could check whether the URL is blocked using the CDX API before every capture, but that's not necessarily something everyone would want to enable. If the captures are actually being stored, some people might consider that a successful archival of the content even though the captures aren't visible.
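A rough sketch of what such a pre-capture check could look like, in case it helps the discussion. It assumes the CDX server answers with HTTP 403 for an excluded URL; I haven't confirmed that this holds for every blocked domain, so treat that as an assumption. The function names (build_cdx_query, fetch_status, looks_blocked) are made up for illustration and are not part of any existing script.

```python
import urllib.error
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url):
    """Build a CDX query URL asking for at most one capture of `url`."""
    params = urllib.parse.urlencode({"url": url, "limit": "1", "output": "json"})
    return f"{CDX_ENDPOINT}?{params}"


def fetch_status(query_url, timeout=30):
    """Return (HTTP status, body text) for a GET request without raising on 4xx/5xx."""
    try:
        with urllib.request.urlopen(query_url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def looks_blocked(url, fetch=fetch_status):
    """Guess whether `url` is excluded from the Wayback Machine.

    ASSUMPTION: an excluded URL makes the CDX server answer with HTTP 403.
    Any other status (including "no captures yet") is treated as not blocked.
    """
    status, _body = fetch(build_cdx_query(url))
    return status == 403
```

The fetch parameter is only there so the check can be exercised without hitting the network. Making the check opt-in would fit the point above that not everyone would want it on for every capture.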
the website reports:
it appears that currently the script reports excluded websites as successful submissions, for instance:
outputs (timestamps and the data folder stripped):
the internal archive.org 'block list' reports errors (contains at least the said archive.org domain):