Closed fireindark707 closed 1 year ago
Sometimes it takes 12 to 24 hours for the URL to appear in the Wayback Machine and in the Wayback CDX API. This is normal and expected behavior, but it is not documented. The SPN2 API also has a flag, delay_wb_availability=1, that disables the immediate indexing (and would allow you to intentionally replicate the delay that you've been experiencing); it can be enabled in the script using the option -d 'delay_wb_availability=1'.
The capture becomes available in the Wayback Machine after ~12 hours instead of immediately. This option helps reduce the load on our systems. All API responses remain exactly the same when using this option.
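To check whether a logged URL has shown up in the index yet, one option is to query the Wayback CDX API directly. The following is only a minimal sketch, not part of the script: the function names (build_cdx_query, parse_cdx_rows, is_indexed) are hypothetical, and it assumes the public CDX endpoint at web.archive.org/cdx/search/cdx with its url, output and limit parameters. It only reports whether any capture of the URL is indexed, not whether a specific recent capture is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback CDX endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, limit=1):
    """Build a CDX query URL that asks for up to `limit` captures as JSON."""
    params = {"url": url, "output": "json", "limit": limit}
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(body):
    """Turn a CDX JSON response (header row plus records) into a list of dicts.

    An empty body or an empty JSON array is treated as "no captures".
    """
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def is_indexed(url, timeout=30):
    """Return True if the CDX API lists at least one capture of `url`.

    Performs a network request; hypothetical helper for illustration only.
    """
    with urlopen(build_cdx_query(url), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return bool(parse_cdx_rows(body))
```

Calling is_indexed on each URL from success.log a day or so after submission should show whether the missing captures were only delayed. If a URL is still absent well beyond the 12 to 24 hour window, that would suggest something other than indexing latency.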
Thank you for your reply. Where does the SPN2 API documentation come from? I couldn't find it on the IA website.
It isn't linked on the IA website. It was first mentioned on Twitter in 2019 by a staff member.
Many thanks for this script; I think it contributes a lot to web archiving. I am using it now and it works very well most of the time, but I have recently discovered some possible issues. Some URLs appear in success.log, but when I try to access them directly on the Internet Archive site, the site shows that they are not indexed. I'm not sure why that is. At the same time, many other links are captured successfully. I'll keep testing to determine what's going on (IA site latency?).