mirrorweb / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

Webarchive won't replay unless Cloudfront is bypassed #51

Open cls-foy opened 2 years ago

cls-foy commented 2 years ago

cc @themasonbanks

Example URLs:
https://webarchive.nationalarchives.gov.uk/ukgwa/20031020010435/http://www.nationalarchives.gov.uk/news/stories/9.htm
https://tnaqa.mirrorweb.com/ukgwa/20211004151828/https://www.counterterrorism.police.uk/latest-news/page/20/

When we try to navigate to these pages, we are served an error like this: "The web page at https://tnaqa.mirrorweb.com/ukgwa/20211004151810mp_/https://www.counterterrorism.police.uk/latest-news/page/20/ might be temporarily down or it may have moved permanently to a new web address."
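Note that the URL in the error message is not identical to the one requested: it carries an `mp_` modifier after the timestamp, and the timestamp itself is 20211004151810 rather than 20211004151828. To compare such URLs when triaging, here is a minimal sketch that splits a replay URL into its parts. It assumes the `/<collection>/<timestamp><modifier>/<target>` layout seen in the URLs above; `ReplayUrl` and `parse_replay_url` are hypothetical helper names, not part of pywb.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class ReplayUrl(NamedTuple):
    collection: str
    timestamp: str
    modifier: str  # e.g. "mp_", or "" when absent
    target: str


# Assumed layout: /<collection>/<timestamp><modifier>/<target URL>
_REPLAY_PATH = re.compile(
    r"^/(?P<collection>[^/]+)"
    r"/(?P<timestamp>\d{1,14})(?P<modifier>[a-z]{2}_)?"
    r"/(?P<target>.+)$"
)


def parse_replay_url(url: str) -> Optional[ReplayUrl]:
    """Split a replay URL into its parts, or return None if it does not match."""
    parts = urlsplit(url)
    path = parts.path
    if parts.query:
        # Keep the query string attached to the archived target URL.
        path += "?" + parts.query
    match = _REPLAY_PATH.match(path)
    if match is None:
        return None
    return ReplayUrl(
        collection=match["collection"],
        timestamp=match["timestamp"],
        modifier=match["modifier"] or "",
        target=match["target"],
    )
```

Running this on both the requested URL and the URL quoted in the error shows at a glance which parts differ (here, the timestamp and the modifier), which may help narrow down where the request is being rewritten.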

When viewing the following page, we are served a pywb error (this error appears on multiple pages): https://tnaqa.mirrorweb.com/ukgwa/20211004151800/https://www.police.uk/

When Cloudfront is bypassed, these pages load fine.

mijho commented 2 years ago

@cls-foy tna-qa isn't behind Cloudfront. What are you doing to circumvent it?
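One way to settle whether a given host is actually served through CloudFront is to inspect the response headers. The sketch below assumes the headers CloudFront normally adds to responses (`X-Amz-Cf-Id`, `X-Amz-Cf-Pop`, and a `Via` or `X-Cache` value that mentions CloudFront); `looks_like_cloudfront` is a hypothetical helper name, and a negative result is only a hint, since an intermediary could strip these headers.

```python
from typing import Mapping


def looks_like_cloudfront(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers suggest the response passed through CloudFront."""
    # Header names are case-insensitive, so normalise before checking.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}

    # Headers that CloudFront adds to responses it serves.
    if "x-amz-cf-id" in lowered or "x-amz-cf-pop" in lowered:
        return True

    # e.g. "Via: 1.1 <id>.cloudfront.net (CloudFront)" or "X-Cache: Miss from cloudfront"
    return (
        "cloudfront" in lowered.get("via", "")
        or "cloudfront" in lowered.get("x-cache", "")
    )
```

The header mapping can come from any HTTP client. Checking both webarchive.nationalarchives.gov.uk and tnaqa.mirrorweb.com this way would show which of the two example hosts, if either, sits behind CloudFront.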