The first capture of http://leeds2023.co.uk/ redirects to a completely different URL: http://www.slunglow.org/. All other captures of leeds2023 seem fine; only this first capture is affected.
I tried this through QA Wayback in the production and dev versions, and both have the same behaviour.
Accessing the first capture: https://www.webarchive.org.uk/act/wayback/archive/20150924131417/http://leeds2023.co.uk/
Will take the user to:
Looking at the CDX entry for http://leeds2023.co.uk/:
https://www.webarchive.org.uk/act/wayback/archive/cdx?output=json&url=http%3A%2F%2Fleeds2023.co.uk%2F
{"urlkey": "uk,co,leeds2023)/", "timestamp": "20150924131417", "url": "http://leeds2023.co.uk/", "mime": "text/html", "status": "302", "digest": "DL6U7LX4C2BWBS3ZO6BJYH74O2KBFCR3", "redirect": "-", "robotflags": "-", "length": "508", "offset": "790228315", "filename": "/heritrix/output/warcs/dc1-20150827/BL-20150924124957230-05165-22754/~crawler04/~8444.warc.gz", "load_url": "", "source": "archive", "source-coll": "archive"}
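For anyone wanting to repeat this check on other captures, here is a rough sketch of how the CDX line above could be inspected in Python. The helper names (is_redirect, warc_pointer) are my own and not part of any Wayback tooling. Note that the "redirect" field is "-", so the index itself does not say where the 302 points; that has to be read from the WARC record.

```python
import json

# The CDX JSON line returned for http://leeds2023.co.uk/ (copied from above).
CDX_LINE = (
    '{"urlkey": "uk,co,leeds2023)/", "timestamp": "20150924131417", '
    '"url": "http://leeds2023.co.uk/", "mime": "text/html", "status": "302", '
    '"digest": "DL6U7LX4C2BWBS3ZO6BJYH74O2KBFCR3", "redirect": "-", '
    '"robotflags": "-", "length": "508", "offset": "790228315", '
    '"filename": "/heritrix/output/warcs/dc1-20150827/'
    'BL-20150924124957230-05165-22754/~crawler04/~8444.warc.gz", '
    '"load_url": "", "source": "archive", "source-coll": "archive"}'
)


def is_redirect(entry: dict) -> bool:
    """True when the capture's HTTP status is in the 3xx range."""
    return entry.get("status", "").startswith("3")


def warc_pointer(entry: dict) -> tuple:
    """Return (filename, offset, length) locating the record in its WARC."""
    return entry["filename"], int(entry["offset"]), int(entry["length"])


entry = json.loads(CDX_LINE)
print(is_redirect(entry))    # the capture is a 302
print(warc_pointer(entry))   # where to look in the WARC for the Location header
```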
Looking into WARC file:
Opening the WARC file entry, it looks as though the page attempts to redirect to "http://www.slunglow.org/". I'm not sure whether this means the live page had a redirect to slunglow.org at the time, or whether something went wrong at crawl time or during processing?
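As a sketch of how the record could be pulled out using the offset and length from the CDX entry: this assumes the .warc.gz is written as one gzip member per record (the usual layout, and what the CDX offset/length imply). The function names are my own, and the demo runs against a small synthetic record, not the real contents of the WARC.

```python
import gzip
import os
import tempfile
import zlib


def read_warc_record(path: str, offset: int, length: int) -> bytes:
    """Read one gzip member at the given byte offset and decompress it."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        compressed = fh.read(length)
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    return zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(compressed)


def http_location(record: bytes):
    """Return the Location header of the HTTP response inside a WARC record."""
    # Layout: WARC headers, blank line, HTTP headers, blank line, body.
    parts = record.split(b"\r\n\r\n", 2)
    if len(parts) < 2:
        return None
    for line in parts[1].split(b"\r\n")[1:]:  # skip the HTTP status line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"location":
            return value.strip().decode("utf-8", "replace")
    return None


# Synthetic stand-in for the real record (assumption: not the actual bytes).
fake_record = (
    b"WARC/1.0\r\nWARC-Type: response\r\n"
    b"WARC-Target-URI: http://leeds2023.co.uk/\r\n\r\n"
    b"HTTP/1.1 302 Found\r\nLocation: http://www.slunglow.org/\r\n"
    b"Content-Length: 0\r\n\r\n"
)
member = gzip.compress(fake_record)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10)  # padding, so the record does not start at offset 0
    tmp.write(member)
location = http_location(read_warc_record(tmp.name, 10, len(member)))
os.remove(tmp.name)
print(location)
```

Against the real file, the call would use the filename, offset (790228315) and length (508) from the CDX entry for leeds2023.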
This is the incorrect (or maybe correct) capture we're being directed to: https://www.webarchive.org.uk/wayback/en/archive/20150828172150/http://www.slunglow.org/
Looking at the CDX entry for that capture:
https://www.webarchive.org.uk/wayback/en/archive/cdx?url=http%3A%2F%2Fwww.slunglow.org%2F&output=json
{"urlkey": "org,slunglow)/", "timestamp": "20150828172150", "url": "http://www.slunglow.org/", "mime": "text/html", "status": "200", "digest": "FTX2MLPS6WIW52Q3EQ6ASDCWZO6H4FLP", "redirect": "-", "robotflags": "-", "length": "10504", "offset": "212135411", "filename": "/heritrix/output/warcs/dc3-20150827/BL-20150828171634982-00220-8100/~crawler02/~8446.warc.gz", "load_url": "", "source": "archive", "source-coll": "archive", "access": "block"}
Looking at the WARC file:
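WARC comparison (times taken from the CDX timestamps):
http://leeds2023.co.uk/ - September 24th, 2015 at 13:14:17 - dc1-20150827/BL-20150924124957230-05165-22754/~crawler04/~8444.warc.gz
http://www.slunglow.org/ - August 28th, 2015 at 17:21:50 - dc3-20150827/BL-20150828171634982-00220-8100/~crawler02/~8446.warc.gz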
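To put a number on the gap between the two captures, a small sketch that parses the 14-digit CDX timestamps. The closest_capture helper is hypothetical: it only illustrates my assumption that Wayback follows the redirect by serving whichever capture of the target URL is nearest in time to the requested one, which would explain landing on the August capture of slunglow.org.

```python
from datetime import datetime


def parse_cdx_timestamp(ts: str) -> datetime:
    """Parse a 14-digit CDX timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def closest_capture(requested: str, available: list) -> str:
    """Pick the available timestamp nearest to the requested one (assumed behaviour)."""
    target = parse_cdx_timestamp(requested)
    return min(available, key=lambda ts: abs(parse_cdx_timestamp(ts) - target))


redirect_capture = parse_cdx_timestamp("20150924131417")  # leeds2023.co.uk, 302
target_capture = parse_cdx_timestamp("20150828172150")    # www.slunglow.org, 200

gap = redirect_capture - target_capture
print(gap)  # time between the crawl of the target and the crawl of the redirect

# With only the one slunglow.org capture known from the CDX lookup above,
# that capture is necessarily the closest one.
print(closest_capture("20150924131417", ["20150828172150"]))
```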
Internet Archive doesn't have any copies for 2015 for leeds2023.co.uk: https://web.archive.org/web/20150101000000*/http://leeds2023.co.uk
The website for leeds2023 was registered in 2015:
I'm not sure if this is a bug or expected behaviour. What made me suspicious is that the WARC directories carry the same date (20150827), except that leeds2023 is under DC1 (dc1-20150827) and slunglow.org is under DC3 (dc3-20150827).