ukwa / ukwa-ui

A new user interface for the UK Web Archive
BSD 3-Clause "New" or "Revised" License

Browsing archived special collections between July 2015 and October 2018 will show you Welsh with no English option #370

Closed by crarugal 1 year ago

crarugal commented 1 year ago

Browsing special collections (http://www.webarchive.org.uk/ukwa/browse/) captured between July 2015 and November 2018 presents the content in Welsh: https://www.webarchive.org.uk/wayback/archive/*/https://www.webarchive.org.uk/ukwa/collection/

https://www.webarchive.org.uk/wayback/archive/20181020110223/https://www.webarchive.org.uk/ukwa/ is in English (the other side links also stay in English),

but when you navigate to the "Browse the archive" side link, everything from then on is in Welsh, with no working option for English: https://www.webarchive.org.uk/wayback/archive/20181020110223mp_/https://www.webarchive.org.uk/ukwa/browse


The only way to view this content in English is to use Chrome's "Translate to English" feature, or to browse content archived before July 2015.

anjackson commented 1 year ago

I don't think there's anything we can do about this unfortunately. That old site engine stored the language in a cookie and rendered the same URLs with different content depending on that cookie. So, if the crawler happened to enable the translated version, the crawl would continue in Welsh.

The new site has separate URLs for the translated versions (as recommended by Google), which means the crawl can't end up in this confused state.

But as the old site is long gone, we can't recapture what was not crawled at the time.
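
To illustrate the difference between the two designs described above, here is a minimal Python sketch. It is not the actual site or crawler code: the function names, the `lang` cookie name, the `/en/` and `/cy/` path prefixes and the placeholder page strings are all assumptions made up for the example. It only shows why an archive that keeps one capture per URL can end up with just the Welsh copy when the language lives in a cookie.

```python
# Illustrative only: all names, paths and page strings are hypothetical.
PAGE_TEXT = {"en": "[English page]", "cy": "[Welsh page]"}


def render_old_site(url, cookies):
    """Cookie-based engine: the URL plays no part in choosing the language."""
    return PAGE_TEXT[cookies.get("lang", "en")]


def render_new_site(url):
    """URL-based engine: the language is read from the path prefix."""
    lang = "cy" if url.startswith("/cy/") else "en"
    return PAGE_TEXT[lang]


def crawl(urls, fetch):
    """Model of an archive that stores one capture per URL."""
    return {url: fetch(url) for url in urls}


# Assume the crawler followed the Welsh link once, so its cookie jar says "cy".
crawler_cookies = {"lang": "cy"}

old_archive = crawl(
    ["/ukwa/browse"],
    lambda url: render_old_site(url, crawler_cookies),
)
new_archive = crawl(["/en/ukwa/browse", "/cy/ukwa/browse"], render_new_site)

print(old_archive)
print(new_archive)
```

In this model the old-style archive holds a single Welsh capture for `/ukwa/browse` with no English counterpart, while the URL-based site yields one capture per language.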

crarugal commented 1 year ago

Thanks for looking into it, Andy.