Open ChrisDoyleMW opened 10 months ago
Everything after a # in a URL is the fragment. It is never sent to the server; the web browser handles it itself, normally to scroll to a certain position on the page. A harvester can therefore only request the URL with the fragment stripped. That is why Pywb strips it and shows what it found in the index for the URL without the fragment.
Maybe Pywb could put the fragment back into the links, so that the browser scrolls according to it. Then again, that might be confusing.
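To make the point about fragments concrete, here is a minimal sketch using only Python's standard library. It is not Pywb's actual code: `split_fragment` and `reattach_fragment` are hypothetical helper names, and the replay link and its timestamp are made up for illustration. It shows that the part looked up in the index is the URL without the fragment, and how the fragment could be appended back to a link on the client-facing side.

```python
from urllib.parse import urldefrag


def split_fragment(url):
    """Return (url_without_fragment, fragment) for the given URL."""
    base, fragment = urldefrag(url)
    return base, fragment


def reattach_fragment(link, fragment):
    """Hypothetical helper: append the fragment to a link if there is one."""
    return f"{link}#{fragment}" if fragment else link


# URL from this issue, as a visitor would type it.
requested = (
    "https://www.arcgis.com/apps/opsdashboard/index.html"
    "#/f94c3c90da5b4e9f9a0b19484dd4bb14"
)

# Only `base` can ever have been harvested, so only `base` is in the index.
base, fragment = split_fragment(requested)

# Assumed replay link shape; the timestamp is invented for this example.
replay_link = "https://webarchive.nationalarchives.gov.uk/20200101000000/" + base
link_with_fragment = reattach_fragment(replay_link, fragment)
```

With this approach every capture in the timeline would still be a capture of `index.html`; the fragment would only change what the browser does after the archived page loads.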
Describe the bug
PYWB seems to be stripping out part of the URL when a timeline page is requested. For example, https://webarchive.nationalarchives.gov.uk/*/https://www.arcgis.com/apps/opsdashboard/index.html#/f94c3c90da5b4e9f9a0b19484dd4bb14 loads a timeline for https://www.arcgis.com/apps/opsdashboard/index.html. Each instance shown is for index.html and not index.html#/f94c3c90da5b4e9f9a0b19484dd4bb14.
Steps to reproduce the bug
Expected behavior
I'd expect the timeline page to show the timeline for the full URL as requested, so that visitors can view the capture history for this specific URL, rather than having the final part of the URL stripped out.
Screenshots
Environment
• OS: Linux
• Browser: Any
• Version: PYWB 2.7