Open Arkiver2 opened 8 years ago
Hmm, the key question is whether to rewrite any URL look-alike in HTML, and how generally useful that would be. Whether it is %-encoded or not is a minor issue here.
If the HTML rewriter rewrote any URL look-alike in HTML, not just URLs in attributes, it would rewrite every textual mention of a URL in HTML pages. I don't think that's the right thing to do in general. So rewriting %-encoded URLs is highly specific to this case. Unfortunately, wayback does not currently have a mechanism for applying rewrite rules specific to a particular URL.
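To make the trade-off concrete, here is a minimal Python sketch of what rewriting %-encoded URL look-alikes could involve. It is not how wayback's rewriter works; `REPLAY_PREFIX`, `ENCODED_URL`, and `rewrite_encoded_urls` are hypothetical names, and the timestamp is simply the one from the example capture in this thread. Note that it matches anywhere in the text, which is exactly the over-broad behaviour described above.

```python
import re
from urllib.parse import quote, unquote

# Hypothetical replay prefix, for illustration only.
REPLAY_PREFIX = "https://web.archive.org/web/20150804131701/"

# A percent-encoded absolute URL: "http%3A%2F%2F" or "https%3A%2F%2F",
# followed by characters that can appear in a percent-encoded component.
ENCODED_URL = re.compile(r"https?%3A%2F%2F[A-Za-z0-9._~%-]+", re.IGNORECASE)


def rewrite_encoded_urls(text, prefix=REPLAY_PREFIX):
    """Prefix every %-encoded URL look-alike in text, keeping it %-encoded."""

    def repl(match):
        original = unquote(match.group(0))        # decode the embedded URL
        return quote(prefix + original, safe="")  # prefix it, then re-encode

    return ENCODED_URL.sub(repl, text)
```

For instance, `rewrite_encoded_urls("file=http%3A%2F%2Fblip.tv%2Fa.m4v&x=1")` leaves `file=` and `&x=1` alone and replaces only the encoded URL. Plain (unencoded) URLs are left untouched by this sketch.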
With the next generation of the Wayback Machine (https://blog.archive.org/2015/10/21/grant-to-develop-the-next-generation-wayback-machine/), will it become possible to add special URL rewrite rules for certain URLs?
EDIT: Same question for ignoring or removing certain query-string parameters via special rules, for example timestamps and forum session IDs.
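To illustrate what I mean, here is a rough Python sketch of per-URL rules for dropping query parameters. This is only an assumption about how such rules might look, not an existing wayback feature; `RULES`, `strip_query_params`, the example hosts, and the parameter names are all made up for illustration.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rules: (URL pattern, query parameters to drop for matching URLs).
RULES = [
    (re.compile(r"^https?://forum\.example\.org/"), {"sid", "PHPSESSID"}),
    (re.compile(r"^https?://example\.org/"), {"timestamp"}),
]


def strip_query_params(url, rules=RULES):
    """Drop the parameters named by the first rule whose pattern matches url."""
    for pattern, drop in rules:
        if pattern.search(url):
            parts = urlsplit(url)
            kept = [
                (key, value)
                for key, value in parse_qsl(parts.query, keep_blank_values=True)
                if key not in drop
            ]
            # urlencode re-encodes the remaining values, so the escaping may be
            # normalized compared with the original query string.
            return urlunsplit(parts._replace(query=urlencode(kept)))
    return url  # no rule applies: leave the URL unchanged
```

With these rules, `http://forum.example.org/viewtopic.php?t=42&sid=abc123` would be treated as `http://forum.example.org/viewtopic.php?t=42`, while URLs on other hosts pass through unchanged.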
Currently, percent-encoded URLs are not rewritten. For example, the text from https://web.archive.org/web/20150804131701/http://blip.tv/file/get/NostalgiaCritic-NCPlanetOfTheApes401.m4v?showplayer=2014093037100220150422135039&referrer=http://blip.tv&mask=11&skin=flashvars&view=url should be rewritten as follows. Original:
_Should be rewritten as:_