Discussion about this happened on IIPC Slack. For reference, I am putting some of the details in this issue to go along with the PR #397 opened by @peveikko.
On Ubuntu (CentOS and RHEL are not affected), with at least some Tomcat versions, OpenWayback returns `Resource Not in Archive` for archived URIs with the `https` scheme and suggests searching under `http://https/www`. The same pages work with the `http` scheme.
@peveikko noted: For https URLs everything works fine on CentOS/RHEL, but I got this behaviour on 3 different Ubuntu machines. Also tried with different Tomcat/Java versions.
@anjackson supplied the following: Okay, so I think this is to do with a CVE https://nvd.nist.gov/vuln/detail/CVE-2015-5174 -- I think Tomcat have added some URL clean-up/normalisation, meaning that later versions of Tomcat 6/7/8 may all have the same problem. This doesn't affect http URLs, perhaps because this code reinserts any stripped slash? https://github.com/iipc/openwayback/blob/c49f8e7200870c3af40561f3ca340c67c98db02f/wayback-core/src/main/java/org/archive/wayback/core/WaybackRequest.java#L755-L769 ... Easiest thing might be to modify the `WaybackRequest` to explicitly support `/https:/host/...` (assuming I've got this right of course)
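
To illustrate the idea of explicitly supporting `/https:/host/...`, here is a minimal sketch of restoring a scheme slash that was collapsed by URL normalisation. It is not the actual `WaybackRequest` code and not necessarily what PR #397 does; the class `SchemeSlashRepair` and the method `repair` are hypothetical names, and the sketch assumes the only damage is `https://` (or `http://`) being reduced to `https:/` (or `http:/`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenWayback.
final class SchemeSlashRepair {

    // Matches "http:/" or "https:/" at the start of the string,
    // but only when it is NOT followed by a second slash.
    private static final Pattern COLLAPSED =
            Pattern.compile("^(https?):/(?!/)", Pattern.CASE_INSENSITIVE);

    private SchemeSlashRepair() {
    }

    // Returns the URL with "scheme:/" expanded back to "scheme://".
    // URLs that already have both slashes, or no http(s) scheme, are returned unchanged.
    static String repair(String url) {
        if (url == null) {
            return null;
        }
        Matcher m = COLLAPSED.matcher(url);
        return m.find() ? m.replaceFirst("$1://") : url;
    }
}
```

With this sketch, `SchemeSlashRepair.repair("https:/host/page")` gives `https://host/page`, while an intact `https://host/page` passes through untouched. Handling both schemes in one pattern avoids the situation described above, where only `http` URLs get their stripped slash back.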