Open thomasegense opened 2 years ago
This is an example of a url_norm value in Solr that still contains port 80: url_norm:"http://train-aarhus.dk:80/visbillede.asp?fp=brandnewheavies.jpg"
In this case the URL comes from the ARC (not WARC) header:
http://train-aarhus.dk:80/visbillede.asp?fp=brandnewheavies.jpg 194.239.250.54 20001021042018 text/html 1699
HTTP/1.1 200 OK
Server: Microsoft-IIS/4.0
Port 80 should also be removed when parsing links (a href) on a page. Having links both with and without port 80 results in playback issues, since the URLs cannot be matched.
The same goes for https and port 443.
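
To make the requested behaviour concrete, here is a minimal Python sketch of the normalization. It is not the project's actual code: the function name strip_default_port and the DEFAULT_PORTS table are hypothetical and only illustrate the rule that an explicit port is dropped when it is the default for the scheme (80 for http, 443 for https).

```python
from urllib.parse import urlsplit, urlunsplit

# Default port per scheme (assumption: only http and https matter here).
DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url: str) -> str:
    """Return url without an explicit port if it is the scheme's default."""
    parts = urlsplit(url)
    try:
        port = parts.port  # None when no port is given
    except ValueError:
        # Malformed port (non-numeric or out of range): leave the URL alone.
        return url
    if port is None or DEFAULT_PORTS.get(parts.scheme) != port:
        return url
    # The authority ends with ":<port>", so cut at the last colon.
    # This keeps userinfo and bracketed IPv6 hosts intact.
    netloc = parts.netloc.rpartition(":")[0]
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(strip_default_port(
    "http://train-aarhus.dk:80/visbillede.asp?fp=brandnewheavies.jpg"
))
```

Applying the same function both to the URL taken from the ARC header and to every link extracted from a page would give both sides the same form, so they can be matched during playback. Non-default ports such as 8080, and mismatched pairs such as https with port 80, are left unchanged.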