The tool wget produces WARC files in which the value of the WARC-Target-URI header is enclosed in angle brackets (<>). In warc-indexer, the URL can be retrieved safely from the WARC header using Normalisation.sanitiseWARCHeaderValue, but DroidDetectorAnalyser did not do this. This pull request fixes that.
This is an oversight that I am sure will come back to bite us, so I have raised issue #197.
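
For illustration, the kind of clean-up involved can be sketched as below. This is a minimal sketch, not the actual code of Normalisation.sanitiseWARCHeaderValue: the class name WarcHeaderSketch and the method stripAngleBrackets are hypothetical, and the real method may handle more cases than this one does.

```java
public class WarcHeaderSketch {

    /**
     * Hypothetical helper (an assumption, not the warc-indexer implementation):
     * removes one pair of enclosing angle brackets from a WARC header value,
     * as found in WARC-Target-URI values written by wget.
     */
    static String stripAngleBrackets(String value) {
        if (value == null) {
            return null;
        }
        String trimmed = value.trim();
        // Only strip when the value is wrapped in both an opening and a closing bracket.
        if (trimmed.length() >= 2 && trimmed.startsWith("<") && trimmed.endsWith(">")) {
            return trimmed.substring(1, trimmed.length() - 1);
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // A wget-style value and a plain value yield the same URL.
        System.out.println(stripAngleBrackets("<http://example.com/>")); // prints http://example.com/
        System.out.println(stripAngleBrackets("http://example.com/"));   // prints http://example.com/
    }
}
```

Any analyser that reads WARC-Target-URI directly from the header, rather than through the sanitising method, would otherwise see the bracketed form and treat it as the URL.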