The classes do not escape the URLs they get from the HTTP headers or the HTML meta tags. This makes the resulting CDX files invalid if the redirect URL contains spaces (see e.g. https://github.com/internetarchive/ia-hadoop-tools/issues/4). This commit fixes that by passing the resolved URL through the multi-argument constructor of `java.net.URI`, which escapes the individual parts appropriately.
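As a rough illustration of the escaping behaviour this relies on (not the code in this commit), the sketch below builds a URI from separate components. The class name `RedirectUrlEscaping`, the helper `escape`, and the example URL are hypothetical and chosen only for the demonstration; the real change may split and reassemble the URL differently.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of the patched code base.
final class RedirectUrlEscaping {

    // Hypothetical helper: assembles a URI from already-separated parts.
    // The multi-argument constructor quotes characters that are illegal
    // in each component (for example a space becomes %20).
    static String escape(String scheme, String authority, String path, String query)
            throws URISyntaxException {
        return new URI(scheme, authority, path, query, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // A redirect target with spaces in both the path and the query.
        System.out.println(
            escape("http", "example.com", "/some path/page.html", "q=a b"));

        // For contrast, the single-argument constructor only parses and
        // rejects the same URL instead of escaping it.
        try {
            new URI("http://example.com/some path/page.html?q=a b");
        } catch (URISyntaxException e) {
            System.out.println("single-argument constructor rejected: " + e.getInput());
        }
    }
}
```

One thing to keep in mind with this approach: the multi-argument constructors always quote the percent character, so a redirect URL that is already percent-encoded would be encoded a second time unless it is decoded first.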