Closed Chaz6 closed 2 years ago
Note that this also has to honour the <base href=""> element if present.
It would be interesting to check whether the HTML parser/manipulation library already offers this as an option. If it does, we could just pass the option through to the parser; if not, we would have to build around it.
I've added a PR from the base_urls branch that does this, if you'd like to try it out.
htmlq --base https://example.org
will rewrite relative URLs according to that URL.
htmlq --detect-base
will try to find a base URL from the <base> element in the document. If none is found, URLs are not rewritten.
If you specify both, the base found in the document takes precedence, and the one supplied with --base is used as a fallback if the document has none.
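To make the precedence between the two flags concrete, here is a minimal Python sketch of the selection rule described above. It is not the code from the PR: the names BaseFinder and choose_base are hypothetical, and the sketch assumes the first <base> element with a non-empty href is the one that counts.

```python
from html.parser import HTMLParser
from typing import Optional


class BaseFinder(HTMLParser):
    """Records the href of the first <base> element that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first <base href="..."> is kept (assumption of this sketch).
        if tag == "base" and self.href is None:
            href = dict(attrs).get("href")
            if href:
                self.href = href


def choose_base(html: str, base: Optional[str] = None,
                detect_base: bool = False) -> Optional[str]:
    """Return the base URL to rewrite against, or None for "don't rewrite".

    `base` stands in for the value given to --base and `detect_base`
    for the --detect-base switch; both parameter names are illustrative.
    """
    if detect_base:
        finder = BaseFinder()
        finder.feed(html)
        if finder.href:
            # The document's own base wins when detection is enabled.
            return finder.href
    # Fall back to the supplied base, which may be None.
    return base
```

With only detection enabled and no <base> in the document, choose_base returns None, which corresponds to leaving the URLs untouched.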
In the example
curl -s https://www.rust-lang.org/ | htmlq -a href a
the links are output as-is, for example, /policies. In order to use this with other tools, it would be useful to make these links absolute. For example,
curl -s https://www.rust-lang.org/ | htmlq -u https://www.rust-lang.org/ -a href a
would result in https://www.rust-lang.org/policies (i.e. any relative href attributes are converted to absolute URLs using the base URL specified with -u).
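For reference, the conversion being asked for is ordinary relative-reference resolution. Below is a small Python sketch using the standard library's urljoin. It only illustrates the expected results and says nothing about how htmlq would implement it; the absolutize helper and the sample hrefs other than /policies are made up for the example.

```python
from urllib.parse import urljoin


def absolutize(hrefs, base):
    """Resolve each href against `base`; absolute URLs pass through unchanged."""
    return [urljoin(base, href) for href in hrefs]


# Sample values; only "/policies" comes from the example above.
links = ["/policies", "learn", "https://crates.io/"]
resolved = absolutize(links, "https://www.rust-lang.org/")

for url in resolved:
    print(url)
# https://www.rust-lang.org/policies
# https://www.rust-lang.org/learn
# https://crates.io/
```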