mgdm / htmlq

Like jq, but for HTML.
MIT License

Case sensitiveness: htmlq not preserving case? #50

Open ryenus opened 2 years ago

ryenus commented 2 years ago

Somehow htmlq converts element (tag) names to lowercase:

Expected

$ echo -e '<Need>\n  <PreserveCase>True</PreserveCase>\n</Need>' | htmlq Need
<Need>
  <PreserveCase>True</PreserveCase>
</Need>

Actual

$ echo -e '<Need>\n  <PreserveCase>True</PreserveCase>\n</Need>' | htmlq Need
<need>
  <preservecase>True</preservecase>
</need>

Here the tag <Need> becomes <need> and <PreserveCase> becomes <preservecase>, which is not what I expected. Is it possible to preserve the exact case of the tag names, perhaps behind an option?

Thanks!

muzimuzhi commented 1 year ago

HTML tag names are case-insensitive; it's XML that treats them as case-sensitive.

A quick search suggests the case conversion may come from html5ever, in which case it can't be configured on htmlq's end. See https://github.com/servo/html5ever/search?q=lowercase.
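
To illustrate the HTML/XML difference described above, here is a minimal sketch using only Python's standard library. It is not htmlq or html5ever code; it assumes that Python's `html.parser` (which reports tag names in lowercase) and `xml.etree.ElementTree` (which keeps them as written) are fair stand-ins for the two parsing models. The names `TagCollector`, `html_tag_names`, `xml_tag_names`, and `SAMPLE` are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample input, modelled on the markup from the report.
SAMPLE = "<Need><PreserveCase>True</PreserveCase></Need>"


class TagCollector(HTMLParser):
    """Records start-tag names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes the tag name already converted to lowercase.
        self.tags.append(tag)


def html_tag_names(markup):
    """Return start-tag names as seen by an HTML parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_tag_names(markup):
    """Return element names as seen by an XML parser, in document order."""
    return [element.tag for element in ET.fromstring(markup).iter()]


if __name__ == "__main__":
    print(html_tag_names(SAMPLE))  # ['need', 'preservecase']
    print(xml_tag_names(SAMPLE))   # ['Need', 'PreserveCase']
```

The HTML parser has already discarded the original casing by the time a tool sees the tag name, which is consistent with the suggestion that an option on htmlq's side alone could not restore it.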

ryenus commented 1 year ago

@muzimuzhi ahh, thank you, that's good to know. Meanwhile I've moved on with yq, which can preserve case properly:

echo -e '<Should><PreserveCase>True</PreserveCase></Should>' | yq -px -ox .Should

Which produces:

<PreserveCase>True</PreserveCase>

baodrate commented 1 year ago

Meanwhile I've moved on with yq, which can preserve case properly

Except that yq, which assumes the input is standard XML rather than HTML, doesn't properly retain the order of text inside each tag (and doesn't necessarily output valid HTML):

htmlq -p a <<<'<a>Order <b>should</b> be <em>preserved</em></a>'

produces

<a>Order <b>should</b> be <em>preserved</em></a>

but

yq -px -ox '.a' <<<'<a>Order <b>should</b> be <em>preserved</em></a>'

produces

<+content>Order</+content>
<+content>be</+content>
<b>should</b>
<em>preserved</em>
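
For comparison, here is a rough sketch of an XML parser that models mixed content directly and so keeps both tag case and text order. It uses Python's standard `xml.etree.ElementTree`, not yq or htmlq, and the helper names `roundtrip` and `mixed_content` are hypothetical. It assumes the input is well-formed XML, so it is not a substitute for real HTML parsing.

```python
import xml.etree.ElementTree as ET


def roundtrip(markup):
    """Parse well-formed XML and serialize it again without restructuring."""
    return ET.tostring(ET.fromstring(markup), encoding="unicode")


def mixed_content(markup):
    """List the root's text runs and child tags in document order.

    ElementTree stores text before the first child in `.text` and text
    after each child in that child's `.tail`, so the order is recoverable.
    """
    root = ET.fromstring(markup)
    pieces = [root.text or ""]
    for child in root:
        pieces.append(f"<{child.tag}>")
        pieces.append(child.tail or "")
    return pieces


if __name__ == "__main__":
    ordered = "<a>Order <b>should</b> be <em>preserved</em></a>"
    cased = "<Need><PreserveCase>True</PreserveCase></Need>"
    print(roundtrip(ordered))      # <a>Order <b>should</b> be <em>preserved</em></a>
    print(roundtrip(cased))        # <Need><PreserveCase>True</PreserveCase></Need>
    print(mixed_content(ordered))  # ['Order ', '<b>', ' be ', '<em>', '']
```

Input that is valid HTML but not well-formed XML, such as an unclosed `<br>`, makes `ET.fromstring` raise `ParseError`, so this only helps when the data really is XML.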