mgdm / htmlq

Like jq, but for HTML.
MIT License

How to specify charset? #24

Open dw9694 opened 2 years ago

dw9694 commented 2 years ago

Hi. How do I specify the charset?

$ cat /tmp/index.html | htmlq 'title'
<title>������</title>
$ cat /tmp/index.html | htmlq 'h1.maintitle'
<h1 class="maintitle">������</h1>

mgdm commented 2 years ago

There's no way to do that right now, as htmlq pretty much assumes UTF-8 input, but I'll look into it.

Sematre commented 2 years ago

I would pipe it through iconv --from-code <your_charset>.

Example: if your file is encoded as windows-1252, you can fix it like this:

$ cat /tmp/index.html | iconv --from-code windows-1252 | htmlq 'title'

Or, even simpler, if you want to read from a file directly:

$ iconv --from-code windows-1252 /tmp/index.html | htmlq 'title'
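
A minimal sketch of the same workaround with the target encoding spelled out. When no target is given, iconv converts to the current locale's encoding, which may not be UTF-8, so naming it explicitly is a bit more predictable. The `to_utf8` helper and the sample file are hypothetical and only for illustration; they are not part of htmlq.

```shell
# Hypothetical helper (not part of htmlq): convert a file from a known
# charset to UTF-8 on stdout, so the result can be piped into htmlq.
to_utf8() {
  # $1 = source charset (e.g. windows-1252), $2 = input file
  iconv -f "$1" -t UTF-8 "$2"
}

# Build a small windows-1252 sample: byte 0xE9 (octal 351) is "é" there.
sample=$(mktemp)
printf '<title>caf\351</title>\n' > "$sample"

# Prints <title>café</title> as valid UTF-8.
to_utf8 windows-1252 "$sample"

# With htmlq installed, the converted output can be piped onward:
# to_utf8 windows-1252 "$sample" | htmlq 'title'
```

This only helps when the source charset is already known; guessing it automatically is a separate problem that the sketch does not attempt.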