mgdm / htmlq

Like jq, but for HTML.
MIT License
7k stars, 107 forks

Feature request: Option to strip/reduce all whitespace, not just in text. #49

Open pbsds opened 2 years ago

pbsds commented 2 years ago

It would be nice to be able to collapse each match into a single line, for further filtering with tools like grep. For example, when matching table rows, each row often spans multiple lines because of how the HTML was formatted.

My current workaround is to minify the HTML before passing it to htmlq (cat myfile.html | sd '\n' ' ' | tr -s ' ' | htmlq ...), but a simple switch in htmlq would make this much easier.

Not sure how this would be handled in tags like pre, though...
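
For readers who want to try the workaround without sd, here is a minimal sketch that uses only tr. The function name collapse_ws and the sample markup are hypothetical, and the sketch assumes a tr that pads the second set with its last character (GNU and BSD tr both do). It flattens every run of whitespace, including whitespace inside pre elements, so the pre concern raised above still applies.

```shell
#!/bin/sh
# Hypothetical helper: turn every run of whitespace (spaces, tabs, newlines)
# into a single space, so the whole document ends up on one line.
collapse_ws() {
  tr -s '[:space:]' ' '
}

# Hypothetical sample input: a table row spread over several lines.
printf '<tr>\n  <td>a</td>\n  <td>b</td>\n</tr>\n' | collapse_ws
```

In the pipeline from the comment, collapse_ws would take the place of the sd and tr stages in front of htmlq. The trailing newline is also turned into a space, so the output ends with a space rather than a line break.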

kllmanu commented 11 months ago

@pbsds I guess you mean sed not sd?

pbsds commented 11 months ago

Sorry, I'm so used to sd I didn't notice.

cat myfile.html | sed -ze 's/\n/ /g' | tr -s ' ' | htmlq ...

kllmanu commented 11 months ago

@pbsds Thanks, didn't know about this one, will add it to my toolbelt!

I just ended up using xargs for the whitespace, which seems to be easier for me:

cat myfile.html | htmlq ... | xargs | htmlq ...
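
One thing to watch with the xargs approach: by default xargs treats quotes and backslashes in its input as quoting characters, so quoted attribute values may lose their quotes on the way through. The sketch below compares it with a plain tr squeeze on a made-up sample; the variable names and the markup are hypothetical, and htmlq itself is left out so the snippet runs on its own.

```shell
#!/bin/sh
# Hypothetical sample match: a quoted attribute plus a line break in the text.
sample='<td class="x">a
  b</td>'

# xargs with no command runs echo on the words it parsed, which collapses
# whitespace but also consumes the double quotes as quoting characters.
via_xargs=$(printf '%s\n' "$sample" | xargs)
printf '%s\n' "$via_xargs"

# tr only rewrites whitespace characters, so the quotes are left alone.
via_tr=$(printf '%s' "$sample" | tr -s '[:space:]' ' ')
printf '%s\n' "$via_tr"
```

If the collapsed output is only being grepped, the missing quotes are probably harmless. If it is fed back into htmlq, as in the pipeline above, the tr variant keeps the markup closer to the original.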