simonw / strip-tags

CLI tool for stripping tags from HTML
Apache License 2.0

Exclude tags option: -r/--remove #24

Closed. HaveF closed this issue 1 year ago.

HaveF commented 1 year ago

I like this tool, thank you, Simon.

Most of the time, we don't know where the main part of a typical page is.

But we often do know what we don't want: 'aside', 'header', 'footer', 'nav'...

So, would an option to exclude tags be useful?

simonw commented 1 year ago

This is interesting - yeah, it makes sense to me.

simonw commented 1 year ago

A -r/--remove option could fit here. Here's what --help would look like with that:

Options:
  --version             Show the version and exit.
  -i, --input FILENAME
  -m, --minify          Minify whitespace
  -r, --remove TEXT     Remove content in these selectors
  -t, --keep-tag TEXT   Keep these <tags>
  --all-attrs           Include all attributes on kept tags
  --first               First element matching the selectors
  --help                Show this message and exit.

simonw commented 1 year ago

Demo:

curl -s https://datasette.io/ | strip-tags -r nav -r footer -m
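As a rough illustration of what removing elements before stripping tags means, here is a minimal sketch using only Python's standard-library html.parser. It is not strip-tags' actual implementation, which this thread does not show. It assumes bare tag names rather than full CSS selectors and well-formed HTML where non-void elements are closed, and it only approximates -m by collapsing runs of whitespace. RemovingStripper and strip_with_remove are hypothetical names.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth count.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class RemovingStripper(HTMLParser):
    """Hypothetical helper: collect text, skipping content inside removed tags.

    Assumption: only bare tag names are supported, not CSS selectors.
    """

    def __init__(self, remove):
        super().__init__(convert_charrefs=True)
        self.remove = {name.lower() for name in remove}
        self.depth = 0  # number of open elements inside a removed region
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # Entering a removed element, or any element nested inside one.
        if self.depth or tag in self.remove:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every removed element.
        if not self.depth:
            self.chunks.append(data)


def strip_with_remove(html, remove=(), minify=False):
    """Hypothetical approximation of stripping tags with repeated -r options."""
    parser = RemovingStripper(remove)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    text = "".join(parser.chunks)
    if minify:
        # Rough stand-in for -m: collapse all whitespace runs to single spaces.
        text = " ".join(text.split())
    return text


if __name__ == "__main__":
    page = """<body>
<nav><a href="/">Home</a></nav>
<h1>Datasette</h1>
<p>Explore <b>data</b></p>
<footer>Footer<br>text</footer>
</body>"""
    print(strip_with_remove(page, ["nav", "footer"], minify=True))
    # prints: Datasette Explore data
```

Counting depth rather than keeping a single on/off flag means nested removed elements (a nav inside a header, say) are handled correctly. Supporting selectors such as nav.main or #sidebar would need a real CSS selector engine rather than tag-name matching.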