j0k3r / graby

Graby helps you extract article content from web pages
MIT License

Support changing the tag of elements matching an XPath query (`retag(tag)`) #292

Open Kdecherf opened 2 years ago

Kdecherf commented 2 years ago

It would be helpful to support a `retag(tag): //xpath` directive in site-config files.

The main goal of this command would be to change the tag of the elements matching an XPath query.

For example, the following HTML and site-config:

<html>
  <body>
    <div class="heading-h3">Hello world</div>
  </body>
</html>

retag(h3): //div[@class="heading-h3"]

would give the following output:

<html>
  <body>
    <h3>Hello world</h3>
  </body>
</html>

Kdecherf commented 2 years ago

@j0k3r could you assign this issue to me please?

jtojnar commented 2 years ago

Is that an official syntax? I do not see it mentioned in the docs.

And wouldn't it make more sense to preserve the attributes when just changing the tag name?

Kdecherf commented 2 years ago

> Is that an official syntax? I do not see it mentioned in the docs.

No, it would be a Graby-only command.

> And wouldn't it make more sense to preserve the attributes when just changing the tag name?

Yes, you're right.
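
To make the proposed behaviour concrete, here is a minimal sketch of the retag idea with attributes preserved, as agreed above. It is written in Python with the standard library only and is not Graby's implementation (Graby is PHP). The `retag` helper and its signature are hypothetical, and `xml.etree.ElementTree` supports only a subset of XPath, so the query is written relative to the root as `.//div[...]` rather than `//div[...]`.

```python
import xml.etree.ElementTree as ET


def retag(root, xpath, new_tag):
    """Rename every element matched by `xpath` to `new_tag`, in place.

    Hypothetical helper for illustration. Only the tag name changes, so
    attributes, text and child elements are kept as they are.
    Returns the number of elements that were renamed.
    """
    matches = root.findall(xpath)
    for element in matches:
        element.tag = new_tag
    return len(matches)


html = """<html>
  <body>
    <div class="heading-h3">Hello world</div>
  </body>
</html>"""

root = ET.fromstring(html)

# Rough equivalent of the site-config line:
#   retag(h3): //div[@class="heading-h3"]
count = retag(root, './/div[@class="heading-h3"]', "h3")

result = ET.tostring(root, encoding="unicode")
print(result)
```

With attributes preserved, the element becomes `<h3 class="heading-h3">Hello world</h3>`, so the output differs from the example in the first comment only by keeping the `class` attribute.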