I think we should provide two different options with similar semantics. Let's call them whitelist (replacing what you suggested) and blacklist (replacing remove_nodes and possibly strip_tags) for now. Both would take an array of tag and attribute selectors. An example list would match:
<address> elements
All links (<a>) containing an href attribute (thus excluding things like anchors)
Any tag containing a src attribute
<meta charset="utf-8"> (not convinced we need it to be this complex)
Anything matching whitelist would be kept as HTML during the conversion. Anything matching blacklist would be removed from the DOM.
I think this approach would give much better control as it would also allow us to match/keep/discard certain attributes only on certain elements, but of course it would also be more complex to implement. This may need to wait for a v5 rewrite of the library.
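To make the proposed semantics concrete, here is a rough sketch of how selector matching could work. It is written in Python purely for illustration. Everything in it is an assumption rather than part of the library: the selector grammar (`tag`, `[attr]`, `tag[attr]`, `tag[attr="value"]`), the names `parse_selector`, `matches`, and `classify`, and the rule that blacklist wins when both lists match. The selector strings are a guess at the four cases listed above, not a quote of the original example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical grammar: tag, [attr], tag[attr], tag[attr="value"]
_SELECTOR_RE = re.compile(
    r'^(?P<tag>[a-zA-Z][\w-]*)?'
    r'(?:\[(?P<attr>[\w-]+)(?:="(?P<value>[^"]*)")?\])?$'
)


@dataclass(frozen=True)
class Selector:
    tag: Optional[str]    # None means "any tag"
    attr: Optional[str]   # None means "no attribute constraint"
    value: Optional[str]  # None means "attribute may have any value"


def parse_selector(text: str) -> Selector:
    """Parse one selector string into its tag/attribute/value parts."""
    m = _SELECTOR_RE.match(text.strip())
    if m is None or not (m.group("tag") or m.group("attr")):
        raise ValueError(f"unsupported selector: {text!r}")
    tag = m.group("tag")
    return Selector(tag.lower() if tag else None, m.group("attr"), m.group("value"))


def matches(selector: Selector, tag: str, attrs: Dict[str, str]) -> bool:
    """Return True if an element (tag name plus attributes) satisfies the selector."""
    if selector.tag is not None and selector.tag != tag.lower():
        return False
    if selector.attr is not None:
        if selector.attr not in attrs:
            return False
        if selector.value is not None and attrs[selector.attr] != selector.value:
            return False
    return True


def classify(tag: str, attrs: Dict[str, str],
             whitelist: List[Selector], blacklist: List[Selector]) -> str:
    """Decide what happens to an element during conversion.

    Assumption: blacklist takes precedence over whitelist.
    """
    if any(matches(s, tag, attrs) for s in blacklist):
        return "remove"      # dropped from the DOM
    if any(matches(s, tag, attrs) for s in whitelist):
        return "keep_html"   # left as raw HTML in the output
    return "convert"         # converted as usual


# Hypothetical selector lists covering the four cases described above.
whitelist = [parse_selector(s)
             for s in ["address", "a[href]", "[src]", 'meta[charset="utf-8"]']]
blacklist = [parse_selector("script")]

print(classify("a", {"href": "https://example.com"}, whitelist, blacklist))
print(classify("a", {"name": "top"}, whitelist, blacklist))
print(classify("script", {"src": "app.js"}, whitelist, blacklist))
```

With these lists, a link with an href is kept as HTML, a bare anchor falls through to normal conversion, and a script tag is removed even though it also matches `[src]`. Whether blacklist should win in that last case is exactly the kind of detail that would need deciding.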
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
See #100, #163, and #20.