Preserve certain tags and attributes

See #100, #163, and #20.

I think we should provide two different options with similar semantics - let's call them whitelist (replacing what you suggested) and blacklist (replacing remove_nodes and possibly strip_tags) for now. Both would take an array of tag and attribute selectors like this:

array('address', 'a[href]', '[src]', 'meta[charset=utf-8]')

This would match:

All <address> elements
All links (<a>) that have an href attribute (thus excluding things like named anchors)
Any tag containing a src attribute
<meta charset="utf-8"> (not convinced we need it to be this complex)
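To make the matching rules above concrete, here is a minimal sketch of how one selector string could be tested against a DOM element. The function name selectorMatches and the restricted grammar (an optional tag, then an optional [attr] or [attr=value] part) are assumptions for illustration only; nothing like this exists in the library today.

```php
<?php
// Hypothetical helper, not part of html-to-markdown.
// Supports only: "tag", "[attr]", "tag[attr]", "[attr=value]", "tag[attr=value]".
function selectorMatches(DOMElement $element, string $selector): bool
{
    $pattern = '/^([a-z][a-z0-9]*)?(?:\[([a-z][a-z0-9-]*)(?:=([^\]]*))?\])?$/i';
    if ($selector === '' || !preg_match($pattern, $selector, $m)) {
        return false; // Empty or unsupported selector syntax.
    }

    // Groups that did not take part in the match are '' or missing entirely.
    $tag   = $m[1] ?? '';
    $attr  = $m[2] ?? '';
    $value = $m[3] ?? null;

    // A tag name, when given, must match the element's tag.
    if ($tag !== '' && strcasecmp($element->tagName, $tag) !== 0) {
        return false;
    }
    if ($attr === '') {
        return true; // Tag-only selector such as "address".
    }
    if (!$element->hasAttribute($attr)) {
        return false;
    }
    // With "[attr=value]" the attribute value must match exactly.
    return $value === null || $element->getAttribute($attr) === $value;
}

// Usage: a real link matches 'a[href]', a named anchor does not.
$doc = new DOMDocument();
$link = $doc->createElement('a');
$link->setAttribute('href', '/docs');
$anchor = $doc->createElement('a');
$anchor->setAttribute('name', 'top');

var_dump(selectorMatches($link, 'a[href]'));
var_dump(selectorMatches($anchor, 'a[href]'));
```

Keeping the grammar this small is one way to address the "not convinced we need it to be this complex" concern: dropping the [attr=value] form would only remove the last capture group and the final comparison.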

Anything matching whitelist would be kept as HTML during the conversion. Anything matching blacklist would be removed from the DOM.
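As a rough sketch of the blacklist half, each selector could be translated to an XPath expression and the matching nodes removed with PHP's DOM extension before conversion. The names selectorToXPath and removeBlacklisted are hypothetical, and the sketch handles only the simple selector forms shown above; the whitelist half is left out because it would need to hook into the converters themselves.

```php
<?php
// Hypothetical helpers sketching the proposed "blacklist" option;
// neither function exists in html-to-markdown.

// Translate "tag", "[attr]", "tag[attr]" or "tag[attr=value]" to XPath.
function selectorToXPath(string $selector): ?string
{
    $pattern = '/^([a-z][a-z0-9]*)?(?:\[([a-z][a-z0-9-]*)(?:=([^\]"]*))?\])?$/i';
    if ($selector === '' || !preg_match($pattern, $selector, $m)) {
        return null; // Empty or unsupported selector syntax.
    }

    // No tag name means "any element".
    $tag = ($m[1] ?? '') !== '' ? strtolower($m[1]) : '*';
    if (!isset($m[2])) {
        return '//' . $tag;
    }

    $predicate = '@' . strtolower($m[2]);
    if (isset($m[3])) {
        // The pattern rejects double quotes, so quoting the value is safe.
        $predicate .= '="' . $m[3] . '"';
    }
    return '//' . $tag . '[' . $predicate . ']';
}

// Remove every element matching any blacklist selector; returns the count.
function removeBlacklisted(DOMDocument $doc, array $blacklist): int
{
    $xpath = new DOMXPath($doc);
    $removed = 0;
    foreach ($blacklist as $selector) {
        $expression = selectorToXPath($selector);
        if ($expression === null) {
            continue; // Skip selectors this sketch cannot parse.
        }
        // Copy the result to an array so removals cannot disturb iteration.
        foreach (iterator_to_array($xpath->query($expression)) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
                $removed++;
            }
        }
    }
    return $removed;
}

// Usage: drop real links and anything with a src attribute.
$doc = new DOMDocument();
$doc->loadHTML('<p>Text <a href="/docs">docs</a> <a name="top"></a> <img src="a.png" alt=""></p>');
removeBlacklisted($doc, array('a[href]', '[src]'));
echo $doc->saveHTML();
```

Going through XPath keeps the matching logic out of PHP loops, at the cost of tying the option to whatever subset of selector syntax the translation supports.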

I think this approach would give much better control as it would also allow us to match/keep/discard certain attributes only on certain elements, but of course it would also be more complex to implement. This may need to wait for a v5 rewrite of the library.

thephpleague / html-to-markdown

Preserve certain tags and attributes #164