thephpleague / html-to-markdown

Convert HTML to Markdown with PHP
MIT License

Preserve certain tags and attributes #164

Closed: colinodell closed this 5 years ago

colinodell commented 6 years ago

See #100, #163, and #20.

I think we should provide two options with similar semantics. Let's call them whitelist (replacing what you suggested) and blacklist (replacing remove_nodes and possibly strip_tags) for now. Both would take an array of tag and attribute selectors, like this:

array('address', 'a[href]', '[src]', 'meta[charset=utf-8]')

This would match:

Anything matching the whitelist would be kept as raw HTML during conversion. Anything matching the blacklist would be removed from the DOM before conversion.
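A minimal PHP sketch of these semantics might look like the following. It assumes selectors are limited to the four shapes in the example array (tag, tag[attr], [attr], tag[attr=value]) and are matched against ext-dom elements. The functions parseSelector, matchesSelector, applyBlacklist and isWhitelisted are hypothetical and not part of html-to-markdown's API, and the sketch simplifies whitelisting to whole elements rather than individual attributes.

```php
<?php
// Hypothetical sketch of the proposed whitelist/blacklist selectors.
// None of these functions exist in league/html-to-markdown.

/**
 * Parse 'address', 'a[href]', '[src]' or 'meta[charset=utf-8]'
 * into tag, attribute and optional attribute-value parts.
 */
function parseSelector(string $selector): array
{
    $pattern = '/^([a-z][a-z0-9-]*)?(?:\[([a-z][a-z0-9_:-]*)(?:=([^\]]*))?\])?$/i';
    if ($selector === '' || !preg_match($pattern, $selector, $m)) {
        throw new InvalidArgumentException("Unsupported selector: $selector");
    }
    return [
        // An unmatched leading group is reported as '' by preg_match.
        'tag'   => ($m[1] ?? '') === '' ? null : strtolower($m[1]),
        // Unmatched trailing groups are omitted from $m entirely.
        'attr'  => isset($m[2]) ? strtolower($m[2]) : null,
        'value' => $m[3] ?? null,
    ];
}

/** Does $el match one parsed selector? */
function matchesSelector(DOMElement $el, array $sel): bool
{
    if ($sel['tag'] !== null && strtolower($el->tagName) !== $sel['tag']) {
        return false;
    }
    if ($sel['attr'] === null) {
        return true; // Tag-only selector.
    }
    if (!$el->hasAttribute($sel['attr'])) {
        return false;
    }
    return $sel['value'] === null
        || $el->getAttribute($sel['attr']) === $sel['value'];
}

/** Remove every element matching any blacklist selector from the DOM. */
function applyBlacklist(DOMDocument $doc, array $selectors): void
{
    $parsed = array_map('parseSelector', $selectors);
    // Copy the live DOMNodeList first: removing nodes while iterating
    // it directly would skip elements.
    $elements = iterator_to_array($doc->getElementsByTagName('*'));
    foreach ($elements as $el) {
        foreach ($parsed as $sel) {
            if (matchesSelector($el, $sel)) {
                $el->parentNode?->removeChild($el);
                break;
            }
        }
    }
}

/** True if $el should be emitted verbatim as HTML instead of converted. */
function isWhitelisted(DOMElement $el, array $selectors): bool
{
    foreach ($selectors as $selector) {
        if (matchesSelector($el, parseSelector($selector))) {
            return true;
        }
    }
    return false;
}
```

A converter built this way would call applyBlacklist once after loading the document, then check isWhitelisted for each element during the tree walk. Keeping a single selector parser for both options is what gives them the "similar semantics" described above; preserving only selected attributes on otherwise converted elements would need additional logic.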

I think this approach would give much finer control, since it would also let us match, keep, or discard specific attributes on specific elements only. Of course, it would also be more complex to implement, so it may need to wait for a v5 rewrite of the library.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.