thephpleague / html-to-markdown

Convert HTML to Markdown with PHP
MIT License
1.77k stars, 205 forks

Preserve (specific) HTML comments #177

Closed by Rarst 4 years ago

Rarst commented 4 years ago

The converter currently discards all HTML comments, but some of them have special meaning and need to be preserved.

For example, some systems use <!--more--> to separate the excerpt from the rest of the content.

I've tried overriding CommentConverter and it mostly works, but an erroneous backslash gets prepended somewhere (e.g. the resulting output is \<!--more-->)...

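use League\HTMLToMarkdown\ElementInterface;

class CommentConverter extends \League\HTMLToMarkdown\Converter\CommentConverter
{
    // Keep the <!--more--> marker; drop every other comment.
    public function convert(ElementInterface $element)
    {
        // getValue() returns the text between <!-- and -->.
        if ('more' === $element->getValue()) {
            return '<!--more-->';
        }

        return '';
    }
}

straube commented 4 years ago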

@Rarst Would it make sense to let developers tell the library which comments to preserve, perhaps through configuration? Something like:

$config = [
    'preserve_comments' => [ 'more', /* ... */ ],
];

Just suggesting that because I don't think hardcoding the values is a good idea. Every new comment with a special meaning would lead to a code change.

I know WordPress uses the <!--more--> tag. If this is a common use case, we could set the default value of preserve_comments to [ 'more' ]. Users who don't want to preserve any comments at all could override the option by passing an empty array.
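To make the default-plus-override idea concrete, here is a minimal, dependency-free sketch. It does not use the library itself: resolveConfig() is a hypothetical helper, the 'more' default is only the value proposed above, and 'nextpage' is just an illustrative extra entry.

```php
<?php
// Hypothetical defaults, following the proposal in this thread.
$defaults = ['preserve_comments' => ['more']];

/**
 * Merge user-supplied options over the defaults (hypothetical helper).
 */
function resolveConfig(array $defaults, array $userConfig): array
{
    // array_replace() swaps in the user's value for a key wholesale,
    // so passing an empty array really does mean "preserve nothing".
    return array_replace($defaults, $userConfig);
}

// No user option: the default list applies.
$default = resolveConfig($defaults, []);

// Explicit empty array: no comments are preserved.
$none = resolveConfig($defaults, ['preserve_comments' => []]);

// Custom list: replaces the default list rather than extending it.
$custom = resolveConfig($defaults, ['preserve_comments' => ['more', 'nextpage']]);
```

Replacing the list wholesale, rather than merging the two lists, is what lets an empty array opt out of the default.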

Any thoughts?

Rarst commented 4 years ago

I think the meaningful configuration options might be something like:

  1. false: preserve none
  2. true: preserve all
  3. string[]: preserve specific

Being able to preserve all comments matters because strict matches might be insufficient if comments contain something variable (yeah, that happens too).
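The three modes above could be handled by a single check, roughly as sketched below. This is an illustration only, not the code that ended up in the library: shouldPreserveComment() and renderComment() are hypothetical names, and trimming the comment text before matching is an assumption.

```php
<?php
/**
 * Decide whether a comment should survive conversion (hypothetical helper).
 *
 * @param bool|string[] $setting false = none, true = all, array = exact matches
 * @param string        $comment text between <!-- and -->
 */
function shouldPreserveComment($setting, string $comment): bool
{
    if (is_array($setting)) {
        // Assumption: surrounding whitespace is ignored when matching.
        return in_array(trim($comment), $setting, true);
    }

    // Only a literal true preserves everything; false (or anything else) preserves nothing.
    return $setting === true;
}

/**
 * Return the comment markup unchanged if it is preserved, otherwise an empty string.
 */
function renderComment($setting, string $comment): string
{
    return shouldPreserveComment($setting, $comment) ? '<!--' . $comment . '-->' : '';
}
```

With this shape, true covers comments whose content varies, while a string list keeps exact matches cheap and explicit.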

colinodell commented 4 years ago

Implemented in #179. Thanks for the proposal!

colinodell commented 4 years ago

Released as 4.9.0.