thephpleague / html-to-markdown

Convert HTML to Markdown with PHP
MIT License
1.77k stars 205 forks source link

Allow to inject Environment object #118

Closed filipgolonka closed 7 years ago

filipgolonka commented 7 years ago

I need to customize rules, because I'm using modified Markdown, with couple of custom rules. I decided to prepare a logic for inject Environment object to HtmlConverter, so I am now able to change converters. Also it is easier to refactor tests to make them unit, because now test cases runs all the logic from library. I tried not to break BC, but I would appreciate your review, if I didn't miss some important cases.

colinodell commented 7 years ago

I really like your proposed change to HtmlConverter. However, I would like to keep createDefaultEnvironment() as-is - basically a static constructor which pre-configures the Environment with the default converters needed by 99% of users.

Instead of modifying createDefaultEnvironment(), how would you feel about moving that extra parameter and logic into the constructor (or perhaps even a new static method create())? We'd basically add a new array $converters = array() param and loop through that to add additional converters.

filipgolonka commented 7 years ago

100% right, createDefaultEnvironment suggest, that it will create default one, but now this method breaks SRP so I will introduce new static method for it. Thanks for the review :)

filipgolonka commented 7 years ago

@colinodell done

filipgolonka commented 7 years ago

any progress on it? :)

colinodell commented 7 years ago

Sorry for the delay.

I think this change is moving in the right direction, but I don't like how this syntax looks:

$converter = new HtmlConverter(array(), $environment);

The $environment already has a configuration, so the first parameter is unnecessary.

What if we change the constructor of HtmlConverter to something like this:

    /**
     * Constructor
     *
     * @param Environment|array $config A pre-configured Environment, or configuration options to pass to a new default Environment
     */
    public function __construct($config = array())
    {
        if ($config instanceof Environment) {
            $this->environment = $config;
        } elseif (is_array($config)) {
            $defaults = array(
                'header_style'    => 'setext', // Set to 'atx' to output H1 and H2 headers as # Header1 and ## Header2
                'suppress_errors' => true, // Set to false to show warnings when loading malformed HTML
                'strip_tags'      => false, // Set to true to strip tags that don't have markdown equivalents. N.B. Strips tags, not their content. Useful to clean MS Word HTML output.
                'bold_style'      => '**', // Set to '__' if you prefer the underlined style
                'italic_style'    => '_', // Set to '*' if you prefer the asterisk style
                'remove_nodes'    => '', // space-separated list of dom nodes that should be removed. example: 'meta style script'
                'hard_break'      => false, // Set to true to turn <br> into `\n` instead of `  \n`
            );

            $this->environment = Environment::createDefaultEnvironment($defaults);

            $this->environment->getConfig()->merge($options);
        }   
    }

This would allow you to instantiate a new HtmlConverter any of these ways:

$converter = new HtmlConverter();
$converter = new HtmlConverter(array('foo' => 'bar'));
$environment = Environment::createDefaultEnvironment(array('foo' => 'bar'));
$converter = new HtmlConverter($environment);

It makes instantiating the HtmlConverter easier and eliminates the potential to introduce two different configs.

Thoughts?

colinodell commented 7 years ago

BTW my proposed change would require a minor version bump, which I'm okay with it.

filipgolonka commented 7 years ago

@colinodell done. It's still backward compatible, so no needs for bumping major version. Please review and let me know, if you accept it - then I will squash commits.

filipgolonka commented 7 years ago

@colinodell done & squashed.

colinodell commented 7 years ago

Perfect, thank you so much!

colinodell commented 7 years ago

@filipgolonka,

I just realized something after merging this - the new converters configuration option will only work if you pass it into the Environment constructor:

$environment = new Environment(array(
    'converters' => array(
        new HeaderConverter(),
    ),
));

$converter = new HtmlConverter($environment);

This will not work the same way:

$converter = new HtmlConveter(array(
    'converters' => array(
        new HeaderConverter(),
    ),
));

This behavior occurs because our updated constructor doesn't pass in the custom configuration until AFTER the Environment is constructed:

public function __construct($options = array())
    {
        if ($options instanceof Environment) {
            $this->environment = $options;
        } elseif (is_array($options)) {
            $defaults = array( /* ... */);

            // The Environment gets instantiated here with the default configs:
            $this->environment = Environment::createDefaultEnvironment($defaults);

            // But the user-provided options aren't available until here:
            $this->environment->getConfig()->merge($options);
        }
    }

IMO, if we provide this as a configuration option, it should work the same way regardless of which approach you use.

I'd like to resolve this before making a new release. Here are some potential options:

  1. Instead of a configuration option, add an optional second param to the Environment constructor: array $converters = []
  2. Instead of a configuration option, create a static method like Environment::withConverters(array $converters)
  3. Modify Environment::__construct() so that it behaves as expected
  4. Remove the configuration option altogether; use the existing $environment->addConverter() method instead.

Personally, I'm leaning towards option 4, but that would undo half of your PR (the environment would still be injectable though).

What are your thoughts on this?

filipgolonka commented 7 years ago

@colinodell in my project I switched repository source to use my fork and I configure library to utilize 4th approach (it was done before your comment), so seems like it is the right way. So I will prepare next PR with this change.