thephpleague / commonmark

Highly-extensible PHP Markdown parser which fully supports the CommonMark and GFM specs.
https://commonmark.thephpleague.com
BSD 3-Clause "New" or "Revised" License
2.72k stars 191 forks source link

"safe" option #253

Closed mnapoli closed 8 years ago

mnapoli commented 8 years ago

I think I'm confused with the safe option: if set to true, does that mean:

I think I accidentally misinterpreted that and ended up opening XSS on my application. I'm afraid it might be wrongly interpreted by others. Would it make sense to rename that option (keeping BC of course) to something very explicit?

Cheers!

colinodell commented 8 years ago

It should be number 2: the resulting HTML output will be safe because any raw HTML in the input will be escaped.

This option defaults to false for backwards-compatibility reasons (and because the spec calls for raw HTML to be preserved). However, I do agree that this option is confusing, so perhaps we should "rename" it. What do you think of this suggestion?

  1. Add a new option allow_raw_html.
    • false - Ensures the HTML is not vulnerable to XSS because raw HTML will be escaped.
    • true - Raw HTML is kept as-is; XSS possible.
    • (Basically the opposite values of safe)
  2. Deprecate safe (raise an E_DEPRECATED noticed if this configuration option is set - minor BC break)
  3. If both allow_raw_html and safe are set, the former takes precedence.

I'm open to any alternate names or other feedback on this.

mnapoli commented 8 years ago

👍 sounds good, but would allow_raw_html = false be the default? I'd say this would be a good move, but it seems to conflict with the CommonMark spec.

I'm going a bit off topic but is there any spec on the behavior when allow_raw_html = false? I.e. I was surprised to see all HTML tags removed, I was expecting them to be escaped instead. I.e. Markdown:

Example: <strong>Foo</strong>

Expected:

<p>Example: &lt;strong&gt;Foo&lt;/strong&gt;</p>

Actual:

<p>Example: Foo</p>

In the documents I'm processing it's common for users to just write code in text (not use backticks), I'm guessing I might not be the only one?

Would it make sense to have 3 behaviors:

or is there a reason not to?

colinodell commented 8 years ago

I'd say this would be a good move, but it seems to conflict with the CommonMark spec.

I totally agree with you. IMO there are two questions that need to be answered:

  1. Does changing this default violate my BC promise? (see the second paragraph under Versioning)
  2. Does the security benefit outweigh the fact that this parser will become non-compliant with the spec by default?

I'd happily make the proposed change if I was 100% convinced the answer to both questions is "yes", but unfortunately I'm not.

Perhaps the README and documentation should make this configuration highly visible so users are aware of it?

An alternative solution might be raising some kind of E_* error if the configuration option is not set (but using the current default). For example:

if (!isset($options['safe']) && !isset($options['allow_raw_html'])) {
    trigger_error('Failing to set the "allow_raw_html" option could make your site vulnerable to XSS attacks. See http://commonmark.thephpleague.com/configuration#allow_raw_html for more information.', E_USER_NOTICE);
}

That link would describe the option and which value to choose.

While this technically is a BC break, it's not nearly as severe. It also causes the user to explicitly state what functionality they want. Thoughts?

Would it make sense to have 3 behaviors:

Yeah I really like that! What about this: a new configuration option html_input with three possible values:

(Naming things is hard...)

If you like the E_* suggestion, perhaps we could implement this variation:

if (isset($options['safe'])) {
    trigger_error('The "safe" option has been deprecated, use "html_input" instead. See http://commonmark.thephpleague.com/configuration#html_input for more information.', E_USER_DEPRECATED);
    $options['html_input'] = $options['safe'] ? 'strip' : 'preserve';
} elseif (!isset($options['html_input'])) {
    trigger_error('Failing to set the "html_input" option could make your site vulnerable to XSS attacks. See http://commonmark.thephpleague.com/configuration#html_input for more information.', E_USER_NOTICE);
    $options['html_input'] = 'escape';
}

Thoughts?

mnapoli commented 8 years ago

I'd happily make the proposed change if I was 100% convinced the answer to both questions is "yes", but unfortunately I'm not.

Agreed, documenting it more visibly might be a good first step. Errors can be a pain to deal with, I'm afraid it would be a burden to some users.

An html_input with strip, escape and allow would be awesome (and very clear). I might have some time this week to have a look, do you have some pointers (e.g. the classes I should look at)?

colinodell commented 8 years ago

I might have some time this week to have a look

That would be amazing!

do you have some pointers (e.g. the classes I should look at)?

Search the codebase for 'safe' - it only appears in a few places.

There are two renders which are responsible for converting HTML AST nodes into the final output - right now they simply return '' if safe mode is enabled, or the raw content if disabled. You'd probably want to put a switch here to return '', the raw contents, or escaped contents depending on the config value.

I haven't yet thought about the best location to implement the trigger_error code. Perhaps Environment::initializeExtensions() would work? This function is always run exactly once before the configuration options are used. There may be a better location but I'd need time to re-review everything to find it - suggestions are welcome, otherwise I can tackle this later.

BTW it looks like safe is referenced in a few other locations:

I apologize for not having the time to provide better, more-coherent information but perhaps later tonight or tomorrow I can do that.

mnapoli commented 8 years ago

As you said there's also the "safe links" behavior (only allow safe links), and I'm not sure an html_input option would make sense to control those. What do you think? Would it make sense to have a different option for that?

colinodell commented 8 years ago

Yeah that makes sense to me.

On Tue, Jun 28, 2016 at 3:06 PM Matthieu Napoli notifications@github.com wrote:

As you said there's also the "safe links" behavior (only allow safe links), and I'm not sure an html_input option would make sense to control those. What do you think? Would it make sense to have a different option for that?

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/thephpleague/commonmark/issues/253#issuecomment-229150353, or mute the thread https://github.com/notifications/unsubscribe/AAMVMrtsRVb_RPDv8lFxeAESp2XXtXT0ks5qQXC4gaJpZM4I_hkA .