Closed mnapoli closed 8 years ago
It should be number 2: the resulting HTML output will be safe because any raw HTML in the input will be escaped.
This option defaults to false
for backwards-compatibility reasons (and because the spec calls for raw HTML to be preserved). However, I do agree that this option is confusing, so perhaps we should "rename" it. What do you think of this suggestion?
allow_raw_html
.
false
- Ensures the HTML is not vulnerable to XSS because raw HTML will be escaped.true
- Raw HTML is kept as-is; XSS possible.safe
)safe
(raise an E_DEPRECATED
noticed if this configuration option is set - minor BC break)allow_raw_html
and safe
are set, the former takes precedence.I'm open to any alternate names or other feedback on this.
👍 sounds good, but would allow_raw_html = false
be the default? I'd say this would be a good move, but it seems to conflict with the CommonMark spec.
I'm going a bit off topic but is there any spec on the behavior when allow_raw_html = false
? I.e. I was surprised to see all HTML tags removed, I was expecting them to be escaped instead. I.e. Markdown:
Example: <strong>Foo</strong>
Expected:
<p>Example: <strong>Foo</strong></p>
Actual:
<p>Example: Foo</p>
In the documents I'm processing it's common for users to just write code in text (not use backticks), I'm guessing I might not be the only one?
Would it make sense to have 3 behaviors:
or is there a reason not to?
I'd say this would be a good move, but it seems to conflict with the CommonMark spec.
I totally agree with you. IMO there are two questions that need to be answered:
Versioning
)I'd happily make the proposed change if I was 100% convinced the answer to both questions is "yes", but unfortunately I'm not.
Perhaps the README and documentation should make this configuration highly visible so users are aware of it?
An alternative solution might be raising some kind of E_*
error if the configuration option is not set (but using the current default). For example:
if (!isset($options['safe']) && !isset($options['allow_raw_html'])) {
trigger_error('Failing to set the "allow_raw_html" option could make your site vulnerable to XSS attacks. See http://commonmark.thephpleague.com/configuration#allow_raw_html for more information.', E_USER_NOTICE);
}
That link would describe the option and which value to choose.
While this technically is a BC break, it's not nearly as severe. It also causes the user to explicitly state what functionality they want. Thoughts?
Would it make sense to have 3 behaviors:
Yeah I really like that! What about this: a new configuration option html_input
with three possible values:
strip
or remove
escape
preserve
, allow
or keep
(Naming things is hard...)
If you like the E_*
suggestion, perhaps we could implement this variation:
if (isset($options['safe'])) {
trigger_error('The "safe" option has been deprecated, use "html_input" instead. See http://commonmark.thephpleague.com/configuration#html_input for more information.', E_USER_DEPRECATED);
$options['html_input'] = $options['safe'] ? 'strip' : 'preserve';
} elseif (!isset($options['html_input'])) {
trigger_error('Failing to set the "html_input" option could make your site vulnerable to XSS attacks. See http://commonmark.thephpleague.com/configuration#html_input for more information.', E_USER_NOTICE);
$options['html_input'] = 'escape';
}
Thoughts?
I'd happily make the proposed change if I was 100% convinced the answer to both questions is "yes", but unfortunately I'm not.
Agreed, documenting it more visibly might be a good first step. Errors can be a pain to deal with, I'm afraid it would be a burden to some users.
An html_input
with strip
, escape
and allow
would be awesome (and very clear). I might have some time this week to have a look, do you have some pointers (e.g. the classes I should look at)?
I might have some time this week to have a look
That would be amazing!
do you have some pointers (e.g. the classes I should look at)?
Search the codebase for 'safe'
- it only appears in a few places.
There are two renders which are responsible for converting HTML AST nodes into the final output - right now they simply return ''
if safe mode is enabled, or the raw content if disabled. You'd probably want to put a switch
here to return ''
, the raw contents, or escaped contents depending on the config value.
I haven't yet thought about the best location to implement the trigger_error
code. Perhaps Environment::initializeExtensions()
would work? This function is always run exactly once before the configuration options are used. There may be a better location but I'd need time to re-review everything to find it - suggestions are welcome, otherwise I can tackle this later.
BTW it looks like safe
is referenced in a few other locations:
bin/commonmark
CLI command/appEnvironment
on line 481I apologize for not having the time to provide better, more-coherent information but perhaps later tonight or tomorrow I can do that.
As you said there's also the "safe links" behavior (only allow safe links), and I'm not sure an html_input
option would make sense to control those. What do you think? Would it make sense to have a different option for that?
Yeah that makes sense to me.
On Tue, Jun 28, 2016 at 3:06 PM Matthieu Napoli notifications@github.com wrote:
As you said there's also the "safe links" behavior (only allow safe links), and I'm not sure an html_input option would make sense to control those. What do you think? Would it make sense to have a different option for that?
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/thephpleague/commonmark/issues/253#issuecomment-229150353, or mute the thread https://github.com/notifications/unsubscribe/AAMVMrtsRVb_RPDv8lFxeAESp2XXtXT0ks5qQXC4gaJpZM4I_hkA .
I think I'm confused with the
safe
option: if set totrue
, does that mean:I think I accidentally misinterpreted that and ended up opening XSS on my application. I'm afraid it might be wrongly interpreted by others. Would it make sense to rename that option (keeping BC of course) to something very explicit?
Cheers!