mathiasbynens / he

A robust HTML entity encoder/decoder written in JavaScript.
https://mths.be/he
MIT License
3.43k stars 255 forks source link

Implement `allowUnsafeSymbols` option (which defaults to `false`) for `he.encode` #23

Closed jugglinmike closed 10 years ago

jugglinmike commented 10 years ago

Hey @mathiasbynens!

I'd love to use this library in my polyfill for the iframe srcdoc attribute, but I think I need an additional feature. The functionality I'm proposing here would help me to resolve an issue that was recently filed against that project.

Commit message:

Although he.escape allows for escaping "unsafe" markup characters, there is currently no way to escape the inverse set--non-ASCII characters only. This is useful in contexts where markup is allowed, but non-ASCII characters might be otherwise mangled (as in the iframe srcdoc attribute, for example).

Introduce a new option to he.encode which disables the escaping of unsafe markup characters. Preserve API compatability by disabling this behavior by default.

mathiasbynens commented 10 years ago

Your patch looks good so far. Some remarks:

  1. I’d like to rename markupChars to something that more closely matches the other option names, though – escapeUnsafeSymbols for example. What do you think? Any suggestions?
  2. More tests are needed that show how this new option combines with the other existing options (such as escapeEverything).
  3. The CLI needs to support this new option too.

By the way, this would solve #16.

Note that this new option should not be used in cases where the markup can contain custom elements, since those can contain all kinds of symbols and escaping e.g. <λ-calculus> as <&#x3BB;-calculus> would break stuff. The same goes for non-HTML contexts, such as the contents of a <script> or <style> element (you don’t want to HTML-escape anything in there). Perhaps this should be added to the docs as a warning?

jugglinmike commented 10 years ago

@mathiasbynens I've renamed the option, but I have a few questions about your other requests:

  1. This is an option for the encode method and the library already supports an option named encodeEverything, so what about naming this option encodeUnsafeSymbols instead of escapeUnsafeSymbols?
  2. How should the library behave when both encodeEverything and this new option are set to true encodeEverything is set to true and this new option is set to false? There is a little ambiguity here because both options control the same behavior, just on different character sets. I could see it encoding everything except unsafe characters, but I could also see "everything" taking precedence, resulting in everything being encoded (in other words, ignoring escapeUnsafeSymbols). Because of the ambiguity, I think either decision could be interpreted as arbitrary, so I'm inclined to throw an error if both are set. Do you think that's the right approach?
  3. It doesn't look like the project explicitly tests the CLI--could you confirm?
mathiasbynens commented 10 years ago

@jugglinmike

  1. Good call! +1 to encodeUnsafeSymbols rather than escape….
  2. IMHO encodeEverything should take precedence, we should update the documentation to reflect this, and not throw an error (i.e. silently do what the documentation says it does). No one is going to use these options without reading the documentation anyway. Perhaps the problem could be avoided by renaming encodeEverything into something like encodeSafeAsciiSymbols or encodeAsciiSymbols?
  3. At the moment the CLI does not have tests, indeed.

Thanks for your work on this so far!

jugglinmike commented 10 years ago

@mathiasbynens Thanks for the clarification. I've held off on renaming the existing method because I'm not sure it resolves the ambiguity, and the suggestion seemed more like an afterthought. Of course, I'm happy to change it if you would like me to!

The encodeUnsafeSymbols option is a little unique in that it defaults to true, and reflecting this in the CLI presented a little bit of a problem. I decided that consistency between the flags within the CLI (via --allow-unsafe) is more important than consistency across the CLI and JavaScript API (via, for instance, --unsafe=false). I'm not sure if that matches how you would design it, so I wanted to call it to your attention.

By the way, I expect you'll want to squash all this down when we get to a final draft. I'd be happy to do that too, just let me know.

mathiasbynens commented 10 years ago

@jugglinmike Your arguments convinced me that allowUnsafeSymbols would actually be an even better name (because then the default value is false, like all other options). :+1: (I don’t like the double negative in allowUnsafeSymbols: false but hey.)

jugglinmike commented 10 years ago

Thanks for the feedback, @mathiasbynens — I think I've addressed it all.

mathiasbynens commented 10 years ago

This looks great! Could you squash your commits please?

If you don’t want to fix those last few nitpicks I just left as comments, no worries, I’m happy to take care of them after merging your patch. Just let me know.

jugglinmike commented 10 years ago

No problem at all, Mathias. I am a bit busy this week, so I don't expect to address these final issues until Saturday. If you'd like to see this merged before then, though, be my guest :)

coveralls commented 10 years ago

Coverage Status

Coverage increased (+0.07%) when pulling 37f0e29c95ef1508f6a1a258f536f71df8152d24 on jugglinmike:markupchars-option into 78bc2bbc95a82d03bfe02247a110db32cfe5854b on mathiasbynens:master.

jugglinmike commented 10 years ago

@mathiasbynens I've fixed the typo, squashed the commits, and updated the commit message. I held off on the README.md change you requested because I'm not sure how to best address it. It will probably save us a few more rounds of review if I leave that to you :)

mathiasbynens commented 10 years ago

Thanks, I’ve tweaked your patch and merged it. Will release a new version now.

$ npm install he@0.5.0 --save-dev
jugglinmike commented 10 years ago

Awesome--thanks!