quantizor / markdown-to-jsx

🏭 The most lightweight, customizable React markdown component.
https://markdown-to-jsx.quantizor.dev/
MIT License

Disable parsing of raw HTML #225

Open will-hart opened 6 years ago

will-hart commented 6 years ago

Is there a way to disable parsing of raw HTML altogether? I know I can override specific tags but I'd like to automatically escape HTML characters without transforming the data stored in my database.

quantizor commented 6 years ago

Not currently, but it's something that could probably be added.

fastfedora commented 5 years ago

It would be nice to combine escaping HTML elements with a whitelist of HTML elements that are allowed to be parsed; anything else would be escaped. This would provide additional safety when displaying user-generated input.
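As a rough illustration of the allowlist idea, the markdown source could be preprocessed at render time so that any tag outside the allowlist is escaped before it reaches the parser. This is only a sketch: `escapeDisallowedTags` is a hypothetical helper, not part of markdown-to-jsx, and it uses a simple regular expression rather than a real HTML parser.

```javascript
// Hypothetical helper (not part of markdown-to-jsx): escape every HTML tag
// whose name is not in the allowlist and leave allowed tags untouched.
function escapeDisallowedTags(source, allowedTags) {
  const allowed = new Set(allowedTags.map((tag) => tag.toLowerCase()));

  // Matches an opening or closing tag such as <b>, </b>, or <a href="x">.
  const TAG_R = /<\/?([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*>/g;

  return source.replace(TAG_R, (match, name) =>
    allowed.has(name.toLowerCase())
      ? match
      : match.replace(/</g, '&lt;').replace(/>/g, '&gt;')
  );
}

// Example: only <b> is allowed, so the <script> tags are escaped.
const escaped = escapeDisallowedTags(
  '<b>hi</b> <script>alert(1)</script>',
  ['b']
);
```

Note the limits of this approach: attributes on allowed tags (for example `onclick`) are not inspected, and markdown autolinks such as `<https://example.com>` would also be escaped unless handled separately.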

fastfedora commented 5 years ago

Another feature might be a sanitization function that runs before the attributes are passed to the parsed element. While you can always provide custom components for everything, one function could handle both URL sanitization (for example, rejecting javascript: URLs) and content filtering (curse words, etc.).

Something along the lines of sanitize(node, rule, element) that would return an updated node. So an option might be:

sanitize: (node, rule, element) => (element === 'a' ? { ...node, target: customSanitizeUrl(node.target) } : node)

Or:

sanitize: (node, rule, element) => ({ ...node, content: bleepCurseWords(node.content) })

Then in the code, each rule would call sanitize before rendering, for example:

    footnoteReference: {
      match: inlineRegex(FOOTNOTE_REFERENCE_R),
      order: PARSE_PRIORITY_HIGH,
      parse(capture /*, parse*/) {
        return {
          content: capture[1],
          target: `#${capture[1]}`,
        };
      },
      react(node, output, state) {
        const sanitizedNode = sanitize(node, 'footnoteReference', 'a');

        return (
          <a key={state.key} href={sanitizeUrl(sanitizedNode.target)}>
            <sup key={state.key}>{sanitizedNode.content}</sup>
          </a>
        );
      },
    },
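To make the proposal more concrete, here is one way such a `sanitize(node, rule, element)` hook might be written. Everything here is an assumption for illustration: `customSanitizeUrl`, `bleepCurseWords`, the protocol allowlist, and the placeholder word list are all hypothetical, and none of them exist in markdown-to-jsx.

```javascript
// Hypothetical helpers named in the proposal; none of this is library API.

// Only URLs starting with one of these prefixes are kept (strict allowlist).
const SAFE_URL_R = /^(https?:|mailto:|#|\/)/i;

function customSanitizeUrl(url) {
  // Remove whitespace and control characters that can hide a scheme,
  // e.g. "java\tscript:".
  const compact = String(url).replace(/[\u0000-\u0020]/g, '');
  return SAFE_URL_R.test(compact) ? compact : '#';
}

const BLOCKED_WORDS = ['darn', 'heck']; // placeholder list for illustration

function bleepCurseWords(text) {
  // Content may also be a nested node array; only plain strings are filtered.
  if (typeof text !== 'string') return text;
  const pattern = new RegExp(`\\b(${BLOCKED_WORDS.join('|')})\\b`, 'gi');
  return text.replace(pattern, (word) => '*'.repeat(word.length));
}

// Returns an updated copy of the node; `rule` is unused in this sketch but
// kept so callers can pass the rule name as in the proposal.
function sanitize(node, rule, element) {
  const next = { ...node };
  if (element === 'a' && typeof next.target === 'string') {
    next.target = customSanitizeUrl(next.target);
  }
  if ('content' in next) {
    next.content = bleepCurseWords(next.content);
  }
  return next;
}

const cleaned = sanitize(
  { content: 'what the heck', target: 'JaVa\tScript:alert(1)' },
  'link',
  'a'
);
```

A strict allowlist like this also rejects relative URLs that do not start with `/`; whether that is acceptable depends on the content being rendered.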

This isn't the right issue for this, but it's a broader issue of how to handle user-generated content. I like this library, but I'm switching to react-markdown because it has better support for displaying user-generated content. markdown-to-jsx looks like a great library for internal content. To make it safely support user-generated content, I think you need:

Hopefully this comment has been helpful in that.

rescribet commented 5 years ago

Possibly relevant package https://github.com/cure53/DOMPurify

quantizor commented 5 years ago

That lib is bigger than markdown-to-jsx itself unfortunately.

Adding some basic config to just disable the HTML parsing rules should be relatively straightforward; any HTML would then just end up in the rendered output as plain text.
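For readers looking for what such a switch looks like in practice, later versions of the library's README document an `options.disableParsingRawHTML` flag. A minimal configuration sketch follows, assuming an installed version that supports that flag; the sample string is made up for illustration.

```javascript
// Options object for markdown-to-jsx. Assumes a version of the library
// that supports the documented `disableParsingRawHTML` flag.
const options = {
  disableParsingRawHTML: true,
};

// Usage (JSX, shown as a comment so this block stays plain JavaScript):
//   import Markdown from 'markdown-to-jsx';
//   <Markdown options={options}>{'This <b>tag</b> is not parsed as HTML.'}</Markdown>
```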

rahulgi commented 4 years ago

Should this issue be closed now after #278?

stephan-noel commented 3 years ago

First, please excuse my lack of security knowledge 🙂. My problem is that disabling raw HTML parsing currently also disables my custom components.

const options = {overrides: {MyCustomComponent: MyCustomComponent}};

<MyCustomComponent/> // This no longer works if I disable parsing raw HTML.

But what if I want to disable parsing of raw HTML only (i.e., like GitHub)?
