preactjs / preact-render-to-string

📄 Universal rendering for Preact: render JSX and Preact components to HTML.
http://npm.im/preact-render-to-string
MIT License
653 stars 92 forks source link

HTML Entity encoding in Script Tags #332

Open frenetic43 opened 9 months ago

frenetic43 commented 9 months ago

Background

I'm new to both 'preact' and 'preact-render-to-string' (and htm) and was surprised to find the contents of my script tag were automatically being entity encoded, which unfortunately breaks it.

Expected Output (Non-Encoded):

<script type="importmap">
  { "imports": {
    "lazyload": "./lazyload.js" }
  }
</script>

<script type="module">
  const iMap = { "imports": {
    "lazyload": "./lazyload.js" }
  };
</script>

Actual Output (Encoded):

<!-- Browser Error: "Could not parse Json" -->
<script type="importmap">
  { &quot;imports&quot;: {
    &quot;lazyload&quot;: &quot;./lazyload.js&quot; }
  }
</script>

<!-- Browser Error: "Unexpected token '&'" at '&'quot; -->
<script type="module">
  const iMap = { &quot;imports&quot;: {
    &quot;lazyload&quot;: &quot;./lazyload.js&quot; }
  };
</script>

Since any content within a script tag is parsed by the appropriate scripting-engine rather than the html-parser and thus doesn't need to be entity escaped, it might be convenient to exclude the 'script' tag from automagical encoding.

NOTE: I'm not sure if this affects non-SSR preact. I haven't gotten far enough to try client-side rendering yet.

Research

It looks like it would be trivial to implement in src/index.js: Around Line 470 add something like...

else if (typeof children === 'string') {
  if (type === 'script') { html = children; }
  else { html = encodeEntities(children); }
}

However, I also see around line 148 'Text VNodes: escape as HTML' and I don't know enough about preact to know how that would be triggered.

Fortunately, by looking at the source code I did find dangerouslySetInnerHTML which solves my immediate problem.

Discussion

I'm not sure what the best solution here would be.

In-my-own-opinion, modern web-apps have grown so massive that the use of inline-scripting (rather than loading an external script file) is a pretty fringe-case. I'm only using it myself because I'm still in the early-hacky-days of my current project.

Further, enforcing the use of dangerouslySetInnerHTML does provide a certain additional level of security against accidental... stuff. Though it could be better documented, I had to dig through the source-code to find it.

On the other hand, web technologies are rapidly expanding with new use-cases; 'import maps' being an easy example. And using dangerouslySetInnerHTML on an element that doesn't technically need to be entity escaped to begin with... well, it just isn't 'pretty'.

I think that's actually what bothers me the most, my code isn't as pretty anymore.

frenetic43 commented 9 months ago

UPDATE

I just found this also affects internal CSS style tags.

<style>
#settings {
  &amp; .content {
    quotes: &quot;&quot; &quot;&quot;;
    &amp; .icon {}
  }  
}
</style>
marvinhagemeister commented 9 months ago

Yes, this affects any text content in HTML, regardless of the element used. We basically have two choices:

For completely static strings like in your examples this would be perfectly fine. But from Preact's point of view we don't know how the string was created.

let username = getUserNameFromRequest();

<script>
  // Without escaping this line is a XSS vulnerability
  let username = {username};
  console.log(username)
</script>

Since Preact isn't tied to a compiler and only sees the final string, it cannot make any assumptions that it's secure to use. For this reason the current behavior exists. This behavior could be changed in meta-frameworks which typically use compilers to inspect the original source code. But in Preact itself this is something very risky to change in regards to security of users. It's very unlikely that we will change that in Preact.

frenetic43 commented 9 months ago

Yeah, I pretty much expected that was the case.

Honestly, the entire reason for opening this issue was, as a new user to preact, it didn't occur to me that this might even be a problem, and documentation is scarce enough that it took an entire day just to figure out why my code wasn't working.

Hopefully, just by making this post any others that run into the same issue can save themselves some time.

At the very least documentation on dangerouslySetInnerHTML should be improved. The only official documentation (for preact) I could find was here. It took a combination of looking at the source code (to even know it was an option), StackOverflow posts, and React's documentation to figure out the right syntax to make it work.

As for making the code pretty again, it's easy enough to write a wrapper function.


Also, since the reason for this post is to save some time for others, I should probably add an example of using dangerouslySetInnerHTML. I just forgot.

NOTE: this example is using both HTM and a did-it-myself CSS (strips comments, minifies, returns string) tagged template literals.

const STYLES = css`
  #settings {
    & .content {
      quotes: "" "";
      & .icon {}
    }  
  }
`;

// Must use an object as { __html: 'content to inject' }
const innerHTMLCSS = { __html: STYLES };

function injectCSS () {
  return html`
    <style dangerouslySetInnerHTML=${ innerHTMLCSS } ></style>
  `;
}

const IMPORT_MAP = {
  imports: {
    lazyload: "./lazyload.js"
  }
};

function injectImportMap () {
  return html`
    <script
     type="importmap"
     dangerouslySetInnerHTML=${ { __html: JSON.stringify(IMPORT_MAP) } }
    ></script>
  `;
}

export class Head extends Component {
  render (_props: unknown) {
    return html`
      <head>
        <meta charset="UTF-8" />

        ${injectImportMap()}
        <script src="./core/index.ts" type="module"></script>

        ${injectCSS()}
      </head>
    `;
  }
}
jespertheend commented 7 months ago

I'm also a new user running into this issue. What made this extra confusing for me is that this is not an issue when using render() in the preact repl. Although I think I might already know why this is. I'm assuming render() uses something like element.textContent to set the content of an element, whereas renderToString() might run on a server without access to a dom, and so it can't make use of element.textContent.

lenovouser commented 2 weeks ago

I am also running into this, using SSR. Maybe instead of unlocking it for special tags, preact could add something similar to the SafeString from handlebars and then just not escape these when it encounters them. I for example am running into this in a <Translate> helper that returns bare strings that sometimes contain formatting stuff like &copy; or <br> which all get escaped now.

I am currently not even finding a workaround, for example fragments do not accept dangerouslySetInnerHTML, which is the only way I could make these translation helpers work realistically, in theory it should work according to this comment: https://github.com/preactjs/preact-render-to-string/pull/251#pullrequestreview-1142760874, but it does not. There also was some discussion in https://github.com/preactjs/preact-render-to-string/issues/176 but I found no solution that works.