Encouraging safe practices around content/innerHTML

jlai commented 3 months ago

Subject of the issue

Hi, I recently started using the library and it's really nice.

However, I noticed most of the examples use a content string to put HTML in the grid nodes. This is OK for static content, but from a security perspective it opens up some potential for accidental XSS.

Modern frameworks are pretty good at preventing inexperienced developers from accidentally allowing XSS bugs by not allowing or by strongly discouraging use of innerHTML, for example React's dangerouslySetInnerHTML, but this security measure is bypassed by using content with this library.

Many of these issues can be avoided by passing el when creating the component, although there's still a few loopholes and it doesn't solve the problem of inexperienced developers using content inappropriately.

Scenario 1: String interpolation

A novice developer might try to do this:

grid.addWidget(`<b>${name}</b>`);

which would very easily open up an XSS attack if name came from somewhere untrusted. Example: https://jsfiddle.net/7094edhx/

Scenario 2: Load/save

The docs encourage using load()/save() to serialize a grid into an object, which can be serialized into a JSON string. The default save option serializes the HTML into content.

This is fine if the user is storing the JSON in localStorage, but opens up a lot of avenues for stored XSS attacks when stored elsewhere. For example, if someone built a dashboard editor that saved a grid to a server to share with other users without sanitizing it on save or load, this could allow users to inject Javascript that would be displayed to other users. Even the serialization demo shows an example of saving onclick handlers intact (although only locally).

A developer might wisely decide that saving user-created HTML is not a good idea, and do something like this:

async saveGrid() {
  const serialized = grid.save(false); // don't save HTML
  await upload(JSON.stringify(serialized));
}

async loadGrid() {
  const layout= JSON.parse(await getLayoutFromServer());
  grid.load(JSON.parse(serialized));

  document.querySelectorAll('.grid-stack-item').forEach((el) => {
    const id = el.getAttribute("gs-id");
    // do stuff to populate el based on id
  });
}

but get caught by the fact that load could inject HTML if they don't sanitize or validate the layout object. A malicious user could manually (e.g. by making direct calls to the server using curl instead of using the web page) upload a layout that includes content.

This can also affect widgets created with addWidget(el, opts) if update(el, opts) is called on it later with an untrusted opts object, although this is less likely than the load() scenario.

Solutions

Document the issue

The simplest change would be to add a section to the documentation educating users on potential XSS issues and how to avoid them. This puts the onus on developers to read it and stick to best practices, but doesn't protect against accidents.

Security works best with defense-in-depth, which is why I think we should do more than just educate the user.

Sanitize content

Most of the examples use a simple text string for the content, or simple markup that doesn't have any interactivity. Potentially, content could be sanitized by default to allow for these simple use cases.

The downside is that you'd need a library like sanitize-html or DOMPurify to do it correctly, which would add an external dependency (this library currently has none).

This feels like it's beyond the scope of a grid layout library, so I don't actually recommend this as an option.

A simpler option would be to only allow text content, rendered using el.textContent = w.textContent, which would be mostly useful for examples/demos. This would a breaking change. A transitional step could be use textContent and htmlContent (or unsafeHtmlContent to make it clear that it's potentially unsafe).

Disable HTML content by default

Only allow htmlContent and passing HTML strings to addWidget if GridStack.init is passed a dangerouslyAllowHtmlContent option to enable it. This would help with migration and protect code that mostly uses addWidget(el) from accidental content during update or load.

Data property / rendering function

To provide users an alternative to content, maybe allow setting a rendering function, and add a data object to the widget properties. It might work something like this:

interface MyGridData {
  type: 'chart' | 'note';
  chartConfig: ChartConfig;
  note: string;
}

const grid = GridStack<MyGridData>.init();
grid.setRenderFunc((nodeEl, node) => {
  const data = node.data;

  switch (data.type) {
    case "chart":
      const chartEl = document.createElement('div');
      setupChart(el, data.chartConfig);
      nodeEl.appendchild(el);
      break;
    case "note":
     parentEl.textContent = data.noteText;
     break;
  }
});

grid.load([
  {
    w: 3,
    h: 3,
    data: {type: 'chart', chartConfig: {series: 'agriculture'}}
  }
]);

grid.addWidget({
  w: 3,
  h: 3,
  data: {type: 'note', noteText: "Hello world"}}
});

This would basically be a more user-friendly version of the addRemoveCB and would work with addWidget instead of just load.

addRemoveCB is confusing because it lets you manage el creation but only notifies you of removal and does el.remove() itself. Instead, it might be better as three functions that the developer can pass to customize: createDOMFunc, renderFunc, removeDOMFunc.

Create would be used if the user needs to customize inserting the DOM node. Render if the user just wants to add children to the item node and use the default create func. Remove for people who need to customize how it's removed from the DOM (separate from the 'remove' event).

Data would also be useful for serializing additional state for each node instead of having to store it separately and look up by id.

If this sounds like a good approach, I can break it out into a separate Github issue.

Remove HTML strings from the framework

Ultimately, I think the goal should be to remove .innerHTML from the library entirely. If the user needs to populate content this way, they can manually set el.innerHTML, ideally after running it through a sanitizer or their framework's safe render call.

There's a couple cases where .innerHTML is used to create a div with classes which could be replaced with document.createElement followed by setting the classList. Although it's highly unlikely that gridOptions.itemClass could be passed an untrusted string, it's better to just eliminate the risk, which also makes the library more palatable to security teams where innerHTML might show up as a red flag on a scan.

Your environment

10.2.1

adumesny commented 3 months ago

thank you very much for the throughout analysis. I agree I don't want external dependencies (I worked very hard to remove them all and also shrink the lib to 1/3rd as well).

in real app (like the Angular wrapper I use) .content is never used as the components are created by selector/id anyway. for demo purpose it is nice to have, so maybe we should have el.textContent = w.content support only and remove el.innerHTML all together since that will audit a red flag. breaking change so v11+

addRemoveCB is confusing because it lets you manage el creation but only notifies you of removal and does el.remove() itself. Instead, it might be better as three functions that the developer can pass to customize: createDOMFunc, renderFunc, removeDOMFunc.

interesting, but GS does parent.appendChild(el) from what you return and el.remove (and let you know), so it's symetrical and made more sense to me at the time. In your case above, would createDOMFunc also do the appendChild then ? Also 'render' is very React named :) you are still adding DOM children...

This would basically be a more user-friendly version of the addRemoveCB and would work with addWidget instead of just load.

addWidget does call addRemoveCB, but you are right a renderCB (or better contentCB that matches the field) would make it slightly easier not having to create the 2 divs + classes above, so that makes sense.

as for having a data field, that would have to be any since it's app specific and I would rather not. instead app can subclass GS types (like Angular wrapper .selector and .input which is the same) as load/addWidget will use and store whatever structure you pass anyway. I even have a saveCB for saving extra data for the app already so store .data or whever field you want...

As for security the Angular wrapper .input field might be even worse since that is blinding applied to the widgets (assumes it matches @Input fields or vars - I probably should check you are not overriding methods or other stuff).

Finally: if you can tackle changing el.innerHTML over that would be great contribution!

jlai commented 3 months ago

Thanks for taking a look! Just FYI, this wasn't prompted by any formal security analysis; I'm just using the library for a personal project, but I do have some experience with security guidelines from past jobs.

Yeah, I can take a look at switching it to textContent. Do we want to just treat content as text or would it make sense to rename it to textContent to reflect the breaking change?

The data idea was to make it easier to get app-specific information about the node from callback functions (addRemoveCB, event listeners). Mainly for load() where you're constructing nodes using GridStackWidget, rather than addWidget() where you can still easily iterate over your custom node list.

Otherwise if you wanted to decide what to render for a node, like a component or template, you end up having to keep a separate data structure and look up by id in the callback, but id is optional so you might have to generate one. Or you could add custom fields to the GridStackWidget but then you have to cast it to your type and then strip it from save().

Data can be typed by making GridStack, GridStackWidget, and other types generic, e.g. GridStackWidget<MyCustomData>. Probably defaulting to GridStack<TUserData = void> for users that don't need it.

Having a data object under GridStackWidget explicitly for app data is also probably a good practice for saveCB to encourage adding properties to a user-defined data object, instead of directly on the GridStackWidget where it could conflict with future GridStack properties. As is, I think you'd have to cast the widget to any or some extension type to add your properties when using saveCB?

This is sort of separate from innerHTML removal but I'm thinking about how we'd update the demos that use HTML content with load() as well as any users that need a migration path.

adumesny commented 2 months ago

Do we want to just treat content as text

yes, let's re-use the field for now. customer can use full html if they don't care (but would need to render themself) or send though safe dom code, so not only text.

Or you could add custom fields to the GridStackWidget but then you have to cast it to your type and then strip it from save().

no different from having a data: any pointer. Still gonna have to cast it. My point is I don't want to force it generic data field. Angular wrapper has 2 custom fields and no need to strip for save as that info needs to be saved as well.

Having a data object under GridStackWidget explicitly for app data is also probably a good practice for saveCB to encourage adding properties to a user-defined data object, instead of directly on the GridStackWidget

Only thing is you are reserving data as a field so GS won't conflict with your app field in the furtur. OK I buy that, and will think more about it. But no casting to any needed in my cases (see Angular wrapper). in your CB just have your subclass MyGridStackWidget as param (as long as fields are obtional you can freely go back and forth) with your extra field. that's exactly what I already do in the Angular wrapper by the way. easy peasy.

adumesny commented 1 month ago

@jlai did you ever start converting .innerHTML = away ? I was thinking of picking that up if I have spare time (just got a broken injury).

My plan was to

remove all .innerHTML = into document.createElement + class when parents are created
where we use w.content have that call a renderCB that defaults to just el.textContent = w.content and fix the few demos that need more than text (one or two)
update document explain the safeguard and release at v11 (breaking change)

jlai commented 1 month ago

Hi @adumesny

Sorry to hear about your injury! Hope you have a speedy recovery.

Removing the innerHTML is pretty straightforward, but I got stuck on how to convert the demo code as well as the tests (even without any changes, the test runner seems to hang)

Here's how I approached it, using <template> elements for creating HTML, although I didn't get around to the more complicated demos: https://github.com/gridstack/gridstack.js/compare/master...jlai:gridstack.js:content-as-textContent

adumesny commented 1 month ago

ok, yeah I've done it with a saveCB that defaults to textContent, should have a review soon...

adumesny commented 1 month ago

got it working. only a few demos will need change... will need bunch of testing (addWidget() API changed as well)

adumesny commented 2 weeks ago

fixed in upcoming v11

jlai commented 2 weeks ago

Great improvement for security, nice having the renderCB as well. Thank you!

gridstack / gridstack.js